Self-hosting a website in 2023, pt.2

In the first part of this series we went through the basics of setting up the registrar and the hosting, and getting a VPS online and reachable. In this second part we will configure the firewall and the reverse proxy.

Firewall, or keep that port shut

I had the misfortune to encounter a professor during Uni who didn’t understand much physics, but at least had a sense of humour. He used to keep the network cable unplugged from his PC except for when he was actively browsing or checking his email; he called this practice “the antivirus”.

As impractical as it may be, it is tautologically true that a computer isolated from the network cannot be attacked via the network. It is also true that with today’s usage patterns, where local applications have been traded for cloud services, unplugging would render one’s computer borderline useless. This goes doubly for a server purposely set up to be attached to the network and serve our website.

The next best thing to pulling the cable is then to limit the connections our machine is willing to accept. Connections go through ports, and we can keep ports shut by means of a firewall. The easiest firewall I know of is ufw (for “uncomplicated firewall”): it allows setting up rules by service name instead of by port. By that I mean that one doesn’t need to remember that ssh listens by default on port 22 and write a rule for that specific port: the service name is all that is needed in the majority of cases. Configuring ufw consists of issuing a series of commands that allow or deny a particular service.

Issuing

ufw allow ssh
ufw allow http
ufw allow https

will produce the most minimal configuration we can aim for. The server will only be reachable via ssh (port 22), http (port 80) and https (port 443); connections to any other port will be dropped.

Once we are happy with our configuration it is just a matter of invoking

ufw enable

and we are done.
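
Denying everything incoming is already ufw’s default policy on most distributions, but it does not hurt to make it explicit and to double-check the result before walking away. Something along these lines should work:

# make the default policy explicit (deny everything incoming, allow everything outgoing)
ufw default deny incoming
ufw default allow outgoing

# review the active rules
ufw status verbose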

Reverse proxy

It is now time to expose our first service. Since in general a single machine will host multiple services, and to ease containerisation, we use a so-called reverse proxy. A reverse proxy takes care of the final leg of the journey, routing each incoming request to the service responsible for providing a response.

In the diagram below, internet magic makes sure that the request reaches our server. The server forwards the request to the reverse proxy service and the latter forwards the request to the correct service for further processing.

graph TD
  r0(["Request $addr:$port/foo/this"])
  srv["server"]
  rp("reverse proxy")
  s0("service /foo")
  s1("service /bar")
  r0 --> srv
  srv --> rp
  rp --> s0
  rp --- s1

A reverse proxy that plays well with containers is Traefik. As a short digression: in theory Traefik should also be able to work as a Kubernetes Ingress, but in my experience it was quite a nightmare and I quickly reverted to a more standard NginxIngress+CertBot setup… but for a small hobby server with just a few plain containers it gets the job done decently well.

Traefik has got two nice features: it is able to manage SSL certificates, and it automatically detects containers that should be exposed.

We are going to configure Traefik by means of docker compose. We first configure the container:

# file: traefik/docker-compose.yml
version: "3.2"

networks:
  public:
    driver: bridge

services:
  app:
    image: traefik:latest  # Yeah, I know I shouldn't...
    restart: always
    ports:
    - 80:80
    - 443:443
    volumes:
    - ./traefik.yml:/etc/traefik/traefik.yml:ro
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - ./traefik/acme:/etc/traefik/acme
    environment:
    - # YOUR API KEY HERE
    networks:
    - public
    labels:
    # These labels take care of the wildcard certificate for the whole $DOMAIN
    - "traefik.docker.network=traefik_public"
    - "traefik.http.routers.api.rule=Host(`$API_DOMAIN`)"
    - "traefik.http.routers.api.entrypoints=websecure"
    - "traefik.http.routers.api.tls=true"
    - "traefik.http.routers.api.tls.domains[0].main=$DOMAIN"
    - "traefik.http.routers.api.tls.domains[0].sans=*.$DOMAIN"
    - "traefik.http.routers.api.tls.certresolver=letsencrypt"
    # OPTIONAL:
    #   You can expose a web interface to Traefik, useful for debugging.
    #   If you do activate it, do protect it at least with a simple password
    #   authentication!
    # basic auth
    - "traefik.http.routers.api.middlewares=auth"
    - "traefik.http.middlewares.auth.basicauth.users=$USER:$HASHED_PWD"
    # service
    - "traefik.http.routers.api.service=api@internal"
    # load balancer
    - "traefik.http.services.dummy.loadbalancer.server.port=9999"

Notice that we are defining a public network. As the name suggests, this is the network that will be used to communicate with the rest of the world; we will see that we need to reference it in every service we want to expose. Notice also that the Traefik service binds to ports 80 and 443 of the host: this is the first hop, from server to reverse proxy, in the diagram above.
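
One more note on the labels: the basic auth middleware expects $HASHED_PWD to be an htpasswd-style hash, not the plain password. Assuming the htpasswd utility from apache2-utils is available, a bcrypt entry can be generated like this (admin and the password are of course placeholders):

# prints something like  admin:$2y$05$...  which you can store e.g. in an .env file
htpasswd -nbB admin 'choose-a-real-password'

If you prefer to paste the hash directly into the compose file instead of going through an environment variable, remember that docker compose interprets $ for variable substitution, so every dollar sign in the hash must be doubled ($$).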

Next we need to configure Traefik itself. For convenience we write the configuration in a YAML file and mount it read-only into the container. The content of the configuration file is the following:

# file: traefik/traefik.yml
global:
  checkNewVersion: true

accessLog: {}

providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    watch: true

entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"
    http:
      tls:
        certResolver: "letsencrypt"

# Let's Encrypt
certificatesResolvers:
  letsencrypt:
    acme:
      email: "$EMAIL_FOR_CERTIFICATE"
      storage: "/etc/traefik/acme/acme.json"
      dnsChallenge:
        provider: #SUPPORTED PROVIDER
        delayBeforeCheck: 0

From the top down: we instruct Traefik to watch docker for new containers that may want to use it to expose their services, we configure the entry points, and we configure SSL certificates. Notice that watching docker and storing certificates require access to the host filesystem for different reasons and with different privileges (cf. the volumes configuration in the docker-compose.yml).

The entry points map the protocols (http and https) to a specific public port (80 and 443). While we are at it, we also redirect any incoming http request to the corresponding https one.

The certificate resolvers section is required for Traefik to take care of obtaining SSL certificates from the certificate authorities’ APIs and renewing them when they are about to expire. In order to prove that one actually controls the domain for which the certificate is requested, certificate authorities set up a challenge. There are different kinds of challenges, as outlined in the documentation. In the example above we set up the DNS challenge, probably the strongest of all, as it allows obtaining a wildcard certificate that covers all third-level domains under $DOMAIN. As the name suggests, this is done through the DNS, hence through the registrar’s API. Not all registrars allow this, and one should also check that the registrar is among the providers supported by Traefik.
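
Purely as an illustration, assuming the domain’s DNS happened to be managed by Cloudflare (one of the providers on Traefik’s list), the resolver would name the cloudflare provider and the API token would be passed through the environment entry of the compose file; the provider name and the variable obviously change from registrar to registrar:

# file: traefik/traefik.yml (excerpt) -- hypothetical Cloudflare example
      dnsChallenge:
        provider: cloudflare
        delayBeforeCheck: 0

# file: traefik/docker-compose.yml (excerpt) -- hypothetical Cloudflare example
    environment:
    - CF_DNS_API_TOKEN=$CF_DNS_API_TOKEN  # token with permission to edit the zone's DNS records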

Upon success, the certificate and auxiliary data get stored in the container’s filesystem. It is extremely wise to make sure that data is persisted to the host filesystem as well, by means of a mounted volume, as I learnt the hard way. Most if not all certificate authorities enforce rate limits, and if you do not persist the certificate and for some reason the Traefik container fails and enters an infinite restart loop, requesting a fresh certificate on every restart, you quickly exceed the limit and get (rightfully) blocked for a few days.
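
To avoid hitting those limits while experimenting, the ACME configuration also accepts an alternative CA endpoint: pointing the resolver at Let’s Encrypt’s staging environment yields certificates that browsers will not trust, but with far more generous rate limits. Something like:

# file: traefik/traefik.yml (excerpt) -- staging CA, only while testing
certificatesResolvers:
  letsencrypt:
    acme:
      caServer: "https://acme-staging-v02.api.letsencrypt.org/directory"
      # ...rest of the acme block as above

Once everything works, remove the caServer line and wipe the stored acme.json so that real certificates get requested.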

Anyway, now the configuration is complete. We can activate Traefik with a standard

docker compose up -d
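
Before moving on it is worth tailing the logs for a minute, in particular to catch ACME errors early (app is the service name used in the compose file above):

docker compose logs -f app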

We can now create a test service to check that everything is working correctly. We will create a new folder and put the following compose file inside:

# file: docker-compose.yml
version: '3'

networks:
  traefik_public:
    name: "traefik_public"
    external: true

services:
  whoami:
    networks:
      - traefik_public
    image: traefik/whoami
    labels:
      - "traefik.docker.network=traefik_public"
      - "traefik.http.routers.whoami.rule=Host(`whoami.$DOMAIN`)"

Notice how we are referencing the public network we created when deploying Traefik; you can use docker network ls to find the full name docker assigned to it. The first label of the whoami service tells Traefik to reach the service through that public network. The second one creates the http router whoami and assigns it the third-level domain whoami.$DOMAIN (notice the backticks!). If everything went fine, deploying this compose file will expose the service on that third-level domain, with a valid certificate thanks to the wildcard obtained through the DNS challenge in the Traefik configuration above. Incoming http requests are automatically redirected to the websecure entry point, so by visiting http://whoami.$DOMAIN we receive a 301 Moved Permanently pointing to https://whoami.$DOMAIN, just as expected.
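
From the command line (assuming curl is available) the same check looks like this:

# should answer with a 301 Moved Permanently pointing to the https URL
curl -I http://whoami.$DOMAIN

# should print the request details echoed back by the whoami container
curl https://whoami.$DOMAIN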

Conclusions

This concludes the second part of the series on self-hosting a website. In the future we are going to set up Hugo.