In the first part of this series we
went through the basics of setting up the registrar and the hosting, and
getting a VPS online and reachable. In this second part we will configure the
firewall and the reverse proxy.
Firewall, or keep that port shut
I had the misfortune to encounter a professor at uni who didn’t understand
much physics, but at least had a sense of humour. He used to keep the network
cable unplugged from his PC except for when he was actively browsing or
checking his email; he called this practice “the antivirus”.
As impractical as it may be, it is tautologically true that a computer isolated
from the network cannot be attacked via the network. It is also true that with
today’s usage patterns, with local applications replaced by cloud services, it
would render one’s computer borderline useless. This is doubly true for a
server purposely set up to be attached to the network and serve our website.
The next best thing to pulling the cable is then to limit the connections our
machine is willing to accept. Connections go through ports, and we can keep
ports shut by means of a firewall. The easiest firewall I know of is ufw (short
for “uncomplicated firewall”), which allows setting up rules by service name
instead of by port. By that I mean that one doesn’t need to remember that ssh
listens by default on port 22 and write a rule for that specific port: in the
majority of cases the service name is all that is needed. Configuring ufw
consists of a series of commands that allow or deny a particular service.
Issuing
```shell
ufw allow ssh
ufw allow http
ufw allow https
```
will generate the most minimal configuration we can aim for. The server
will only be reachable via ssh (port 22), http (port 80) and https (port
443); connections on any other port will be dropped.
Once we are happy with our configuration it is just a matter of invoking
```shell
ufw enable
```
and we are done.
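For reference, the whole sequence can be sketched as follows. ufw denies incoming traffic by default on most distributions, but making the policy explicit does no harm; the commands assume root privileges:

```shell
# Deny everything incoming by default, allow outgoing traffic
ufw default deny incoming
ufw default allow outgoing

# Open only the services we actually need
ufw allow ssh    # port 22
ufw allow http   # port 80
ufw allow https  # port 443

# Activate the firewall and review the resulting rule set
ufw enable
ufw status verbose
```

`ufw status verbose` is a convenient final check: it prints the default policies and the list of allowed services before any traffic hits the box.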
Reverse proxy
It is now time to expose our first service. Since in general a single machine
will host multiple services, and to ease containerisation, we use a so-called
reverse proxy. A reverse proxy takes care of the final leg of the addressing of
incoming requests towards the correct service responsible for providing a
response.
In the diagram below, internet magic makes sure that the request reaches our
server. The server forwards the request to the reverse proxy service and the
latter forwards the request to the correct service for further processing.
```mermaid
graph TD
    r0(["Request $addr:$port/foo/this"])
    srv["server"]
    rp("reverse proxy")
    s0("service /foo")
    s1("service /bar")
    r0 --> srv
    srv --> rp
    rp --> s0
    rp --> s1
```
A reverse proxy that plays well with containers is
Traefik. As a short digression: in theory Traefik
should also be able to work as a Kubernetes Ingress, but in my experience it
was quite a nightmare and I quickly reverted to a more standard
NginxIngress+CertBot setup…but for a small hobby server with just a few plain
containers it gets the job done decently well.
Traefik has two nice features: it can manage SSL certificates, and it
automatically detects containers that should be exposed.
We are going to configure Traefik by means of docker compose. We first
configure the container:
```yaml
# file: traefik/docker-compose.yml
version: "3.2"

networks:
  public:
    driver: bridge

services:
  app:
    image: traefik:latest  # Yeah, I know I shouldn't...
    restart: always
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./traefik.yml:/etc/traefik/traefik.yml:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik/acme:/etc/traefik/acme
    environment:
      - # YOUR API KEY HERE
    networks:
      - public
    labels:
      # These labels take care of wildcard certificate for the whole $DOMAIN
      - "traefik.docker.network=traefik_public"
      - "traefik.http.routers.api.rule=Host(`$API_DOMAIN`)"
      - "traefik.http.routers.api.entrypoints=websecure"
      - "traefik.http.routers.api.tls=true"
      - "traefik.http.routers.api.tls.domains[0].main=$DOMAIN"
      - "traefik.http.routers.api.tls.domains[0].sans=*.$DOMAIN"
      - "traefik.http.routers.api.tls.certresolver=letsencrypt"
      # OPTIONAL:
      # You can expose a web interface to Traefik, useful for debugging.
      # If you do activate it, do protect it at least with a simple password
      # authentication!
      # basic auth
      - "traefik.http.routers.api.middlewares=auth"
      - "traefik.http.middlewares.auth.basicauth.users=$USER:$HASHED_PWD"
      # service
      - "traefik.http.routers.api.service=api@internal"
      # load balancer
      - "traefik.http.services.dummy.loadbalancer.server.port=9999"
```
Notice that we are defining a public network. As the name suggests, this is
the network that will be used to communicate with the rest of the world.
We will see that we need to reference it in the services that we need to
expose. Notice also that the Traefik service binds to ports 80 and 443 of the
host: this is the first forwarding step, from server to reverse proxy, in the
diagram above.
Next we need to configure Traefik itself. For convenience we write the
configuration in a yaml file and mount it read-only onto the container. The
content of the configuration file is the following:
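A minimal sketch consistent with the description below; the provider, email
and storage path are assumptions and must be adapted to your setup:

```yaml
# file: traefik/traefik.yml -- a minimal sketch; provider, email and
# storage path are assumptions, adapt them to your setup.

# Watch docker for new containers that want to expose a service
providers:
  docker:
    watch: true

# Map http and https to ports 80 and 443, redirecting http to https
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"

# Obtain and renew SSL certificates via the DNS challenge
certificatesResolvers:
  letsencrypt:
    acme:
      email: you@example.com
      storage: /etc/traefik/acme/acme.json
      dnsChallenge:
        provider: gandi  # must be a provider supported by Traefik
```

The storage path matches the `./traefik/acme` volume mounted in the compose
file, so the obtained certificates survive a container restart.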
From the top down: we instruct Traefik to watch docker for new containers
that may want to use it to expose their services, we configure the entry points,
and we configure SSL certificates. Notice that watching docker and storing
certificates require access to the host filesystem for different reasons and
with different privileges (cf. the volumes configuration in the
docker-compose.yml).
The entry points map the protocols (http and https) to a specific public
port (80 and 443). While we are at it, we also redirect any incoming http
request to the corresponding https one.
The certificate resolvers section is required for Traefik to take care of
obtaining SSL certificates from the certificate authorities’ APIs and
renewing them when they are about to expire. To prove that one owns
the domain for which the certificate is requested, certificate authorities set
up a challenge. There are different kinds of challenges, as outlined in the
documentation. In the example
above we set up the DNS challenge, probably the strongest of all, as it allows
obtaining a blanket certificate that will cover any third (or deeper) level
domain. As the name suggests, this is done through the DNS, hence through the
registrar’s API. Not all registrars allow this, and one should also check that
the registrar is among the providers supported by Traefik.
Upon success, the certificate and auxiliary data are stored in the container’s
filesystem. It is extremely wise to make sure that this data is persisted to
the host filesystem as well, by means of a mounted volume, as I learnt the hard
way. In fact, most if not all certificate authorities rate-limit their APIs:
if you do not save the certificate and for some reason the Traefik container
fails and enters an infinite restart loop, you quickly go beyond the
limit and get (rightfully) banned for a few days.
Anyway, now the configuration is complete. We can activate Traefik with a
standard
```shell
docker compose up -d
```
We can now create a test service to check that everything is working correctly.
We will create a new folder and put the following
compose file inside:
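A minimal sketch of such a compose file, using Traefik’s own whoami test
image; the network name traefik_public is an assumption, check the actual
name on your machine:

```yaml
# file: whoami/docker-compose.yml -- a minimal sketch; the network name
# is an assumption, check `docker network ls` for the actual one.
version: "3.2"

networks:
  public:
    external:
      name: traefik_public

services:
  whoami:
    image: traefik/whoami:latest
    restart: always
    networks:
      - public
    labels:
      - "traefik.docker.network=traefik_public"
      - "traefik.http.routers.whoami.rule=Host(`whoami.$DOMAIN`)"
      - "traefik.http.routers.whoami.entrypoints=websecure"
      - "traefik.http.routers.whoami.tls.certresolver=letsencrypt"
```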
Notice how we reference the public network we created when deploying Traefik;
you can use docker network ls to find the full name it was assigned.
The first label of the whoami service specifies that
we want the service to use the public network. The second one creates the http
router whoami and assigns it the third level domain whoami.$DOMAIN (notice
the backticks!). If everything went fine, deploying this compose file will
expose the service on that third level domain. This is possible because of the
DNS challenge in the Traefik configuration we have seen above. Incoming
requests are automatically redirected to the websecure entrypoint, so by
visiting http://whoami.$DOMAIN we are going to receive a 301 Moved Permanently that points to https://whoami.$DOMAIN, just as expected.
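The redirect is easy to verify from the command line; the snippet below
assumes $DOMAIN is set in your environment:

```shell
# -s silences progress output, -I fetches the headers only
curl -sI "http://whoami.$DOMAIN"
# The response should contain:
#   HTTP/1.1 301 Moved Permanently
#   Location: https://whoami.$DOMAIN/
```

Following the redirect (curl -sIL) should then return a 200 OK served over
https with the freshly obtained certificate.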
Conclusions
This concludes the second part of the series on self-hosting a website. In the
next part we are going to set up Hugo.