Automating Hugo with Git Hooks

TL;DR: Adapt the script below to your needs; it assumes that the deployment directory is the one the web server points to.

I host this blog on a Virtual Private Server (VPS)1 and keep it all version controlled in a git repository hosted on one of the major cloud repository services. Until a few days ago, this meant that my (suboptimal) workflow for publishing a post entailed:

  1. git push from my local machine to the cloud remote
  2. SSH to my VPS
  3. git pull from the cloud remote to the “deployed” working tree
  4. Invoke hugo to recompile the blog
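The steps above can be sketched as a single function; the host name ("vps") and path are hypothetical placeholders for your own setup:

```shell
# Sketch of the old manual publishing workflow, wrapped in one function.
publish() {
    # 1. push from the local machine to the cloud remote
    git push origin main
    # 2.-4. on the VPS: pull into the deployed working tree and recompile
    ssh vps 'cd /var/www/blog && git pull origin main && hugo'
}
```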

The low update frequency of this blog somewhat mitigated the clunkiness of this workflow, but I had long wanted to try git hooks to automate it.

Streamlining and automating the deployment

I removed the cloud-based service, since git is already distributed (some philosophical considerations on this point below) and as such every repo is almost2 as good as any other to serve as a remote. My local repository now uses as its remote a bare repository on the same VPS that hosts the deployment.
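The new layout can be demonstrated locally; here a bare repository in a temporary directory stands in for the VPS remote (all names are hypothetical, and on a real VPS the remote URL would be an SSH one instead of a local path):

```shell
# A bare repository standing in for the remote on the VPS.
tmp=$(mktemp -d)
git init --bare "$tmp/blog.git"

# A stand-in for the local clone (git init -b needs git >= 2.28).
git init -b main "$tmp/local"
cd "$tmp/local"
git -c user.name=demo -c user.email=demo@example.org \
    commit --allow-empty -m "first post"

# On a real setup this path would be e.g. an SSH URL to the VPS.
git remote add deploy "$tmp/blog.git"
git push deploy main
```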

However, the content is served from a separate, clean working tree that is automatically updated by the post-receive hook below:

#!/usr/bin/env bash

# Fill these in: the published working tree, the bare repository,
# and the user that owns the deployment.
PUB_DIR=
GIT_DIR=
DEPLOY_USER=

while read -r oldrev newrev refname
do
    if [[ $refname = 'refs/heads/main' ]]; then
        echo "Deploying"
        # check out HEAD into a clean working tree
        sudo -u "${DEPLOY_USER?}" git \
            --work-tree="${PUB_DIR?}" checkout -f

        cd "${PUB_DIR?}" || exit 1
        # check out submodules and recompile
        sudo -u "${DEPLOY_USER?}" git \
            --git-dir="${GIT_DIR?}" \
            --work-tree=. \
            submodule update --init
        sudo -u "${DEPLOY_USER?}" hugo
        echo "Hugo returns: $?"
    fi
done

Since Hugo generates a static site, all that is left to do is to make sure that the web server points to the public/ directory within that tree. This way, on each push to the main branch the static site is automatically synced.
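For instance, a minimal sketch assuming nginx as the web server (server name and paths are hypothetical):

```nginx
# /etc/nginx/conf.d/blog.conf (hypothetical paths; adapt to your setup)
server {
    listen 80;
    server_name example.org;
    root /srv/blog/public;   # Hugo's output directory inside the deploy tree
    index index.html;
}
```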

A few more details

git allows triggering the execution of arbitrary scripts whenever its internals perform particular operations, both on the client and on the server side.

In the case at hand, I implement the post-receive hook which, as the name suggests, is a server-side hook that runs at the receiving end of a git push, after the remote has verified and accepted the payload3.
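Hooks are plain executables living in the hooks/ directory of the repository; a minimal local demo of installing one in a bare repository (paths are hypothetical):

```shell
# Create a bare repo and drop a trivial post-receive hook into hooks/.
demo=$(mktemp -d)
git init --bare "$demo/blog.git"
cat > "$demo/blog.git/hooks/post-receive" <<'EOF'
#!/usr/bin/env bash
echo "received a push"
EOF
# Hooks must be executable, or git silently skips them.
chmod +x "$demo/blog.git/hooks/post-receive"
```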

git passes to the standard input of the script three values per updated ref: the hash of the old remote HEAD, the hash of the new HEAD, and the ref which is being updated. The read invocation in the script takes care of reading the input and storing these three values in three separate variables. This allows one to, e.g., run the logic only when a specific branch is being updated, in this case main.
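The stdin format is easy to simulate; the hashes below are made up, and this sketch only records which revision would be deployed:

```shell
# Feed the hook's loop a fake "oldrev newrev refname" line, as git would.
deployed=""
while read -r oldrev newrev refname; do
    if [[ $refname == refs/heads/main ]]; then
        deployed=$newrev
    fi
done <<< "0000000 abc1234 refs/heads/main"
echo "would deploy $deployed"
```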

In my setup, the user that I SSH to the VPS with is different from the one that is responsible for the deployment, so I prepend sudo -u to each command, so that it is executed as the right user and artifacts are created with the correct ownership and permissions. The git-checkout --work-tree invocation creates a working tree from the bare repository at a custom location. A git-submodule invocation is then necessary to load the theme used by my Hugo installation. Note that, since the working tree is outside a repository, git-submodule is not able to independently determine the location of the repository and of the tree, which thus need to be passed explicitly.
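The sudo calls also assume that the SSH user may run git and hugo as the deployment user without a password, e.g. via a sudoers fragment like the following (usernames and paths are hypothetical; always edit with visudo):

```
# /etc/sudoers.d/deploy -- let git-user run git and hugo as deploy-user
git-user ALL=(deploy-user) NOPASSWD: /usr/bin/git, /usr/bin/hugo
```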

Distributed, decentralised, federated

There are already lots of opinions on what adjective to add to “version control system” to characterise git. Many revolve around the fact that git was explicitly conceived with the workflow of the Linux Kernel development in mind, in which several peers develop independently and maintainers are free to merge branches from one or the other to put together the features required by a particular business case. In this sense I would say that git is distributed: if we call repository the union of all branches of all clones of a common project, including the private branches of individual contributors, then indeed this information is distributed (or sharded, to borrow from database lingo) across several independent local repositories.

In practice this workflow is rarely implemented, mostly because of the de-facto near-duopoly of cloud hosting for repositories. Most projects have a single source of truth in a specific branch of a specific cloud-hosted remote. In this sense git is seldom used in a decentralised fashion; on the contrary, it is used as a very centralised system, albeit one lacking lock files. This is even more true because we have reached the point where much of the metadata, and even data, of our repositories lives outside the git repository itself: bug trackers, code reviews, wikis, and everything we have come to take for granted in every software project is stored only in the central source of truth.

More importantly, that information is not even distributed! This is where the concept of federation comes into play. While it is true that one could theoretically have a network of remotes that feed on each other, and some projects4 even attempt to decentralise and distribute at least some of the ancillary information I listed above, one fundamental aspect would still be left out: discovery.

Discovery, also called findability in the context of FAIR data, requires that a query executed on one node of the network be replicated throughout the network. Suppose we found a technical solution for decentralising and distributing repositories’ varia; how would we make sure that each node of the network remains aware of, e.g., all the issues open on any clone of that project? This is what federation is really about, and why decentralisation and distribution are not enough. I am very hopeful about the efforts of several projects (notably Forgejo’s); I hope that in the near future it will be possible to fork a repository that lives on a different installation of a potentially different repository management software, and to feed back changesets, issues, etc., as seamlessly as we can now search for a Mastodon user, answer a toot, and so on.

By the way, if you would like to comment on this post, please do so by replying to this toot5.

Acknowledgements

I thank Zassenhaus for discussions and for reading a draft of this post.


  1. I wrote at length about parts of my setup here and here.↩︎

  2. Although any repository can serve as a valid remote, that doesn’t mean that it is a good idea; in particular, server-side repositories should be bare.↩︎

  3. Two more hooks exist on the server side, cf. Git book↩︎

  4. git-issue and git-bug are two notable projects whose goal is to decentralise and distribute the issue tracker. ↩︎

  5. This is incidentally the next automation step: publish on your own site, syndicate everywhere. Right now I manually post on Mastodon and then edit the blog. ↩︎