TL;DR: Adapt the script below to your needs. It only assumes
that the deployment directory is the one the web server points to.
I host this blog on a Virtual Private Server (VPS)1 and keep it
all under version control in a git repository
hosted on one of the major cloud repository services. Until a few days
ago, this meant that my (suboptimal) workflow to publish a post entailed:
1. git push from my local machine towards the cloud remote
2. SSH to my VPS
3. git pull from the cloud remote to the “deployed” working tree
4. Invoking hugo to recompile the blog
The low update frequency of this blog somewhat mitigated the
clunkiness of this workflow, but I had long wanted to try
using git hooks to automate it.
Streamlining and automating the deployment
I removed the cloud-based service: git is already distributed
(some philosophical considerations on this point below), and as such
every repository is almost2 as good as any other as a remote.
My local repository now pushes directly to a bare repository on
the same VPS as the deployment.
The content, however, is served from a separate, clean working tree that
is automatically populated by the post-receive hook below:
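#!/usr/bin/env bash

# Fill these in before installing the hook; ${VAR:?} aborts while they are empty.
PUB_DIR=       # path of the clean working tree to deploy into
GIT_DIR=       # path of the bare repository
DEPLOY_USER=   # user that owns the deployed files

while read -r oldrev newrev refname
do
    if [[ $refname = 'refs/heads/main' ]]; then
        echo "Deploying"
        # check out HEAD into a clean working tree
        sudo -u "${DEPLOY_USER:?}" git \
            --work-tree="${PUB_DIR:?}" checkout -f
        cd "${PUB_DIR:?}" || exit 1
        # check out the submodules and recompile
        sudo -u "${DEPLOY_USER:?}" git \
            --git-dir="${GIT_DIR:?}" \
            --work-tree=. \
            submodule update --init
        sudo -u "${DEPLOY_USER:?}" hugo
        echo "Hugo returns: $?"
    fi
done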
Since Hugo generates a static site, all that is left to do is to make
sure that the web server points to the public/ directory within that
tree. This way, every push to the main branch automatically updates the static site.
A few more details
git allows triggering the execution of arbitrary scripts whenever its
internals run particular operations, both on the client and on the
server side.
In the case at hand, I implement the post-receive hook which, as the
name suggests, is a server-side hook that runs on the other end of a
git push, when the remote has verified the payload and accepted
it3.
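To watch a post-receive hook fire without touching a real server, the sketch below builds everything in a throwaway directory. The sandbox layout, the file names and the one-line hook body are illustrative assumptions, not my actual setup; a local path stands in for the SSH URL of the VPS.

```bash
#!/usr/bin/env bash
# Sandbox demo: a local bare repository plays the role of the VPS remote.
set -eu

sandbox=$(mktemp -d)

# "Server" side: a bare repository with an executable post-receive hook.
git init -q --bare "$sandbox/blog.git"
mkdir -p "$sandbox/blog.git/hooks"
cat > "$sandbox/blog.git/hooks/post-receive" <<'EOF'
#!/usr/bin/env bash
# Illustrative hook body: just report which refs were pushed.
while read -r oldrev newrev refname; do
    echo "post-receive: $refname updated"
done
EOF
chmod +x "$sandbox/blog.git/hooks/post-receive"

# "Client" side: a repository with a single commit on main.
git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/main   # name the unborn branch main
echo "hello" > post.md
git add post.md
git -c user.name=Demo -c user.email=demo@example.org \
    commit -q -m "First post"

# Point origin at the bare repository and push. On a real setup the URL
# would be an SSH one rather than a local path.
git remote add origin "$sandbox/blog.git"
git push origin main > "$sandbox/push.log" 2>&1
cat "$sandbox/push.log"
```

The hook's message should show up in the push output, relayed by git from the receiving side. A hook file that is not executable is skipped, hence the chmod.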
For every ref updated by the push, git writes one line to the standard
input of the script, made of three values: the hash the ref pointed to
before the push, the hash it points to now, and the full name of the ref
being updated (refs/heads/main for the main branch). The read invocation
in the script takes care of reading the input and storing these three
values in three separate variables. This allows one to, e.g., only run
the logic when a specific branch is being updated, in this case main.
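This input format can be exercised without a real push by feeding hand-written lines to the same kind of read loop. The function name select_deployable and the shortened hashes below are made up for illustration.

```bash
#!/usr/bin/env bash
# Read "<old-hash> <new-hash> <ref-name>" lines from standard input and
# report which of them would trigger a deployment of the target ref.
select_deployable() {
    target_ref=$1
    while read -r oldrev newrev refname; do
        if [ "$refname" = "$target_ref" ]; then
            echo "deploy $newrev (was $oldrev)"
        else
            echo "skip $refname"
        fi
    done
}

# Simulated push updating two branches at once: printf reuses its format
# for each group of three arguments, producing one line per ref.
printf '%s %s %s\n' \
    1111111 2222222 refs/heads/main \
    3333333 4444444 refs/heads/drafts |
    select_deployable refs/heads/main
```

Only the line whose ref name matches the target is reported as a deployment; everything else is skipped, which is the same filtering the hook performs on main.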
In my setup, the user that I SSH to the VPS with is different from the
one that is responsible for the deployment, so I prepend sudo -u to
each command, so that it is executed as the right user and the artifacts
are created with the correct ownership and permissions.
The git checkout invocation with --work-tree populates a working tree
from the bare repository
at a custom location. A git submodule invocation is then necessary to
fetch the theme used by my Hugo installation. Note that, since the
working tree is outside the repository, git submodule is not able to
determine on its own the location of the repository and of the tree,
which thus need to be passed explicitly.
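The need for explicit paths can be reproduced in a sandbox. In this sketch the directory names (src, site.git, www) and the file index.md are made up for illustration, and a plain git clone --bare stands in for the bare repository on the VPS. Submodules are left out to keep it short.

```bash
#!/usr/bin/env bash
# Sandbox demo: populate a working tree that lives outside the repository.
set -eu

sandbox=$(mktemp -d)

# A throwaway source repository with one commit.
git init -q "$sandbox/src"
echo "hello" > "$sandbox/src/index.md"
git -C "$sandbox/src" add index.md
git -C "$sandbox/src" -c user.name=Demo -c user.email=demo@example.org \
    commit -q -m "First post"

# The bare repository (no working tree of its own) and the deployment tree.
git clone -q --bare "$sandbox/src" "$sandbox/site.git"
mkdir "$sandbox/www"

# Check out HEAD of the bare repository into the external tree.
git --git-dir="$sandbox/site.git" --work-tree="$sandbox/www" checkout -f

cd "$sandbox/www"

# From inside the tree, git cannot discover a repository by itself...
if ! git rev-parse --git-dir > /dev/null 2>&1; then
    echo "no repository discovered from $PWD"
fi

# ...so both locations have to be spelled out. Empty output from
# status --short means the tree matches HEAD.
git --git-dir="$sandbox/site.git" --work-tree=. status --short
```

The deployment tree contains no .git entry at all, which is why every later git command run from it needs --git-dir and --work-tree.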
Distributed, decentralised, federated
There are already lots of opinions
on what adjective to add to “version control system” to characterise
git. Many revolve around the fact that git was explicitly conceived
with the workflow of Linux kernel development in mind, in which
several peers develop independently and maintainers are free to merge
branches from one or the other to put together the features required by
a particular business case. In this sense I would say that git is
distributed: if we call repository the union of all branches of all
clones of a common project, including the private branches of individual
contributors, then indeed this information is distributed (or sharded, to
borrow from database lingo) across several independent local
repositories.
In practice this workflow is rarely implemented, mostly because of the
de facto near-duopoly of cloud hosting for repositories. Most projects
have a single source of truth in a specific branch of a specific
cloud-hosted remote. In this sense git is seldom used in a decentralised
fashion; on the contrary, it is used as a very centralised system, although
one lacking lock files. This is even more true because we have come to the
point where lots of metadata, and even data, of our repositories
live outside of the git repository itself: bug trackers, code
reviews, wikis, and everything we have come to take for granted in every
software project are stored only in the central source of truth.
More importantly, that information is not even distributed! This is
where the concept of federation comes into play. While it is true that one
could theoretically have a network of remotes that feed on each other,
and some projects4 even attempt to decentralise and
distribute at least some of the ancillary information I listed above,
one fundamental aspect would still be left out: discovery.
Discovery, also called findability in the context of FAIR
data, requires that a query
executed in one node of the network be replicated throughout the
network. Let us assume that we find a technical solution to
decentralising and distributing repositories’ varia; how do we make sure
that each node of the network remains aware of, e.g., all the issues
open on any clone of that project? This is what federation is really
about, and why decentralisation and distribution are not enough. I’m
very hopeful about the efforts of several projects (notably
Forgejo’s);
I hope that in the near future it will be possible to fork a repository
that lives in a different installation of potentially a different
repository management software and feed back changesets, issues, etc.,
as seamlessly as we can now search for a Mastodon user, answer a toot,
etc.
By the way, if you would like to comment on this post, please do so by
replying to this toot5.
Acknowledgements
I thank Zassenhaus for discussions and for reading a draft of this post.
1. I wrote at length about part of my setup here and here.
2. Although any repository can serve as a valid remote, that
   doesn’t mean that it is a good idea. In particular, server
   repositories should be bare.
3. Two more hooks exist on the server side, cf. the Git book.
4. git-issue and git-bug are two notable
   projects whose goal is to decentralise and distribute the issue
   tracker.