Keeping your gits in a row

Posted (last modified Tue 18 June 2024) in archiving git bash

I believe that if you use a piece of code, you are also responsible for making sure that the code stays available in the future.

In this spirit, I decided a couple of years ago that I would keep a full clone of all VCS repositories that I use.

Can't someone else do it?

Yeah, yeah, I hear ya.

But imagine that one day you cannot reach the code repository anymore.

It could be because you are working where internet is scarce or impossible to rely on.

It could be that you have to cope with what was in your faraday cage when a giant solar flare happened.

It could be that you, or the author of the code, have been cut off by the accelerating weaponization of everything.

Or maybe none of the above happened. But you still understand and appreciate what it means to build a truly decentralized society, where we all participate and contribute, not only consume.

Git organized

For every git repository that I use, I actually keep a local copy on my daily device.

I also keep a copy on a device at home, and on a remote device.

My thinking is:

  1. If I lose my laptop, I have two copies
  2. If my house burns down, I have two copies
  3. If my house burns down with my laptop inside, I have at least one more copy.

... and so on.

I hate to move it, move it

Sometimes we have to, though.

And what can be a real pain is moving heaps of code repositories around: for example, when you move to a new machine, or want to bootstrap a new copy without having to source the data yourself.

To make this easier, I wrote the gitrefresh bash tool to copy only the minimum of information required to source the data from a remote. [1]

Freshening up

To make sense of what is what in the repository store, I use a simple folder structure.

Obviously, when I create copies of the repository store, I would like to keep the same folder structure. So the tool needed to make that possible.

Additionally, what's needed are tools to bootstrap a repository group from a list, and a tool to refresh those repositories periodically once they've been bootstrapped.

To achieve this, I actually wrote three tools, as follows:

gitlist.sh

create a list of git repositories under a filesystem path, with the option of preserving the directory structure.

gitstart.sh

clone git repositories from a list generated by gitlist.sh, with or without directory structure.

gitrefresh.sh

fetch and merge updates from remotes of each repository under a directory.

Behavior

The gitlist.sh and gitrefresh.sh tools work more or less the same way.

They traverse a directory structure recursively.

Every time a valid git repository is found, that repository is processed. Afterwards, the tool will exit to the parent folder. [2]
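
As a rough illustration of that traversal, here is a minimal bash sketch. It is not the actual code of gitlist.sh or gitrefresh.sh: the function name walk_repos is made up, and "valid git repository" is simplified to "a directory that contains a .git directory".

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the traversal, not the real tool.
# Prints every repository found below the given directory.

walk_repos() {
    local dir=$1
    local entry

    # Assumption: a directory holding a .git directory counts as a repository.
    if [ -d "$dir/.git" ]; then
        # "Process" the repository; here we only print its path.
        echo "$dir"
        # Do not look inside the repository, go back up to the parent folder.
        return 0
    fi

    # Otherwise descend into each subdirectory.
    # Hidden directories are skipped, and symlinked directories are followed.
    for entry in "$dir"/*/; do
        # When the glob matches nothing it stays literal, so check before using it.
        [ -d "$entry" ] || continue
        walk_repos "${entry%/}"
    done
}
```

Calling walk_repos /path/to/repos would print one repository path per line. Because the function returns as soon as it hits a repository, anything nested inside a working tree is never visited.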

Example

Let's say we have three repositories that we are mirroring locally:

  • https://github.com/bitcoin/bips under btc/bips
  • https://aur.archlinux.org/libkeccak.git under os/archlinux/aur/libkeccak
  • git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git under linux/linux

First we use gitlist.sh to generate the list of repos to bootstrap [3]:

$ gitlist.sh -p | tee gitlist.txt
https://github.com/bitcoin/bips btc/bips
https://aur.archlinux.org/libkeccak.git os/archlinux/aur/libkeccak
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git linux/linux
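
To give an idea of where such a line could come from, here is a small sketch that builds one "url path" entry for a single repository. It assumes the remote is called origin and that the path column is simply the repository path relative to the store root; list_entry is a hypothetical name, and the real gitlist.sh may well do this differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: emit one list line for one repository.
# Usage: list_entry <store root> <repository directory>

list_entry() {
    local root=$1 repo=$2 url

    # Assumption: the upstream remote is named "origin".
    url=$(git -C "$repo" config --get remote.origin.url) || return 1

    # Strip the store root so only the relative folder structure remains.
    printf '%s %s\n' "$url" "${repo#"$root"/}"
}
```

For a store rooted at /path/to/repos, list_entry /path/to/repos /path/to/repos/btc/bips would print the first line of the list above, provided that clone's origin points at the bips URL.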

Using gitstart.sh with this list, we can restore this bunch of repositories with the same directory structure anywhere else:

$ cd /path/to/new/repos/location
$ gitstart.sh < gitlist.txt
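
The bootstrap step boils down to reading the list line by line and cloning each entry into its folder. A minimal sketch of that loop follows; clone_from_list is a made-up name, and this is only an approximation of what gitstart.sh does, not its actual code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: clone every "url path" line read from stdin
# into the current directory.

clone_from_list() {
    local url path status=0

    while read -r url path; do
        # Skip empty lines.
        [ -n "$url" ] || continue

        if [ -n "$path" ]; then
            # List made with directory structure: recreate the parent folders.
            mkdir -p "$(dirname "$path")"
            # Redirect stdin so git (or ssh) cannot swallow the rest of the list.
            git clone "$url" "$path" < /dev/null || status=1
        else
            # List without paths: let git pick the default folder name.
            git clone "$url" < /dev/null || status=1
        fi
    done

    return "$status"
}
```

It would be used the same way as the tool itself, for example clone_from_list < gitlist.txt from inside the new repository store.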

Now, the idea is that from time to time you should get the latest changes from the upstream source.

I simply combine gitrefresh.sh with cron to do this on the remote, and manually refresh the local copy once in a while.

Using the tool, all it takes is:

$ cd /path/to/new/repos/location
$ gitrefresh.sh pull
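
For the curious, a refresh pass can be sketched in a few lines of bash. The sketch below is not the real gitrefresh.sh: refresh_tree is a hypothetical name, and it assumes that a fast-forward-only pull of the checked-out branch is an acceptable way to "fetch and merge".

```shell
#!/usr/bin/env bash
# Hypothetical sketch: update every repository below a directory.
# Returns non-zero if any repository failed to update.

refresh_tree() {
    local dir=$1 entry status=0

    # Assumption: a directory holding a .git directory counts as a repository.
    if [ -d "$dir/.git" ]; then
        # Fetch and fast-forward the current branch; refuse anything that
        # would need a real merge. stdin is closed so nothing can prompt.
        git -C "$dir" pull --ff-only < /dev/null || status=1
        return "$status"
    fi

    # Not a repository: keep looking in the subdirectories.
    for entry in "$dir"/*/; do
        [ -d "$entry" ] || continue
        refresh_tree "${entry%/}" || status=1
    done

    return "$status"
}
```

On the cron side, a crontab line such as 0 3 * * * cd /path/to/new/repos/location && gitrefresh.sh pull would run the refresh every night at 03:00. The schedule is only an example, and it assumes gitrefresh.sh is on the PATH that cron uses.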

[1] Yes. I didn't get beyond git yet. But at least it's a start.

[2] This, of course, means that the tool will not automatically archive code from submodules. The submodule construct is a target of both a lot of love and a lot of hate. Personally, I like it. But at the same time it is my opinion that it does not absolve us from knowing and being mindful of which submodules a repository is using, and thus from making sure that we have independent clones of those repositories too.

[3] We add the -p flag to preserve the directory structure on disk.