Technology Solutions for Everyday Folks

Git-ting the hang of Hooks

Several months ago I made the decision to finally use Git properly to manage a more complex "legacy" web application project that I inherited over a decade ago and continue to maintain. Years ago, when migrating the web application to a new hosting stack, I ported the development/test and production sites into their own Git repositories. Thanks to the magic of GitHub, the actual migration to the new host was pretty simple: apart from the database bits, it was more or less a git clone. After the migration, I'd intended to merge the dev/test repo into the production repo and then use Git's branching and merging to handle all further development. I ran into a hangup, however, due to how the legacy code was configured: there wasn't a particularly good way to untangle some of the app configuration includes. And so, for much longer than I'd care to admit, I used the dual repository model to track changes, which was better than nothing...but far from ideal. Pushing to the production host literally meant copying over specific files...and remembering which files were more...complex (those containing host-specific configurations).

What Was So Unique? Not Passwords, I Hope?!

Absolutely not. The problem in question wasn't about sensitive data (passwords, secrets, keys, etc.), which shouldn't be stored in a repo. Almost all of the differences were path- and hostname-specific bits, sprinkled with a few contact settings that differ between the dev/test and production environments. This was especially problematic for a couple of included libraries/plugins, as those configurations live deep in the source. All told, exactly six files differ in substantial ways between the two environments, so it was just "easier" to ignore the problem. Until it wasn't.

What Changed? Why Tackle This?

Fixing the root problem would really require substantially re-architecting parts of the app, which is hard to justify for six files. Exactly why is complicated to write out (it's easier shown), but to quote a former supervisor: "Broken gets fixed; shoddy lasts forever." This isn't broken, it's just poorly designed. A proper fix isn't worth the effort, so it's "better" to find a workaround.

The need for a workaround finally came to a head this past February, when I needed to make a substantial change throughout the application stack to address a deprecated library. The changes themselves weren't complicated, but because they were embedded throughout the application codebase, it was not going to be feasible to remember which things were in which state between the two repos. It was time to bite the bullet and merge the repos. Except I had to figure out how to handle those six files.

Enter: The Hook!

I consulted The Google for advice on how to do something like this. After a couple of false starts with Git merge strategies, I discovered Git Hooks. And suddenly it became obvious, just as the Git book's chapter on Git Hooks puts it:

The post-merge hook runs after a successful merge command. You can use it to restore data in the working tree that Git can’t track, such as permissions data. This hook can likewise validate the presence of files external to Git control that you may want copied in when the working tree changes.

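This was exactly what I needed: in its default configuration, a git pull is a git fetch followed by a git merge, so a post-merge hook runs whenever a pull successfully merges in new commits. By having a post-merge hook always copy in the requisite files for the current branch, I could solve my issue.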
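To watch that behavior in isolation, here is a throwaway sketch. Nothing in it comes from the actual project: an "upstream" repository stands in for GitHub, a clone stands in for one of the hosts, and the repository names are placeholders. It installs a trivial post-merge hook in the clone and pulls a new commit into it. The hook also prints its single argument, which Git sets to 1 only for a squash merge.

```shell
#!/bin/bash
# Throwaway demo: a post-merge hook firing on "git pull".
# Everything lives in a temporary directory; names are placeholders.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# An "upstream" repository standing in for GitHub
git init -q "$demo/upstream"
git -C "$demo/upstream" symbolic-ref HEAD refs/heads/main
git -C "$demo/upstream" -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"

# A clone standing in for the dev/test or production host
git clone -q "$demo/upstream" "$demo/host"
cat > "$demo/host/.git/hooks/post-merge" <<'EOF'
#!/bin/bash
# post-merge receives one argument: 1 for a squash merge, otherwise 0
printf 'post-merge ran (squash=%s)\n' "$1"
EOF
chmod +x "$demo/host/.git/hooks/post-merge"   # Git ignores non-executable hooks

# New upstream work, then a pull on the "host" (a fast-forward merge)
git -C "$demo/upstream" -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "new work"
pull_output=$(git -C "$demo/host" pull --no-rebase 2>&1)   # hook output goes to stderr
printf '%s\n' "$pull_output"
```

A fast-forward counts as a successful merge, so the hook fires. Per the githooks documentation, it is not run when the merge fails due to conflicts.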

The next challenge was making the hook behave the way I needed, consistently, without having to remember anything. Hooks are clone-specific and aren't part of the repository stored on GitHub, so I could have just managed each environment's hooks separately. However, besides the two primary host environments for this app (dev/test and production, which track the dev and main branches respectively and never change), I also have a local development environment where all the code is written and where I often switch branches. With three disparate hosts, it made sense to include the hook code in the repo.

How Does A Git Hook Work?

At its core, a Git Hook is nothing more than a script triggered by an action. For post-merge, once a merge completes, Git executes the script at .git/hooks/post-merge (provided the file is executable). It could be used as a friendly text reminder of how awesome you are:

#!/bin/bash
printf '\nPost-merge hook complete. You know how to rock Git!\n\n'

The reality is that a friendly message, self-affirming as it may be, actually doesn't do anything meaningful with the code. However, anything you can do in Bash can also be done in a Hook. Enter major possibilities!

I decided to use a two-step process to copy the files. The first step is the Hook logic; the second contains the array/list of files to copy and the copy command. I chose a two-script method so I only have one "association" file (the list) to maintain. This is helpful since my local development environment uses multiple Hooks (post-checkout being a key player) to trigger a copy. All of these scripts were added to a deployutils directory in the repository, and the Hooks can then be easily copied to their clone location (a one-time necessity per clone).
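The one-time copy per clone can itself be scripted. The sketch below is hypothetical: it assumes each Hook is stored in deployutils/ under its Git hook name (deployutils/post-merge, deployutils/post-checkout), which the post doesn't specify, and the install_hooks function is a made-up name. It is demonstrated in a throwaway repository.

```shell
#!/bin/bash
# Hypothetical installer: copies hook scripts kept in the tracked deployutils/
# directory into a clone's hooks directory. Assumes each hook is stored under
# its Git hook name (deployutils/post-merge, deployutils/post-checkout).
set -euo pipefail

install_hooks() {
  local repodir hookdir hook
  repodir=$(git -C "${1:-.}" rev-parse --show-toplevel)
  # --git-common-dir may be relative to $repodir, so resolve it to an absolute path
  hookdir="$(cd "$repodir" && cd "$(git rev-parse --git-common-dir)" && pwd)/hooks"
  mkdir -p "$hookdir"
  for hook in post-merge post-checkout; do
    if [[ -f "$repodir/deployutils/$hook" ]]; then
      cp "$repodir/deployutils/$hook" "$hookdir/$hook"
      chmod +x "$hookdir/$hook"   # Git ignores hooks that are not executable
      echo "Installed $hook"
    else
      echo "Skipped $hook (no deployutils/$hook in this clone)"
    fi
  done
}

# Usage demo in a throwaway repository with a placeholder post-merge hook
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
git init -q "$demo"
mkdir "$demo/deployutils"
printf '#!/bin/bash\necho "post-merge placeholder"\n' > "$demo/deployutils/post-merge"
install_hooks "$demo"
```

On Git 2.9 and later, `git config core.hooksPath deployutils` is an alternative to copying. Git then looks for every hook in that tracked directory, so the scripts must be committed with their executable bit set. It is still a one-time, per-clone step, because configuration isn't carried along by a clone.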

The actual post-merge Hook used on the dev/test and production hosts looks a lot like this:

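#!/bin/bash
# Git runs hooks from the top of the working tree; rev-parse also copes with
# the hook being triggered by hand from a subdirectory
repodir=$(git rev-parse --show-toplevel)
printf '\nRunning post-merge hook...\n'
# Get the current branch name ("HEAD" when detached)
branch_name=$(git rev-parse --abbrev-ref HEAD)
if [[ "$branch_name" = "main" ]]; then
	# On "main": stage the production variants
	printf '\n  ==> Refreshing PRODUCTION code environment files:\n'
	"$repodir/deployutils/stageFilesByEnv.sh" prd
elif [[ "$branch_name" = "dev" ]]; then
	# On "dev": stage the development variants
	printf '\n  ==> Refreshing DEVELOPMENT code environment files:\n'
	"$repodir/deployutils/stageFilesByEnv.sh" dev
fi
# Any other branch (feature, bugfix): nothing to stage
printf '\nPost-merge hook complete.\n\n'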

This script only acts when the clone is on one of those two specific branches, which ensures the two hosts always get their proper configurations even if I mess up (or, more likely, forget) something. It does nothing when invoked on other branches (feature, bugfix).
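The post-checkout Hook mentioned earlier for the local development clone isn't shown in this post, so the following is a hypothetical take on it, not the actual Hook. It assumes a local clone always wants the dev variants, and it uses a cut-down stand-in for stageFilesByEnv.sh that copies a single placeholder file (config.dev or config.prd over config) inside a throwaway repository. Git passes post-checkout three arguments: the previous HEAD, the new HEAD, and a flag that is 1 for a branch checkout and 0 for a file checkout, which lets the hook skip plain file checkouts.

```shell
#!/bin/bash
# Sketch: a post-checkout hook for a local development clone, exercised in a
# throwaway repository. The stand-in stageFilesByEnv.sh and the config files
# are hypothetical placeholders, not the real ones.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Minimal stand-in for deployutils/stageFilesByEnv.sh: copy config.<env> over config
mkdir deployutils
cat > deployutils/stageFilesByEnv.sh <<'EOF'
#!/bin/bash
repodir=$(git rev-parse --show-toplevel)
cp "$repodir/config.$1" "$repodir/config"
EOF
chmod +x deployutils/stageFilesByEnv.sh
echo "dev settings" > config.dev
echo "prd settings" > config.prd
echo "config" > .gitignore   # the live file is never tracked
git add .
git commit -q -m "initial commit"

# The hook itself. Git passes: $1 = previous HEAD, $2 = new HEAD,
# $3 = 1 for a branch checkout, 0 for a file checkout.
cat > .git/hooks/post-checkout <<'EOF'
#!/bin/bash
# Only react to branch switches, not "git checkout -- somefile"
if [[ "$3" != "1" ]]; then
  exit 0
fi
# Assumption: a local clone always wants the "dev" variants on every branch
repodir=$(git rev-parse --show-toplevel)
printf '\nRunning post-checkout hook...\n'
"$repodir/deployutils/stageFilesByEnv.sh" dev
EOF
chmod +x .git/hooks/post-checkout

# Switching branches fires the hook, which stages config from config.dev
git checkout -q -b feature
cat config
```

git switch triggers the same hook, so the behavior doesn't depend on which command is used to change branches.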

The actual list and copying happens in the stageFilesByEnv.sh script, which looks a lot like this:

#!/bin/bash
# Usage: stageFilesByEnv.sh prd|dev   (anything else falls back to dev)
# Associative arrays (declare -A) need Bash 4 or newer
repodir=$(git rev-parse --show-toplevel)
declare -A FILELIST
if [[ "prd" = "$1" ]]; then
  # We're called in the "prd" environment
  FILELIST["$repodir/path/to/file/one"]="$repodir/path/to/file/one.prd"
  FILELIST["$repodir/path/to/file/two"]="$repodir/path/to/file/two.prd"
else
  # Assume "dev" environment by default
  FILELIST["$repodir/path/to/file/one"]="$repodir/path/to/file/one.dev"
  FILELIST["$repodir/path/to/file/two"]="$repodir/path/to/file/two.dev"
fi
# Copy each environment-specific source over its live (git-ignored) target
for K in "${!FILELIST[@]}"; do
  echo "    ${FILELIST[$K]} --> $K"
  cp "${FILELIST[$K]}" "$K"
done

Add path/to/file/one and path/to/file/two to the .gitignore file (written relative to the repository root, since .gitignore patterns aren't absolute $repodir paths), and keep the .dev and .prd versions wherever you like (I chose to keep them alongside the files they replace, for simplicity).
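One caveat with the .gitignore step: ignoring a file Git already tracks has no effect until that file leaves the index (renaming it, as happened here, is one way; git rm --cached is another). The throwaway sketch below uses the same placeholder path/to/file/one as above, an assumption rather than a real path. It keeps the live file on disk, untracks and ignores it, commits the .dev/.prd variants, and confirms the result with git check-ignore.

```shell
#!/bin/bash
# Sketch: stop tracking a live config file and ignore it, keeping only the
# .dev/.prd variants in the repository. Placeholder paths, throwaway repo.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Starting point: the live file is tracked directly
mkdir -p path/to/file
echo "dev settings" > path/to/file/one
git add path/to/file/one
git commit -q -m "live config tracked directly"

# 1. Create the per-environment variants
cp path/to/file/one path/to/file/one.dev
echo "prd settings" > path/to/file/one.prd

# 2. Ignore the live file; .gitignore alone does not untrack a tracked file
echo "path/to/file/one" >> .gitignore
git rm -q --cached path/to/file/one   # drops it from the index, leaves it on disk

git add .gitignore path/to/file/one.dev path/to/file/one.prd
git commit -q -m "track per-environment variants, ignore the live file"

# 3. Confirm: check-ignore reports the matching rule, ls-files shows what is tracked
git check-ignore -v path/to/file/one
git ls-files
```

Whether the live file is renamed or untracked this way, the commit deletes it from every other clone when that clone pulls, until the Hook (or stageFilesByEnv.sh) recreates it. That is the effect covered in the "First Time" note below.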

A Note About The "First Time"

In switching to this process, and because I kept the Hook and script files in the repo for distribution, the first pull after the change had to be handled carefully. I had renamed the configuration files in the same commit that brought over the new Hook and script files, so that first pull caused a temporary "outage" of sorts in my environments: the renames removed the live configuration files (they now existed only as the .dev and .prd copies), and the Hook that would recreate them wasn't installed yet. I knew this would happen and fixed it in short order by copying in the Hook and triggering it manually, but it's something to be aware of.

Once the Hooks are set up, though, it works beautifully. My dev/test and production hosts are almost always consumers of changes (i.e. git pull only), so each pull automatically refreshes the appropriate files, and I no longer have to worry about whether I ever fiddled with one of those special files...or accidentally moved it to the wrong host.

A Little "Outside The Box"

Admittedly my situation here is a bit unique and outside the box, but the fact I can make Git behave in this way really saves me a boatload of time and mental effort. It's also meant that I am now actually using Git in a complex project and leveraging its built-in features (and branches and all of the other good bits) to more properly manage this project going forward. One repository!

It also makes me feel a bit like a Git magician, as this process works like a poor man's CD strategy. While that's not true (I am decidedly not a magician), I'm very happy with this advancement, and Current Me has already been very thankful for Past Me's changes!