@arubdesu
Last active August 29, 2015 14:10
afp548 draft on git-fat, part one - intro
Do you work on munki with a team of folks (meaning more than one)? You probably wish you could use MunkiAdmin for everything, but the server world is turning *nix, and MunkiAdmin is only meant to be super-efficient running on your Mac (whereas mounting the munki repo over the network may be a poor experience as things grow very large). It's sheer luck that Hannes Juutilainen wants to help with all of our workflow needs, so he composed the original page on git usage for the official munki wiki. Git was made for a distributed, (latent) network-aware world, but as the wiki page acknowledges by instructing you to create a .gitignore file, it wasn't made to shuttle around large, compressed containers like DMGs and pkgs. If you're pushing 1GB+ disk images with git as the transport mechanism, with every version copied to its internal storage... <'you're going to have a bad time' South Park meme goes here>
Let's take a step back. In an idealized view of things, you could 1. check out the current version of the 'stable' or 'production' repository (if you have multiple 'environments', meaning git branches, before munki changes hit production) in a timely fashion, 2. use any text editor or a client like MunkiAdmin to add to or modify the munki repo and 3. push things back up to an internal git 'remote', alerting the responsible admin to review and merge changes before 'pulling' them into 'stable' on your production vhost. If you run makecatalogs on a repo without the pkgs, however, you're greeted with a bunch of warnings, and syncing all of them down could be time-consuming, if not outright impractical.
Before I go on, some solutions exist which are definitely worth evaluating, like Mandrill by Joe Wollard. It presents an entirely web-based, access-controlled one-stop shop to allow multiple folks to collaborate, with versioning 'built-in'. MunkiServer has been around for a bit, and similarly allows multi-group collaboration. Sal+, by PebbleIT and actively developed by Graham Gilbert, has added repo modifications to the hosted/paid/supported version. Simian, by Google, is also hosted but takes a different approach while still being similarly self-contained and feature-rich.
But if you're into putting together the pieces yourself, a solution to the problem of 'how do I move these packages around before/once I've made a change?' could be git hooks. Upon checkout, or after committing a change to any particular branch and pushing, you can tell git to run rsync. Emailing the team upon a change is one of the tips Hannes included in the munki wiki page, but just like you need to figure out that implementation, you'd need to work out how to tune rsync to perform the way you'd like. That alone wasn't an attractive option for me, however, as I was sure solutions for this issue of large binary blobs in git had to have been attempted already. One of those that I experimented with is git-annex, which is... crazy. Crazy POWERFUL, but still crazy. My reservation about it was that it leaves the one copy of everything in your internal .git repo, with symlinks where you'd be serving the actual files. I didn't want to have to think about tuning the overall performance of symlinked files in the munki repo, so this wasn't attractive for me.
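To make the git hook idea from the start of this section a bit more concrete, here is a rough sketch of what a post-checkout hook calling rsync might look like. The host name, paths, variable names and rsync flags are all placeholders of mine (nothing here comes from the munki wiki), so treat it as a starting point for your own tuning rather than a recommendation.

```shell
#!/bin/sh
# Sketch of .git/hooks/post-checkout (the file must be executable).
# PKG_SOURCE and PKG_DEST are hypothetical; point them at your own storage.
PKG_SOURCE="${PKG_SOURCE:-munki.example.com:/srv/munki_repo/pkgs/}"
PKG_DEST="${PKG_DEST:-pkgs/}"

# Copy the contents of the source pkgs directory into the destination.
sync_pkgs() {
    # -a preserves permissions and timestamps;
    # --partial keeps interrupted transfers so large DMGs can resume.
    rsync -a --partial "$1" "$2"
}

# git calls post-checkout with three arguments: the previous HEAD, the new
# HEAD, and a flag that is 1 for a branch checkout and 0 for a file checkout.
# Only sync on branch checkouts.
if [ "${3:-0}" = "1" ]; then
    sync_pkgs "$PKG_SOURCE" "$PKG_DEST"
fi
```

The trailing slash on the source matters to rsync: it copies the directory's contents rather than the directory itself.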
Sam Keeley clued me in (as he is the source of many of my better practices) on git-fat, another way to tackle this problem. There are barely any dependencies, and its implementation checks off the first two steps in the collaboration workflow I mentioned above. An admin can check out the git repo in seconds (even though we're well over 10GB in pkgs with hundreds of high-quality png icons/client resources), and there are placeholder files that trick makecatalogs into running without warnings. In the words of Pee-wee's Big Adventure though, why does there always have to be a big 'but' - git-fat duplicates the files into its internal .git/fat/objects directory once you 'pull' the fat files in. This would make things ungainly and wasteful on the web host storage we serve munki from. I scratched my head and went to battle with git's filter-branch command, thinking I could clean up the objects post-pull, but git-fat still had difficulty purging the old versions. I looked at building the feature I needed into git-fat's source, but it sunk my scrabbleship (the code, to my still novice Python eyes, made no sense).
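For orientation, git-fat is driven by two small files in the repo: .gitfat, which names an rsync destination for the real files, and .gitattributes, which routes matching files through the 'fat' filter so git itself only stores placeholders. Below is a minimal sketch of writing both. The function name, the remote and the file patterns are illustrative assumptions on my part, not the configuration we actually run.

```shell
#!/bin/sh
# Hypothetical helper: write the two files git-fat reads.
#   $1 = path to the repo working tree
#   $2 = rsync destination that will hold the real (large) files
write_git_fat_config() {
    repo_dir="$1"
    fat_remote="$2"

    # .gitfat tells git-fat where the large files live.
    cat > "$repo_dir/.gitfat" <<EOF
[rsync]
remote = $fat_remote
EOF

    # .gitattributes sends matching files through the fat filter.
    # These patterns are examples only; adjust them to what your repo holds.
    cat > "$repo_dir/.gitattributes" <<'EOF'
*.dmg filter=fat -crlf
*.pkg filter=fat -crlf
*.png filter=fat -crlf
EOF
}

# Typical use inside an existing clone (hypothetical remote):
#   write_git_fat_config . munki.example.com:/srv/fat-store
#   git fat init    # registers the filter in this clone
#   git fat pull    # replaces placeholders with the real files
```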
Git hooks came to the rescue again, however, as you can specify a command to run post-merge (appropriate because git pull actually runs a merge internally as its last step once it has fetched): in this case, on this one host, I just wanted to truncate the internal objects git-fat uses for accounting to zero bytes, but leave them in place as placeholders before copying in the new changes via 'git fat pull'.
(Hint: for i in .git/fat/objects/*; do chmod 700 "$i"; cp /dev/null "$i"; done )
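Spelled out as a hook file, the hint above might look something like the sketch below. The function name and its optional directory argument are my own additions for illustration, and it assumes git runs the hook from the top of the working tree (which it does for non-bare repos). Whether 'git fat pull' belongs inside the hook or is run by hand afterwards is a choice for your own setup, so it is left as a comment here.

```shell
#!/bin/sh
# Sketch of .git/hooks/post-merge (the file must be executable).

# Truncate every stored git-fat object to zero bytes while keeping the file
# itself, so git-fat still sees a placeholder with the right name.
# The directory argument is an illustrative addition; by default it uses
# git-fat's object store.
truncate_fat_objects() {
    objects_dir="${1:-.git/fat/objects}"
    for obj in "$objects_dir"/*; do
        [ -f "$obj" ] || continue   # also skips the unexpanded glob in an empty dir
        chmod 700 "$obj"            # make the object writable, as in the hint
        : > "$obj"                  # same effect as: cp /dev/null "$obj"
    done
}

# Only act when this clone actually has a git-fat object store.
if [ -d .git/fat/objects ]; then
    truncate_fat_objects .git/fat/objects
    # git fat pull   # optionally fetch the new files right away
fi
```

Only do this on a host that serves the files and never needs to 'git fat push' them, since the local copies of the objects are gone afterwards.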
In the next article, I'll walk through the steps I used to convert an existing repo with icons/, pkgs/ and client_resources/ .gitignored into one where git-fat manages everything for us.