Useful git snippets and links

Git Snippets

A collection of useful git snippets and links for sharing. They're in no particular order.

Aliasing and graphs
Global Ignore
Good commit messages are important
Learn Git Branching
Worktrees
Sparse Checkout
Commit Graphs
Git Bisect
- git bisect run
Combining branches
I hate monorepos, here's everything I've found that makes them less painful

Aliasing and graphs

You can add aliases for frequently used commands, and they still get git's completion features!

# git config --global alias.<alias> '<command to expand to>'
git config --global alias.pu 'push -u origin'

You can also add them to ~/.gitconfig directly. These are my favorites; they show a fancy commit graph in your terminal.

[alias]
	graph = log --oneline --graph --all --pretty=format:\"%C(auto)%h%d %s%C(reset) %C(green)(%ar)%C(reset)\"
	full-graph = log --all --graph --decorate --pretty=format:\"%C(auto)%h%C(reset) - %C(cyan)%s%C(reset) %C(green)(%ar)%C(reset)%C(auto)%d%C(reset)%n%C(bold black) Author: %an <%ae> %ad%n%C(bold black) Committer: %cn <%ce> %cd%C(reset)\"
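Here's a small sketch for trying the aliases out without touching your real ~/.gitconfig. It defines them in a throwaway repository's local config instead of using --global. The repo, file, and commit names are made up. Quoting the whole expansion matters, because git config treats extra arguments separately instead of joining them into one value.

```shell
set -eu

# Throwaway repository, so the aliases only land in its local .git/config
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Quote the whole expansion so git config receives it as a single value
git config alias.pu 'push -u origin'
git config alias.graph "log --oneline --graph --all --pretty=format:'%C(auto)%h%d %s%C(reset) %C(green)(%ar)%C(reset)'"

# A couple of commits so there's something to draw
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
echo two >> file.txt
git commit -q -am "second commit"

# Aliases run like any other git subcommand
graph_output=$(git graph)
echo "$graph_output"
```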

Global Ignore

Are you on a Mac and want to stop seeing untracked .DS_Store files? Or on Windows with desktop.ini? Or Vim's swap files? Emacs's backups? __pycache__?

You can configure a global gitignore file with core.excludesFile. This and other useful tidbits can be found in the gitignore manpage. I like to use ~/.gitignore, and you can find mine in my dotfiles.

git config --global core.excludesFile ~/.gitignore
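Here's a minimal sketch of the mechanism, using a made-up ignore file in a temp directory. It sets core.excludesFile in a throwaway repo's local config so yours stays untouched. With --global, the same setting applies to every repo. If core.excludesFile is unset, git already reads $XDG_CONFIG_HOME/git/ignore, which is ~/.config/git/ignore by default.

```shell
set -eu

# Scratch space: a stand-in "global" ignore file and a repo to test it in
work=$(mktemp -d)
printf '%s\n' '.DS_Store' 'desktop.ini' '*.swp' '*~' '__pycache__/' > "$work/global-gitignore"

git init -q "$work/repo"
cd "$work/repo"
# Repo-local here so the demo doesn't touch your config; use --global for the real thing
git config core.excludesFile "$work/global-gitignore"

touch .DS_Store notes.txt
mkdir __pycache__
touch __pycache__/mod.cpython-311.pyc

# check-ignore -v reports which file and pattern matched
git check-ignore -v .DS_Store

# Only notes.txt should show up as untracked
status=$(git status --porcelain)
echo "$status"
```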

Good commit messages are important

I'll defer to Chris Beams on how to write them, so go read his blog post, How to Write a Git Commit Message.

TL;DR (shame on you, go read it): commit messages communicate context, and good ones help document a project's history and why changes were made.

Learn Git Branching

Check out learngitbranching.js.org, which gives you an interactive visual sandbox and walks you through a variety of branching scenarios. It's really helpful for understanding what each command is doing.

Worktrees

I've run into a number of the following:

Working on multiple things at the same time, without messy stashing
Conflicting builds (e.g. building foo messes up the cache for bar, or you need to be able to run prod and dev in short order)
Needing a clean workdir without having to check in or delete your untracked and ignored build artifacts
Running multiple git bisects at the same time (I haven't actually done this, but refs/bisect aren't shared across worktrees, so it should be doable)

All of these I've found as use cases for git worktree. As always, check the man pages, but TL;DR:

git worktree allows you to manage multiple working trees attached to the same repo. What's a working tree? It's the place where all your files are checked out.

# Create a new worktree at ~/Git/repo-tree-2, with a new branch `repo-tree-2` checked out.
git worktree add ~/Git/repo-tree-2

git worktree list

git worktree remove ~/Git/repo-tree-2
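Here's a rough end-to-end run of those commands in a throwaway repo. The paths and file names are hypothetical. It shows the branch that gets created automatically. It also shows that remove needs --force once the worktree has untracked files.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"
git config user.email "user@example.com"
echo hello > README.md
git add README.md
git commit -q -m "initial commit"

# With no branch given, git creates (or reuses) a branch named after the directory
git worktree add "$work/repo-tree-2"
git worktree list
trees_before=$(git worktree list --porcelain | grep -c '^worktree ')

# Each worktree has its own checkout, HEAD, and index, but they share objects and refs
echo scratch > "$work/repo-tree-2/scratch.txt"
git -C "$work/repo-tree-2" status --short

# remove refuses to delete a worktree with untracked or modified files unless forced
git worktree remove --force "$work/repo-tree-2"
trees_after=$(git worktree list --porcelain | grep -c '^worktree ')
```

The branch repo-tree-2 sticks around after the worktree is removed, so you can check it out elsewhere or delete it with git branch -d.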

Sparse Checkout

When you're in a really large repository, operations that have to walk the working tree can get slow. If you only need some of the files, git sparse-checkout can cut down on how much you keep checked out. In one monorepo I've been using a sparse checkout that keeps 14% of tracked files, and it gave a small but noticeable improvement in the performance of things like git status. At the time of writing, sparse checkout used the same pattern syntax as gitignore by default. Newer versions of git (2.37 and later) default to "cone mode", which only accepts directory names, so the pattern-based example below needs --no-cone (e.g. git sparse-checkout init --no-cone).

Note that at present, sparse checkouts are experimental, so heed the warning in the manpages: THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE.

Here's what a sparse checkout might look like. I'm using tree to show the state of the working tree after each change, so this example is kinda long, but well illustrated.

.
├── app.py
├── config.env.py
├── eac
│   ├── __init__.py
│   ├── static
│   │   ├── actions.js
│   │   ├── logos
│   │   │   ├── github.svg
│   │   │   ├── slack.svg
│   │   │   ├── twitch.svg
│   │   │   └── twitter.svg
│   │   └── style.css
│   └── templates
│       ├── callback.html
│       └── home.html
├── LICENSE
├── README.md
└── requirements.txt

$ git sparse-checkout init # Setup the sparse checkout config and populate it with default rules
.
├── app.py
├── config.env.py
├── LICENSE
├── README.md
└── requirements.txt

$ git sparse-checkout list # List the current sparse checkout patterns
# These are the defaults populated by init.
/*   # All files in the root of the repo
!/*/ # No subdirectories

$ git sparse-checkout set '*.svg'
.
└── eac
    └── static
        └── logos
            ├── github.svg
            ├── slack.svg
            ├── twitch.svg
            └── twitter.svg

$ git sparse-checkout list
*.svg

$ git sparse-checkout add '*.py'
.
├── app.py
├── config.env.py
└── eac
    ├── __init__.py
    └── static
        └── logos
            ├── github.svg
            ├── slack.svg
            ├── twitch.svg
            └── twitter.svg

# Do some work that causes merge conflicts to unsparsify other files e.g.
.
├── app.py
├── config.env.py
├── eac
│   ├── __init__.py
│   └── static
│       ├── actions.js
│       ├── logos
│       │   ├── github.svg
│       │   ├── slack.svg
│       │   ├── twitch.svg
│       │   └── twitter.svg
│       └── style.css
└── requirements.txt

$ git sparse-checkout reapply
.
├── app.py
├── config.env.py
└── eac
    ├── __init__.py
    └── static
        └── logos
            ├── github.svg
            ├── slack.svg
            ├── twitch.svg
            └── twitter.svg

$ git sparse-checkout disable
.
├── app.py
├── config.env.py
├── eac
│   ├── __init__.py
│   ├── static
│   │   ├── actions.js
│   │   ├── logos
│   │   │   ├── github.svg
│   │   │   ├── slack.svg
│   │   │   ├── twitch.svg
│   │   │   └── twitter.svg
│   │   └── style.css
│   └── templates
│       ├── callback.html
│       └── home.html
├── LICENSE
├── README.md
└── requirements.txt
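To reproduce roughly the same walkthrough yourself, here's a sketch against a throwaway repo. It mimics the layout above with a subset of the files, all empty. It uses init --no-cone so the gitignore-style patterns work on older git and on versions that default to cone mode. That flag choice is my assumption about what keeps the example portable.

```shell
set -eu

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Roughly the layout from the tree listings above
mkdir -p eac/static/logos eac/templates
touch app.py config.env.py LICENSE README.md requirements.txt eac/__init__.py \
  eac/static/actions.js eac/static/style.css \
  eac/static/logos/github.svg eac/static/logos/slack.svg \
  eac/templates/home.html
git add .
git commit -q -m "initial layout"

# Non-cone mode: patterns use gitignore syntax
git sparse-checkout init --no-cone
git sparse-checkout set '*.svg'
git sparse-checkout list

svg_only_ok=no
if [ -f eac/static/logos/github.svg ] && [ ! -e app.py ]; then svg_only_ok=yes; fi

# Add patterns on top of the existing ones
git sparse-checkout add '*.py'
py_added_ok=no
if [ -f app.py ] && [ -f eac/__init__.py ] && [ ! -e README.md ]; then py_added_ok=yes; fi

# Back to a full checkout
git sparse-checkout disable
```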

Commit Graphs

Git has a file called commit-graph, which stores supplemental data to make operations that traverse the commit graph (e.g. log, cherry, rev-list, push) faster. As of git 2.31.1, this file is enabled by default and updated when git gc runs. Since gc only runs periodically, you can also enable fetch.writeCommitGraph to update the file on every fetch. That makes these operations a little faster, at the expense of slightly slower fetches. Some older versions of git don't enable the commit-graph by default. The commands to turn it on are below.

git config core.commitGraph true
git config fetch.writeCommitGraph true
git config gc.writeCommitGraph true
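If you want to check that a repo actually has a commit-graph, here's a small sketch that writes one by hand in a throwaway repo and verifies it. git commit-graph write --reachable is a lower-level way of doing what gc, and fetch with fetch.writeCommitGraph, do for you.

```shell
set -eu

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for i in 1 2 3 4 5; do
  git commit -q --allow-empty -m "commit $i"
done

# Write a commit-graph covering every commit reachable from refs, then sanity-check it
git commit-graph write --reachable
git commit-graph verify

# A single (non-split) graph is stored here
ls -l .git/objects/info/commit-graph
```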

Git Bisect

Heck, bisect is neat. TL;DR: git bisect uses binary search over your history to find the commit where something changed. This is really useful for finding the commit that introduced a bug, or really any change in a repository. The man page for it is pretty good, so I'd recommend reading it.

My quick guide to bisecting:

$ git bisect start
$ git bisect bad <some commit-ish>  # Mark the specified commit as bad (or leave blank to mark HEAD)
$ git bisect good <some commit-ish> # Mark the specified commit as good (or leave blank to mark HEAD)

From here, git will select a commit and check it out for you to test. It also prints roughly how many revisions are left to test. For each commit bisect checks out, mark it with either git bisect good or git bisect bad. You can also mark an arbitrary commit with git bisect good|bad <commit-ish>, or check out a different commit and test that instead if the one bisect picked isn't interesting. This is also useful when a commit is untestable, e.g. it doesn't compile. You can even tell bisect that the current commit isn't useful and ask it to pick a different one with git bisect skip. When you're done, git bisect reset takes you back to where you started.

`git bisect run`

One of my favorite features is git bisect run, which automates the testing and finds the bug for you:

git bisect run ./test_bug.sh # and then git bisect will just find it for you

The script's exit code tells bisect what to do. 0 marks the commit good and 125 skips it. Any other code from 1 to 127 marks it bad, and anything else aborts the bisect. There are more details in the man pages and git documentation.
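Here's a self-contained sketch of git bisect run. It builds a throwaway history where a made-up "bug" appears in commit 7, then lets a tiny test script find it. The script name, the state.txt file, and the BUG marker are all hypothetical stand-ins for your real build or test command.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"
git config user.email "user@example.com"

# Ten commits; the "bug" sneaks in at commit 7 and stays
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -ge 7 ]; then echo "BUG $i" > state.txt; else echo "ok $i" > state.txt; fi
  git add state.txt
  git commit -q -m "commit $i"
done

# Keep the test script outside the repo so bisect's checkouts can't change it.
# Exit 0 = good; exit 1 (grep found the marker) = bad.
cat > "$work/test_bug.sh" <<'EOF'
#!/bin/sh
! grep -q BUG state.txt
EOF
chmod +x "$work/test_bug.sh"

# git bisect start <bad> <good>: HEAD is bad, the root commit is good
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)"
git bisect run "$work/test_bug.sh"

# Once bisect finishes, refs/bisect/bad points at the first bad commit
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"
git bisect reset
```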

Combining branches

`git merge`

Combine 2 (or more) branches by creating a merge commit. In a graph view, this looks something like this:

    *---*---*
           /
    --*---*

Initiate a merge with git merge <commit>..., e.g.

    *---* main # current branch
     \     
      *---* feature

git merge feature

    *---*---* main # current branch
     \     /
      *---* feature

Git will try to reconcile any files that were changed in both branches. If the changes conflict, it stops, presents you with the merge conflicts, and asks you to resolve them. The git documentation covers how git formats conflicts and how you can resolve them. They may seem scary at first, but merge conflicts are perfectly solvable; it might just take a minute to get used to the syntax.
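Here's a minimal sketch that rebuilds the diagram above in a throwaway repo and checks that the result really is a merge commit with two parents. The file names and commit messages are made up. Each branch touches its own file, so there's no conflict.

```shell
set -eu

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # pin the branch name regardless of init.defaultBranch
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# feature branches off and gets two commits
git checkout -q -b feature
echo f1 > feature.txt
git add feature.txt
git commit -q -m "feature 1"
echo f2 >> feature.txt
git commit -q -am "feature 2"

# main moves on independently
git checkout -q main
echo m1 > main.txt
git add main.txt
git commit -q -m "main 1"

# The histories have diverged, so this creates a merge commit
git merge -q --no-edit feature
git log --oneline --graph --all
parents=$(git show -s --format=%P HEAD | awk '{print NF}')
```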

`git cherry-pick`

Apply some commits to the tip of your current branch.

Running git cherry-pick A B C will apply the changes from each specified commit and make a new commit (with the same message) sequentially, resulting in:

*---A'--B'--C' current-branch
 \           \
  \           HEAD
   Previous HEAD

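By default, if a picked commit turns out to be redundant (its changes are already in your branch, so the new commit would be empty), git cherry-pick stops so you can decide what to do, rather than silently creating an empty commit.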

Note that cherry-picking can produce merge conflicts, and you'll need to refer to the git merge documentation for how those are handled.

Cherry picking is most useful when you need only some of the changes from a specific branch, such as backporting patches from a release branch to a maintenance branch.
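Here's a sketch of that backporting case in a throwaway repo. A hypothetical release branch has three fixes, and only two of them are picked onto main. The branch and file names are made up. Each fix touches its own file so the picks apply cleanly.

```shell
set -eu

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Hypothetical release branch with three fixes
git checkout -q -b release
for name in A B C; do
  echo "fix $name" > "fix-$name.txt"
  git add "fix-$name.txt"
  git commit -q -m "fix $name"
done

# Back on main, pick only fix A (release~2) and fix C (release)
git checkout -q main
git cherry-pick release~2 release
git log --oneline
```

If you want a trail back to the original commits, git cherry-pick -x appends a "(cherry picked from commit ...)" line to each new commit message.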

`git rebase`

Move a branch on top of another branch.

Given an arrangement like this:

          D---E---F feature-1
         /
A---B---C main
         \
          G---H---I feature-2

git checkout feature-2; git rebase feature-1 will result in:

                    G'--H'--I' feature-2
                   /
          D---E---F feature-1
         /
A---B---C main

git rebase supports some other fancy operations, especially with the --interactive flag. Maybe I'll touch on some of that here later.
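Here's a sketch that builds the exact arrangement from the diagram in a throwaway repo and rebases feature-2 onto feature-1. make_commit is a hypothetical helper that makes one commit per call, each touching its own file so nothing conflicts.

```shell
set -eu

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: one commit per call, named after its argument
make_commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

make_commit A; make_commit B; make_commit C
git checkout -q -b feature-1
make_commit D; make_commit E; make_commit F
git checkout -q -b feature-2 main
make_commit G; make_commit H; make_commit I

# Replay G, H, I on top of F as new commits G', H', I'
git checkout -q feature-2
git rebase -q feature-1
git log --oneline --graph --all
```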

Side note - GitHub's 'merge' options

All the command demos here treat main as the default branch, and feature as the pull request branch.

I personally prefer Create a merge commit or Rebase and merge, as they make it possible to still follow the git history without having to open GitHub in a browser.

Create a merge commit

GitHub creates a standard merge commit, though if you look at the command line instructions, they use the --no-ff flag to force git merge to create a merge commit, even if it isn't necessary. The relevant command line instructions are:

git checkout main
git merge --no-ff feature

Squash and 'merge'

This isn't actually a merge. GitHub creates a single squashed commit containing all of the changes from all of the commits in the PR. You can do the same on the command line with:

git checkout main
git merge --squash feature
git commit

Rebase and 'merge'

This isn't a merge either. It's just a rebase, followed by a fast-forward:

git checkout feature
git rebase main

# This just updates main to the rebased branch, the ff-only flag guarantees git won't create a merge commit.
git checkout main
git merge --ff-only feature
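To compare the three approaches side by side, here's a sketch that runs each set of command line equivalents from the same starting point in a throwaway repo. It then counts parents or new commits for each. The start tag, the make_commit helper, and the file names are hypothetical, and GitHub's own implementation may differ in details.

```shell
set -eu

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

make_commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

make_commit base
git checkout -q -b feature
make_commit f1; make_commit f2
git checkout -q main
make_commit m1
git tag start   # remember this main so each strategy starts from the same place

# "Create a merge commit": a merge commit with two parents
git merge -q --no-ff --no-edit feature
merge_parents=$(git show -s --format=%P HEAD | awk '{print NF}')

# "Squash and merge": one ordinary single-parent commit with all of feature's changes
git reset -q --hard start
git merge -q --squash feature
git commit -q -m "feature (squashed)"
squash_parents=$(git show -s --format=%P HEAD | awk '{print NF}')

# "Rebase and merge": feature's commits replayed on main, then main fast-forwarded
git reset -q --hard start
git checkout -q feature
git rebase -q main
git checkout -q main
git merge -q --ff-only feature
rebased_commits=$(git rev-list --count start..main)

echo "merge: $merge_parents parents; squash: $squash_parents parent; rebase: $rebased_commits new commits"
```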

I hate monorepos, here's everything I've found that makes them less painful

To be clear, I don't hate monorepos per se. I think monorepos can make a lot of sense. Good example: tightly coupled codebases. The frontend and backend of a web app often have related changes, so those seem like reasonable candidates for keeping together. Bad example: putting all of your company's code in one repo (note: this doesn't apply to very small companies working on one project, it's more about large corporations with hundreds or thousands of engineers working on code that has no relation or coupling to other parts of the repo). Git isn't especially good at handling abundantly large repos after a point (to be honest, your filesystem might complain too), and what's the point if all of your code isn't getting compiled together? It just adds noise and bulk.

Anyways, enough of a digression. The following is a list of git features and tools that I regularly employ to ease the burden of working in monorepos.

`git sparse-checkout`

I've talked about sparse-checkout before (see Sparse Checkout), but it really is quite helpful in well-structured repos. In a company I worked for, my team's code lived in a subdirectory, so I enabled sparse checkout. Now I only have that one directory and the top level files, which comes out to about 2% of the repo. It's a much smaller file tree, and so a number of operations see performance boosts.

`git worktree`

Another thing I've talked about elsewhere (see Worktrees). Last I wrote about them, I didn't use them that much, but since then I've worked a whole bunch in a monorepo, and I've normally had more than one thing in progress at any given time. I do use git stash, but that only gets me so far, and the build system in that repo was sensitive to a whole bunch of things, so it was simpler to have multiple worktrees (plus if I don't have a separate terminal for something, I will forget it exists). This means I can have multiple branches being actively reviewed without spending time checking out, waiting, and then switching back. It's also very convenient for the frequent tangents I take: I can just experiment and not worry about cleaning up my repo later.

`git maintenance`

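git maintenance can be used to simplify running a number of repository maintenance tasks, or even to automate them. These tasks mostly deal with the internals of the git repo, but they can have some pretty significant impacts. git maintenance start is all you need to run to get things going. It registers the repository, the same as git maintenance register, and sets up a background schedule. git maintenance register on its own only adds the repo to the list that already-scheduled maintenance runs on (see the docs or man git-maintenance for the defaults).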

The tasks I think are most important for speeding up a local monorepo:

commit-graph: commit-graph files store metadata that help speed up any operation that needs to walk commits, which is a lot of them, and having them be incrementally and automatically updated keeps them a whole lot fresher than if they only get written when git gc is run, and in some workflows, possibly also git fetch.
- TL;DR: it makes a bunch of things faster
prefetch: when this task is run, git will fetch refs from all remotes, and store all updates under refs/prefetch/. This allows future fetches to fetch less, since the repo will already have more of the objects that would need to be fetched, while still avoiding disrupting or changing any references the user would expect to be stable.
- TL;DR: it makes fetches faster

Other tasks:

gc: This runs git's garbage collection to shrink repo sizes. Note that this can be disruptive and remove objects or refs that would be used for recovery (e.g. old reflog entries). See man git-gc or the docs for more. This is not enabled by default.
loose-objects: This task packs some loose objects. Exactly what those are is a discussion for another update to this doc, but since packs are how git does compression, this can reduce repository size on disk. Note that the docs caution against enabling both this and gc at the same time.
- TL;DR: it makes repositories smaller
incremental-repack: this task incrementally repacks several small pack files. This can give the pack operation more opportunities to make efficient deltas (git uses 'delta compression' in packs, also a separate topic).
- TL;DR: it makes repositories smaller
pack-refs: Git stores refs as individual files in a directory tree, which can be expensive to list. pack-refs collects them into a single file (packed-refs), which makes iterating across all of them faster.
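Here's a minimal sketch of running a couple of these tasks once, by hand, in a throwaway repo. git maintenance run --task=... runs just the named tasks in the foreground. So, unlike git maintenance start, it doesn't add the repo to your global config or set up a scheduler. Whether the commit-graph task writes a single file or a split chain is treated as an implementation detail here, so the check accepts either.

```shell
set -eu

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for i in 1 2 3; do
  git commit -q --allow-empty -m "commit $i"
done

# Run specific tasks once, without registering the repo or touching the scheduler
git maintenance run --task=commit-graph --task=loose-objects

# The commit-graph is either a single file or a split chain
ls .git/objects/info
```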

`git config core.fsmonitor`

When set to 'true', this enables the built-in filesystem monitor daemon for the current repository. This makes git status and other commands that update the index faster. The built-in daemon isn't available on every platform, so check man git-config. On those platforms, you can supply a command (a hook) to do the work of the fsmonitor instead.

`git config feature.manyFiles`

In repos with a lot of files present on disk, this turns on a set of config defaults that improve performance, such as index.version=4 and core.untrackedCache=true.
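For completeness, here's how these two settings might be applied to a single monorepo checkout instead of globally. A throwaway repo stands in for that hypothetical checkout. The sketch only writes config. Whether core.fsmonitor=true actually starts the built-in daemon depends on your platform and git version.

```shell
set -eu

cd "$(mktemp -d)"
git init -q

# Opt this one repository into the large-repo defaults
git config feature.manyFiles true
# Built-in fsmonitor daemon; only supported on some platforms (see man git-config)
git config core.fsmonitor true

git config --get feature.manyFiles
git config --get core.fsmonitor
```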

mxmeinhold/git-snippets.md