A collection of useful git snippets and links for sharing. They're in no particular order.
- Aliasing and graphs
- Global Ignore
- Good commit messages are important
- Learn Git Branching
- Worktrees
- Sparse Checkout
- Commit Graphs
- Git Bisect
- Combining branches
- I hate monorepos, here's everything I've found that makes them less painful
You can add aliases for often used commands, and have them still use git's completion features!
# git config alias.<alias> "<command to expand to>"
git config alias.pu "push -u origin"
You can also add them to ~/.gitconfig directly.
These are my favorites; they show a fancy commit graph in your terminal.
[alias]
graph = log --oneline --graph --all --pretty=format:\"%C(auto)%h%d %s%C(reset) %C(green)(%ar)%C(reset)\"
full-graph = log --all --graph --decorate --pretty=format:\"%C(auto)%h%C(reset) - %C(cyan)%s%C(reset) %C(green)(%ar)%C(reset)%C(auto)%d%C(reset)%n%C(bold black) Author: %an <%ae> %ad%n%C(bold black) Committer %cn <%ce> %cd%C(reset)\"
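If you want to try an alias before putting it in your real config, here's a minimal sketch that sets one in a throwaway repo and reads it back. The alias name `lg` and the temp directory are made up for illustration.

```shell
set -eu
# Work in a throwaway repo so your real config stays untouched.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Quote the expansion so git config receives it as a single value.
git config alias.lg "log --oneline --graph --all"

# Read the alias back to confirm what it expands to.
expansion="$(git config --get alias.lg)"
echo "$expansion"
```

Without the quotes, git config would try to parse `--oneline` and friends as its own options.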
Are you on a Mac and want to stop seeing untracked .DS_Store? Or on Windows with desktop.ini? Or vim's swp files? Emacs' backups? __pycache__?
You can configure a global gitignore file. This and other useful tidbits can be found in the manpage.
I like to use ~/.gitignore, and you can find mine in my dotfiles.
git config --global core.excludesFile ~/.gitignore
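Here's a rough sketch of checking that a global ignore file actually works. It points HOME at a temp directory so nothing real is modified, and the patterns are just examples for the files mentioned above, not my actual dotfiles.

```shell
set -eu
# Use a throwaway HOME so the real ~/.gitconfig and ~/.gitignore stay untouched.
unset XDG_CONFIG_HOME
export HOME="$(mktemp -d)"

# Example patterns only: macOS, Windows, vim, Emacs, and Python leftovers.
printf '%s\n' '.DS_Store' 'desktop.ini' '*.swp' '*~' '__pycache__/' > "$HOME/.gitignore"
git config --global core.excludesFile "$HOME/.gitignore"

repo="$(mktemp -d)"
cd "$repo"
git init -q
touch .DS_Store notes.txt

# Only notes.txt should show up as untracked; .DS_Store is ignored globally.
untracked="$(git status --porcelain)"
echo "$untracked"
```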
I'll defer to Chris Beams for how to write them, so go read this blog post.
TL;DR (shame on you, go read it): commit messages communicate context, and good ones help document a project's history and why changes were made.
Check out learngitbranching.js.org, which gives you an interactive visual sandbox and takes you through a variety of branching scenarios. It's really helpful for understanding what each command is doing.
I've run into a number of the following:
- Working on multiple things at the same time, without messy stashing
- Conflicting builds (e.g. building foo messes up the cache for bar, or you need to be able to run prod and dev in short order)
- Needing a clean workdir without having to check in or delete your untracked and ignored build artifacts
- Running multiple git bisect sessions at the same time (I haven't actually done this, but refs/bisect aren't shared across worktrees so it should be doable)
All of these I've found as use cases for git worktree. As always, check the man pages, but TL;DR: git worktree allows you to manage multiple working trees attached to the same repo.
What's a working tree? It's the place where all your files are checked out.
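# Create a new worktree at ~/Git/repo-tree-2, with a new branch `repo-tree-2` checked out.
git worktree add ~/Git/repo-tree-2
# List every worktree attached to this repo.
git worktree list
# Remove a worktree by path once you're done with it.
git worktree remove ~/Git/repo-tree-2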
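To see the whole lifecycle end to end, here's a small sketch that runs in a temp directory. The repo layout and the identity values are placeholders.

```shell
set -eu
base="$(mktemp -d)"
git init -q "$base/repo"
cd "$base/repo"
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial commit"

# Add a second working tree; git names the new branch after the directory.
git worktree add -q "$base/repo-tree-2"
git worktree list
branch="$(git -C "$base/repo-tree-2" rev-parse --abbrev-ref HEAD)"

# Remove the worktree by path when finished; the branch itself is kept.
git worktree remove "$base/repo-tree-2"
count="$(git worktree list | wc -l | tr -d ' ')"
echo "branch: $branch, worktrees left: $count"
```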
When you're in a really large repository, some operations that have to walk the worktree can get slow. If you only need some of the files, git sparse-checkout can be used to cut down on how much you keep checked out. I've been using a sparse checkout in one monorepo that keeps 14% of tracked files, and it had a small but noticeable improvement on the performance of things like git status. By default, sparse checkout uses the same pattern syntax as gitignore.
Note that at present, sparse checkouts are experimental, so heed the warning in the manpages:
THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE.
Here's what a sparse checkout might look like. I'm using tree to show the state of the working tree after each change, so this example is kinda long, but well illustrated.
.
├── app.py
├── config.env.py
├── eac
│   ├── __init__.py
│   ├── static
│   │   ├── actions.js
│   │   ├── logos
│   │   │   ├── github.svg
│   │   │   ├── slack.svg
│   │   │   ├── twitch.svg
│   │   │   └── twitter.svg
│   │   └── style.css
│   └── templates
│       ├── callback.html
│       └── home.html
├── LICENSE
├── README.md
└── requirements.txt
$ git sparse-checkout init # Set up the sparse checkout config and populate it with default rules
.
├── app.py
├── config.env.py
├── LICENSE
├── README.md
└── requirements.txt
$ git sparse-checkout list # List the current sparse checkout patterns
# These are the defaults populated by init.
/* # All files in the root of the repo
!/*/ # No subdirectories
$ git sparse-checkout set '*.svg'
.
└── eac
    └── static
        └── logos
            ├── github.svg
            ├── slack.svg
            ├── twitch.svg
            └── twitter.svg
$ git sparse-checkout list
*.svg
$ git sparse-checkout add '*.py'
.
├── app.py
├── config.env.py
└── eac
    ├── __init__.py
    └── static
        └── logos
            ├── github.svg
            ├── slack.svg
            ├── twitch.svg
            └── twitter.svg
# Do some work that causes merge conflicts to unsparsify other files e.g.
.
├── app.py
├── config.env.py
├── eac
│   ├── __init__.py
│   └── static
│       ├── actions.js
│       ├── logos
│       │   ├── github.svg
│       │   ├── slack.svg
│       │   ├── twitch.svg
│       │   └── twitter.svg
│       └── style.css
└── requirements.txt
$ git sparse-checkout reapply
.
├── app.py
├── config.env.py
└── eac
    ├── __init__.py
    └── static
        └── logos
            ├── github.svg
            ├── slack.svg
            ├── twitch.svg
            └── twitter.svg
$ git sparse-checkout disable
.
├── app.py
├── config.env.py
├── eac
│   ├── __init__.py
│   ├── static
│   │   ├── actions.js
│   │   ├── logos
│   │   │   ├── github.svg
│   │   │   ├── slack.svg
│   │   │   ├── twitch.svg
│   │   │   └── twitter.svg
│   │   └── style.css
│   └── templates
│       ├── callback.html
│       └── home.html
├── LICENSE
├── README.md
└── requirements.txt
Git has a file called commit-graph, which stores supplemental data to make operations that traverse the commit graph (e.g. log, cherry, rev-list, push) faster. As of git 2.31.1, this file is enabled by default, and updated when git gc is run. You can enable fetch.writeCommitGraph to make these operations a little faster, at the expense of slightly slower fetches, since gc is only run periodically. In some older versions of git, this is not enabled by default. The commands to enable it are below.
git config core.commitGraph true
git config fetch.writeCommitGraph true
git config gc.writeCommitGraph true
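If you don't want to wait for gc or a fetch, you can also write the file by hand. This is a minimal sketch in a throwaway repo with placeholder commits; `git commit-graph write --reachable` and `git commit-graph verify` are the only commands that matter here.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Write a commit-graph covering every commit reachable from any ref.
git commit-graph write --reachable

# Check the file against the object database; exits non-zero if it is corrupt.
git commit-graph verify

ls .git/objects/info/commit-graph
```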
Heck, bisect is neat. TL;DR: git bisect uses binary search to find a commit. This is really useful for finding the commit that introduced a bug, or really any change in a repository. The man page for this is pretty good, though, so I'd recommend it.
My quick guide to bisecting:
$ git bisect start
$ git bisect bad <some commit-ish> # Mark the specified commit as bad (or leave blank to mark HEAD)
$ git bisect good <some commit-ish> # Mark the specified commit as good (or leave blank to mark HEAD)
From here, git will select a commit and check it out for you to test. It will also print out how much there is left to test.
For each commit bisect checks out, mark it with either git bisect good or git bisect bad. You can also label an arbitrary commit with git bisect good|bad <commit-ish>, or check out another commit and test that, if the commit bisect picked isn't interesting enough. This is also useful if a commit is untestable, e.g. it doesn't compile. You can even tell bisect that this commit isn't useful, and ask it to pick a different one with git bisect skip.
One of my favorite features is that you can use git bisect run to automate testing and finding a bug:
git bisect run ./test_bug.sh # and then git bisect will just find it for you
There are some more details about how a script should be set up; those can be found in the man pages and git documentation.
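The short version of those details: the script should exit 0 if the commit is good, 125 if it can't be tested (skip), and any other code from 1 to 127 if it's bad. Here's a toy sketch of the whole thing. The ten-commit history, the "BROKEN" marker, and the test script are all invented for the demo.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Build a ten-commit history; the "bug" (the word BROKEN) lands in commit 7.
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "change $i" >> app.txt
    if [ "$i" -eq 7 ]; then echo "BROKEN" >> app.txt; fi
    git add app.txt
    git commit -q -m "commit $i"
done

# Hypothetical test script: exits 0 (good) when the marker is absent, 1 (bad) otherwise.
# It lives outside the working tree so bisect's checkouts can't disturb it.
test_script="$(mktemp)"
cat > "$test_script" <<'EOF'
! grep -q BROKEN app.txt
EOF

# HEAD is known bad, the first commit (HEAD~9) is known good.
git bisect start HEAD HEAD~9
git bisect run sh "$test_script"

# refs/bisect/bad now points at the first bad commit.
first_bad="$(git log -1 --format=%s refs/bisect/bad)"
git bisect reset
echo "first bad commit: $first_bad"
```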
Combine 2 (or more) branches by creating a merge commit. In a graph view, this looks something like this:
*---*---*
/
--*---*
Initiate a merge with git merge <commit>..., i.e.
*---* main # current branch
 \
  *---* feature
git merge feature
*---*---* main # current branch
 \     /
  *---* feature
Git will try to reconcile any files that were changed in both branches, but if it can't because there are conflicting changes, git will present you with the merge conflicts and ask you to resolve them. The git documentation covers how git formats conflicts, and how you can resolve them. They may seem scary at first, but merge conflicts are perfectly solvable; it might just take a minute to get used to the syntax.
Apply some commits to the tip of your current branch.
Running git cherry-pick A B C will apply the changes from each specified commit and make a new commit (with the same message) sequentially, resulting in:
*---A'--B'--C' current-branch
 \            \
  \            HEAD
   Previous HEAD
By default, git cherry-pick won't apply redundant commits.
Note that cherry picking can introduce merge conflicts, and you'll need to refer to the git merge documentation for how those are handled.
Cherry picking is most useful when you need only some of the changes from a specific branch, such as backporting patches from a release branch to a maintenance branch.
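Here's a small sketch of taking only some commits from a branch. The branch names, file names, and commit messages are placeholders; each commit touches its own file so nothing conflicts.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Three commits on a feature branch, each adding its own file.
git checkout -q -b feature
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done

# Back on main, pick only A (feature~2) and C (feature), skipping B.
git checkout -q main
git cherry-pick feature~2 feature

# The new commits keep the original messages but get new hashes.
git log --oneline main
```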
Move a branch on top of another branch.
Given an arrangement like this:
      D---E---F feature-1
     /
A---B---C main
     \
      G---H---I feature-2
git checkout feature-2; git rebase feature-1 will result in:
                G'--H'--I' feature-2
               /
      D---E---F feature-1
     /
A---B---C main
git rebase supports some other fancy operations, especially with the --interactive flag. Maybe I'll touch on some of that here later.
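If you'd like to poke at that exact arrangement, this sketch rebuilds the A through I history in a temp repo and runs the rebase. The `commit_file` helper is just something I made up to keep the setup short.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one commit per file, so the rebase never conflicts.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit_file A
commit_file B
# Both feature branches fork from B.
git branch feature-1
git branch feature-2
commit_file C

git checkout -q feature-1
for name in D E F; do commit_file "$name"; done
git checkout -q feature-2
for name in G H I; do commit_file "$name"; done

# Replay G, H, I on top of feature-1.
git rebase -q feature-1
git log --oneline --graph --all
```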
All the command demos here treat main as the default branch, and feature as the pull request branch.
I personally prefer Create a merge commit or Rebase and merge, as they make it possible to still follow the git history without having to open GitHub in a browser.
With Create a merge commit, GitHub creates a standard merge commit, though if you look at the command line instructions, they use the --no-ff flag to force git merge to create a merge commit, even if it isn't necessary.
The relevant command line instructions are:
git checkout main
git merge --no-ff feature
Squash and merge isn't actually a merge. GitHub creates a squashed commit, containing all of the changes from all of the commits in the PR. You can do something similar on the command line with:
git checkout main
git merge --squash feature
git commit
Rebase and merge isn't a merge either. It's just a rebase:
git checkout feature
git rebase main
# This just updates main to the rebased branch, the ff-only flag guarantees git won't create a merge commit.
git checkout main
git merge --ff-only feature
To be clear, I don't hate monorepos per se. I think monorepos can make a lot of sense. Good example: tightly coupled codebases. The frontend and backend of a web app often have related changes, so those seem like reasonable candidates for keeping together. Bad example: putting all of your company's code in one repo (note: this doesn't apply to very small companies working on one project, it's more about large corporations with hundreds or thousands of engineers working on code that has no relation or coupling to other parts of the repo). Git isn't especially good at handling abundantly large repos after a point (to be honest, your filesystem might complain too), and what's the point if all of your code isn't getting compiled together? It just adds noise and bulk.
Anyways, enough of a digression. The following is a list of git features and tools that I regularly employ to ease the burden of working in monorepos.
I've talked about sparse-checkout before (see Sparse Checkout), but it really is quite helpful in well structured repos. In a company I worked for, my team's code lived in a subdirectory, so I enabled sparse checkout. Now I only have that one directory, and the top level files, which comes out to about 2% of the repo. It's a much smaller file tree, and so a number of operations see performance boosts.
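A setup like that (one directory plus the top level files) is roughly what cone mode gives you. Here's a sketch assuming a git new enough to have `sparse-checkout init --cone`; the `teams/mine` and `teams/other` directories are invented for the demo.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# A tiny stand-in for a monorepo: two team directories and a top level file.
mkdir -p teams/mine teams/other
echo "readme" > README.md
echo "mine" > teams/mine/app.py
echo "other" > teams/other/app.py
git add .
git commit -q -m "monorepo layout"

# Cone mode takes directories instead of gitignore-style patterns,
# and always keeps files in the repo root.
git sparse-checkout init --cone
git sparse-checkout set teams/mine

git sparse-checkout list
ls
```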
Another thing I've talked about elsewhere (see Worktrees). Last I wrote about them, I didn't use them that much, but since then I've worked a whole bunch in a monorepo, and normally I've had more than one thing in progress at any given time. I do use git stash, but that only gets me so far, and the build system that was used in that repo was sensitive to a whole bunch of things, so it was simpler to have multiple worktrees (plus if I don't have a separate terminal for something, I will forget it exists). This means I can have multiple branches being actively reviewed and not have to spend time checking out, waiting, then switching back. Also very convenient for the frequent tangents I take. I can just experiment and not worry about having to clean up my repo later.
git maintenance can be used to simplify running a number of repository maintenance tasks, or even to automate them. These tasks are mostly to do with the internals of the git repo, but they can have some pretty significant impacts. git maintenance start is all you need to run to get things running: it registers the repo and sets up the background schedule, whereas git maintenance register on its own only adds the repo to the list that an existing schedule works through (see the docs or man git-maintenance to see what the defaults are).
The tasks I think are most important for speeding up a local monorepo:
- commit-graph: commit-graph files store metadata that help speed up any operation that needs to walk commits, which is a lot of them. Having them incrementally and automatically updated keeps them a whole lot fresher than if they only get written when git gc is run, and in some workflows, possibly also git fetch.
  - TL;DR: it makes a bunch of things faster
- prefetch: when this task is run, git will fetch refs from all remotes, and store all updates under refs/prefetch/. This allows future fetches to fetch less, since the repo will already have more of the objects that would need to be fetched, while still avoiding disrupting or changing any references the user would expect to be stable.
  - TL;DR: it makes fetches faster
Other tasks:
- gc: This runs git's garbage collection to shrink repo sizes. Note that this can be disruptive and remove objects or refs that would be used for recovery (e.g. old reflog entries). See man git-gc or the docs for more. This is not enabled by default.
- loose-objects: This task packs some loose objects. Exactly what those are is a discussion for another update to this doc, but since packs are how git does compression, this can reduce repository size on disk. Note that the docs caution against enabling both this and gc at the same time.
  - TL;DR: it makes repositories smaller
- incremental-repack: this task incrementally repacks several small pack files. This can improve opportunities for the pack operation to make the most efficient deltas (git uses 'delta compression' in packs, which is also a separate topic).
  - TL;DR: it makes repositories smaller
- pack-refs: Git stores refs as files in a directory tree. This can be expensive to get a list of, so pack-refs puts all the refs into a single file, which makes iteration across all of them faster.
Setting core.fsmonitor to 'true' enables the built-in filesystem monitor daemon for the current working directory. This makes commands like status, or anything else that updates the index, faster. It's not available on every platform, so check man git-config. You can supply a command to do the work of the fsmonitor in those cases.
In repos with a lot of files present on disk, feature.manyFiles sets defaults to improve performance.
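A minimal sketch of turning both settings on for a single repo, assuming they are `core.fsmonitor` and `feature.manyFiles` as named in man git-config. It only writes config in a throwaway repo; whether the daemon can actually start depends on your platform and git version.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Repo-local settings; add --global if you want them everywhere.
git config core.fsmonitor true
git config feature.manyFiles true

# Show what ended up in this repo's config.
git config --local --get-regexp '^(core\.fsmonitor|feature\.manyfiles)$'
```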