# Advanced Git: edPUG, July 2014
rm -rf example-repo
export GIT_COMMITTER_DATE="2014-07-15 19:01:00"
saystd "Advanced git"
Good evening.
clear; fanduel.sh | lolcat
I’m John. I'm the Lead Python Developer at FanDuel, and I work on our data projects. We're always hiring.
saystd "Why this talk?"
I used to work with Ally a few years ago, and when I caught up with him recently he mentioned that he'd started using git.
thinkstd "Increase confidence\!"
In my experience people find git baffling, but with a little more knowledge about how it works, and why, we can have the same level of confidence in git as we have in well-tested code or repeatable deploys.
thinkstd "Sausage meat?"
Let's create a new repository, and see what's there.
clear
git init example-repo; cd example-repo
tree -aC
cat .git/HEAD
cat .git/config
Nothing very exciting; git doesn't really know anything yet.
saystd blobs
echo "Edinburgh PHP Users Group" > edpug.txt
git add edpug.txt
tree -aC
Two new files have appeared: `index`, and a wild file!
saystd "The index"
You may already know what the index is, but just in case:
git status
git ls-files
git rm --cached edpug.txt
tree -aC
git status
git add edpug.txt
saystd "Wild file"
thinkstd "Wildstyle"
The wild file! What's in it?
cat .git/objects/44/b292ec9a921916948bae0bf94d62ac3d6d18f5
Gibberish. Perhaps it's compressed with zlib.
python ../defalte.py < .git/objects/44/b292ec9a921916948bae0bf94d62ac3d6d18f5
OK cool, something that looks useful (and I'm not talking about Python)!
So this file 44/b292 etc. contains a blob, and 26 is the length of the string:
echo "Edinburgh PHP Users Group" | wc -c
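I haven't shown you what's inside the inflate script, so here's a rough sketch of the idea rather than the script itself. It assumes the loose object format of a `<type> <size>` header, a NUL byte, then the contents, all zlib-compressed. The function names are made up for illustration, and the object is built in memory so that the example doesn't depend on a repository being present.

```python
import zlib


def make_loose_object(content: bytes, kind: bytes = b"blob") -> bytes:
    """Build compressed bytes in the same layout git uses for a loose object.

    The compressed bytes may not match git's byte-for-byte (compression
    settings can differ), but they inflate to the same thing.
    """
    header = kind + b" " + str(len(content)).encode() + b"\0"
    return zlib.compress(header + content)


def read_loose_object(raw: bytes):
    """Inflate a loose object and split it into (type, size, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\0")
    kind, _, size = header.partition(b" ")
    return kind.decode(), int(size), body


raw = make_loose_object(b"Edinburgh PHP Users Group\n")
print(read_loose_object(raw))
```

The size comes out as 26 because `echo` adds a trailing newline to the 25 characters of text.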
And the filename 44/b292 comes from a hash of the contents:
echo "Edinburgh PHP Users Group" | git hash-object --stdin
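If you'd like to see where that hash comes from without git's help, this is a minimal sketch. It assumes the default SHA-1 object format, where the hash covers the `blob <size>` header and a NUL byte as well as the contents. `git_blob_hash` is a name I've made up for the example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash content the way `git hash-object` does for a blob (SHA-1 repos)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


digest = git_blob_hash(b"Edinburgh PHP Users Group\n")
# The first two hex characters become the directory, the rest the filename.
print(digest[:2] + "/" + digest[2:])
```

Because the name is derived purely from the contents, adding the same file twice stores it once.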
So when we ran `git add`, we added its contents to git, and the filename to the index.
Let's commit the file, and see what happens.
git commit -m "My commit messages should be better." --date "2014-07-15 19:00:00"
tree -aC
Don't worry about all that date shit, that's me messing around with git so that I can make sure that all of my commands work later.
A few new things.
- A COMMIT_EDITMSG.
cat .git/COMMIT_EDITMSG
- A couple of master files;
- A couple more objects;
clear; python ../defalte.py < .git/objects/a0/104ee27ce5d3b54506677ce59b2745a8a32e36
saystd "trees"
This first one is a tree.
Remember that the blob from before held only the file contents. This tree maps filenames to blobs.
In a file-system like FAT, a table holds each filename along with a pointer to the area on disk where the contents are stored. A tree plays much the same role.
One tree can point to another tree, which is how directories work.
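The inflated tree is partly binary, so it's awkward to read on screen. Here's a sketch of how the body of a tree object is laid out, assuming the usual format: for each entry a mode, a space, the name, a NUL byte, then the 20 raw bytes of the object id. The helper names are invented, and the sort is simplified (real git orders directory names slightly differently).

```python
import hashlib


def blob_id(content: bytes) -> bytes:
    """Raw 20-byte SHA-1 id of a blob."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).digest()


def build_tree(entries):
    """entries: list of (mode, name, raw_id). Returns the tree body."""
    body = b""
    # Simplification: plain sort by name.
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        body += mode.encode() + b" " + name.encode() + b"\0" + oid
    return body


def parse_tree(body: bytes):
    """Split a tree body back into (mode, name, hex_id) tuples."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        oid = body[nul + 1:nul + 21]  # 20 raw bytes, not hex
        entries.append((mode, name, oid.hex()))
        pos = nul + 21
    return entries


# 100644 is a regular file; a sub-directory would use mode 40000 and
# point at another tree instead of a blob.
tree_body = build_tree(
    [("100644", "edpug.txt", blob_id(b"Edinburgh PHP Users Group\n"))]
)
print(parse_tree(tree_body))
```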
clear; python ../defalte.py < .git/objects/13/780d94c5a824dcebce0432bf6602e9b714b481
saystd "Commits"
The other file is our commit, which points to a single tree: the root.
This is all quite a clever setup, which we don't have time to get into, but it's related to persistent data structures, which we can discuss later if you're interested.
We've seen that file contents are stored in blobs, that the directory and file structure is stored in trees, and that commits are stored as commits. But fundamentally, everything is an object.
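To make "everything is an object" a bit more concrete, here's a sketch of a commit built by hand and hashed like any other object. It assumes the usual text layout of `tree`, optional `parent` lines, `author`, `committer`, a blank line, then the message. The identity and timestamp below are placeholders, so the id it prints won't match the one in this repo.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Every object type is hashed the same way: header, NUL, body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build_commit(tree, parents, author, committer, message):
    """Assemble the text body of a commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


# Placeholder identity and timestamp, purely for illustration.
who = "A Speaker <speaker@example.com> 1400000000 +0000"
commit_body = build_commit(
    tree="a0104ee27ce5d3b54506677ce59b2745a8a32e36",
    parents=[],  # the first commit has no parent
    author=who,
    committer=who,
    message="My commit messages should be better.",
)
print(commit_body.decode())
print(object_id("commit", commit_body))
```

Change any byte of the body (the message, the date, the tree) and the id changes with it.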
saystd "Branching & merging"
Let's create a new branch.
git checkout -b more-work
tree -aC
cat .git/refs/heads/more-work
A branch is just a file with a commit's hash in it.
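Here's a small sketch of what "just a file" means in practice: follow `HEAD` to the ref it names, then read the hash out of that file. It runs against a hand-made directory rather than a real repository, the commit id is a placeholder, and it ignores packed refs, which git also uses.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Return (ref_name, commit_id) for HEAD; ref_name is None if detached."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref_name = head[len("ref: "):]
        ref_path = os.path.join(git_dir, *ref_name.split("/"))
        with open(ref_path) as f:
            return ref_name, f.read().strip()
    return None, head  # detached HEAD holds a commit id directly


# Build a throwaway, hand-made layout (not a real repository) to try it on.
git_dir = os.path.join(tempfile.mkdtemp(), ".git")
os.makedirs(os.path.join(git_dir, "refs", "heads"))
fake_commit = "f" * 40  # placeholder, not a real commit id
with open(os.path.join(git_dir, "refs", "heads", "more-work"), "w") as f:
    f.write(fake_commit + "\n")
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/more-work\n")

print(resolve_head(git_dir))
```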
Let's do some work.
echo "More work" > more.txt
git add .
git commit -m "Doing some more work" --date "2014-07-15 19:01:00"
Let's see what this commit is made of.
git show fe7cd41a86230246380aa08c130a1c7b0a1f1b14 --format=raw
Cool, we can see the tree, a parent commit, some details about me, the commit message, and the diff.
q
This is a graph view of the commit history as it exists now.
git log --graph --decorate --color --all --oneline
We can see our two commits, and their messages, as well as where our branches are.
q
OK, let's get into trouble by deleting this new branch.
git checkout master
This is how the repo looks before we delete the branch.
tree -aC
git branch -D more-work
And after we delete the branch.
tree -aC
The only difference is the missing more-work reference.
git log --graph --decorate --color --all --oneline
We can see now that the commit is no longer in the log. Probably a shit-your-pants sort of moment if that's your week's work.
q
thinkstd "I'm lost"
git reflog
q
OK, I see a line there with a commit-hash that isn't master, let's see if we can use that.
git checkout -b more-work2 159fc80fbf2867b4196061ffed5f99c8c84ef9af
git log --graph --decorate --color --all --oneline
So cool. We have our week's work back!
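The reflog for `HEAD` is itself just a text file, `.git/logs/HEAD`. As a sketch of what we did by eye, this parses a few made-up entries and lists the commits that no branch points at any more. It assumes the usual line layout of old id, new id, identity and timestamp, then a tab and a message. The ids and identity are placeholders.

```python
def parse_reflog(text):
    """Parse reflog lines into dicts with old id, new id and message."""
    entries = []
    for line in text.splitlines():
        meta, _, message = line.partition("\t")
        old, new = meta.split(" ")[:2]
        entries.append({"old": old, "new": new, "message": message})
    return entries


# Placeholder ids and identity, standing in for a real .git/logs/HEAD.
ZERO, A, B = "0" * 40, "a" * 40, "b" * 40
who = "A Speaker <speaker@example.com> 1400000000 +0000"
sample = "\n".join([
    f"{ZERO} {A} {who}\tcommit (initial): My commit messages should be better.",
    f"{A} {A} {who}\tcheckout: moving from master to more-work",
    f"{A} {B} {who}\tcommit: Doing some more work",
    f"{B} {A} {who}\tcheckout: moving from more-work to master",
])

branches = {"master": A}  # more-work has been deleted
entries = parse_reflog(sample)
lost = {e["new"] for e in entries} - set(branches.values())
print(lost)
```

Anything in that set is a candidate for `git checkout -b <new-name> <hash>`.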
You may have noticed that I have branch names, and short commit hashes in my prompt. That's so I can use my scrollback as a poor-man's reflog.
thinkstd "Merging"
We saw before that commits have a parent commit. A merge is just a single commit with multiple parent commits.
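As a sketch of that idea, this parses the text of a commit object and counts its `parent` lines. The commit below is hand-written with placeholder ids and identity. It isn't output copied from this repo.

```python
def parse_commit(raw: str):
    """Split a commit's text into headers, a parents list and the message."""
    headers, _, message = raw.partition("\n\n")
    commit = {"parents": [], "message": message}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # one line per parent
        else:
            commit[key] = value
    return commit


# Placeholder ids and identity, for illustration only.
who = "A Speaker <speaker@example.com> 1400000000 +0000"
raw = "\n".join([
    "tree " + "1" * 40,
    "parent " + "a" * 40,
    "parent " + "b" * 40,
    "author " + who,
    "committer " + who,
    "",
    "Merge branch 'more-work'",
    "",
])

commit = parse_commit(raw)
is_merge = len(commit["parents"]) > 1
print(commit["parents"], is_merge)
```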
git merge --no-ff more-work
:wq
git show HEAD --format=raw
q
git log --graph --decorate --color --all --oneline
q
thinkstd "A bad merge"
Now let's suppose we've merged something too soon, or there were conflicts which we resolved poorly.
Let's undo the merge.
saystd "reset"
git reset --hard HEAD^
gl
q
Cool, we're back to where we were.
Fast-forward merges can be performed when the branch you're merging into hasn't gained any commits since the other branch split from it. I like to have explicit merge commits, so I never fast-forward.
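Put another way, a fast-forward is possible when the current branch's commit is an ancestor of the one being merged. Here's a sketch of that check over a toy commit graph. The single-letter ids and both function names are invented for the example.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def can_fast_forward(parents, current, other):
    """The current branch can simply be moved forward to `other`."""
    return is_ancestor(parents, current, other)


# Toy graph: B and C were both committed on top of A.
parents = {"A": [], "B": ["A"], "C": ["A"]}

print(can_fast_forward(parents, current="A", other="B"))  # master untouched
print(can_fast_forward(parents, current="C", other="B"))  # master moved on
```

With `--no-ff`, git creates a merge commit even in the first case.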
saystd "fast-forward"
git merge --no-ff more-work
:wq
When we undid the merge, we used reset to move the master back, and we used the magic reference `HEAD^`. The `^` means "the parent of", so the parent of `HEAD`. In this case we have two parents, we can use `HEAD^2`, which is the other parent.
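Here's a sketch of how those suffixes walk the graph, using a toy history with invented single-letter ids. `^n` picks the nth parent, and I've included `~n` as well, which follows the first parent n times. Real git understands many more revision forms than this.

```python
import re


def resolve(parents, start, suffix):
    """Apply a chain of ^ and ~ operators, e.g. '^2' or '~2', to a commit."""
    commit = start
    for op, num in re.findall(r"([\^~])(\d*)", suffix):
        n = int(num) if num else 1
        if op == "^":
            if n:  # ^0 means the commit itself
                commit = parents[commit][n - 1]  # nth parent
        else:
            for _ in range(n):
                commit = parents[commit][0]  # first parent, n times
    return commit


# Toy history: M is a merge whose first parent is C and second is B.
parents = {"A": [], "B": ["A"], "C": ["A"], "M": ["C", "B"]}

print(resolve(parents, "M", "^"))    # first parent
print(resolve(parents, "M", "^2"))   # the other parent
print(resolve(parents, "M", "~2"))   # first parent, twice
```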
Most resets will be either `--hard` or `--soft`. Hard resets the index and the working copy, as well as the branch. Soft only resets the branch.
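One way to picture the difference is as three places that each hold a snapshot: the branch, the index and the working copy. This is only a toy model with invented names, not how git stores anything. It also includes the default `--mixed` mode, which sits between the two.

```python
def reset(repo, target, mode="mixed"):
    """Return the repo state after `git reset --<mode> <target>` (toy model)."""
    repo = dict(repo)
    repo["branch"] = target            # every mode moves the branch
    if mode in ("mixed", "hard"):
        repo["index"] = target         # mixed and hard also reset the index
    if mode == "hard":
        repo["worktree"] = target      # only hard touches the working copy
    return repo


# Everything starts at the merge commit M; we reset back to its parent A.
before = {"branch": "M", "index": "M", "worktree": "M"}

print(reset(before, "A", "soft"))
print(reset(before, "A", "mixed"))
print(reset(before, "A", "hard"))
```

After a soft reset the merged changes are still staged, which is why nothing is lost.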
git reset --soft HEAD^
git reset HEAD .
git merge --no-ff more-work
:wq
saystd "rebase"
Rebasing creates a new set of commits which have a different parent.
git checkout more-work
git checkout -b rebasing
echo "More work somewhere else" > more.txt
git commit -am "Does more work"
gl
q
Let's rebase.
git rebase master
We can see now that our `rebasing` branch is "on-top-of" `master`, but that its commit message is the same as `more-work`, and they have different commit-ids.
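The ids have to differ, because the parent's id is part of what gets hashed. Here's a cut-down sketch: it leaves out the author and committer lines a real commit has, and the tree and parent ids are placeholders.

```python
import hashlib


def commit_id(tree, parent, message):
    """Hash a simplified commit body (no author or committer lines)."""
    body = f"tree {tree}\nparent {parent}\n\n{message}\n".encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


tree = "1" * 40        # placeholder tree id
old_parent = "a" * 40  # where the commit used to sit
new_parent = "b" * 40  # the tip we rebased onto

before = commit_id(tree, old_parent, "Does more work")
after = commit_id(tree, new_parent, "Does more work")
print(before != after)
```

Same message, same tree, different parent: a different commit as far as git is concerned. The original commit is still there, we've just stopped pointing at it.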
gl
q
saystd "Interactive rebase"
Interactive rebasing allows you to rewrite commits as you go. I won't go into it, other than to mention it.
saystd "git rebase -i ..."
saystd "Wrap-up"
- File contents are stored in blobs;
- The file-system is stored as trees;
- Commits are meta-data and pointers to a tree's "root" node;
- Commits don't get deleted, we just forget where they are;
- Branches are just pointers to commits;
- We can find lost commits using the reflog;
- We can use reset to move branch pointers around, and get out of trouble;
- We can rewrite commits with rebase;
saystd "BONUS FEATURES\!"
Switch to the previous branch with:
saysmall "git checkout -"
You can adjust the last commit with:
saysmall "git commit --amend"
Add all changed, tracked files with:
saysmall "git add -u"
Fetch remote refs, removing (prune) dead branches:
saysmall "git fetch -p"
Stash all changes, including untracked files:
saysmall "git stash -u"
## Colophon
- vim
- vim-slime
- zsh
- tmux
- figlet
- cowsay
- lolcat
- dotfiles