Last active
August 31, 2018 16:33
Git in a Nutshell - from Reuven Lerner's *Better Developers* newsletter, 2018-03-19 edition
This week, let's take a break from Python and talk a little bit about Git ( http://t.dripemail2.com/c/eyJhY2NvdW50X2lkIjoiNjE2ODIxOCIsImRlbGl2ZXJ5X2lkIjoiMjMxMTgyMzY3OSIsInVybCI6Imh0dHA6Ly9naXQtc2NtLmNvbS8_X19zPXhzd2dxc3FiNHFuMml6d3c0aXloIn0 ). I teach Git courses every few months, and without fail, people come into the class saying that they have been using Git for a few months, and it seems to work OK so long as they use the list of commands that their boss provided. But if something happens that isn't on that list, and if they cannot figure out what to do based on Stack Overflow, they're sunk.

The goal of my Git course is not only to help them use the different Git commands, but also to give them insights into what's happening inside of Git, so that when things go wrong -- or appear to go wrong -- they can fix the problem, rather than removing their local copy of the repository and cloning again. Which is what a huge number of people do.

I want to point out that Git is one of the best tools I've ever used, and has made me a better developer. And yet, I should also point out that the user interface is exactly what you would expect from a bunch of kernel hackers whose primary language is C. The naming of Git features is terrible and inconsistent, the number of options you can invoke is nearly infinite, and many of the terms and commands were seemingly chosen because they clashed with completely different commands used by other version-control systems.

The thing is, once you understand how Git works, it suddenly starts to make sense. And that's because Git doesn't do very much at all: It's a specialized database, containing a very small number of object types. And part of the genius of Git, in my opinion, is that you can have a robust and fully operational version-control system by implementing just a few ideas.

Indeed, you can think of Git as a database that contains just four types of objects:

* blobs (i.e., file contents)
* trees (i.e., directories)
* tags
* commits

When you say "git commit", you're creating a new commit object. That object points to a tree, and that tree then points to additional trees and blobs. If your commit is not the first in the repository, it also points back to its parent.

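To make that structure concrete, here is a minimal Python sketch of the graph just described. The class names Blob, Tree and Commit and the helper list_files are hypothetical illustrations, not Git's own code; the model also leaves out tags, file modes, authors, and merge commits (which have more than one parent).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical, simplified model of Git's object graph (not Git's own code).
@dataclass(frozen=True)
class Blob:
    data: bytes  # just the file's contents: no name, no permissions


@dataclass(frozen=True)
class Tree:
    # Names live in the tree, mapping to a blob (file) or a subtree (directory).
    entries: dict[str, Union[Blob, Tree]]


@dataclass(frozen=True)
class Commit:
    tree: Tree  # snapshot of the top-level directory
    message: str
    parent: Optional[Commit] = None  # None only for the root commit


def list_files(tree: Tree, prefix: str = "") -> list[str]:
    """Walk a tree recursively and return every file path it contains."""
    paths: list[str] = []
    for name, child in sorted(tree.entries.items()):
        if isinstance(child, Tree):
            paths.extend(list_files(child, prefix + name + "/"))
        else:
            paths.append(prefix + name)
    return paths


v1 = b"This is a test.\nAnd a very good test it is!\n"
first = Commit(Tree({"test1.txt": Blob(v1)}), "Added test1.txt")
second = Commit(
    Tree({"test1.txt": Blob(v1 + b"Still a great file, right?\n")}),
    "Added amazing brilliance to our text file",
    parent=first,
)
print(list_files(second.tree))  # ['test1.txt']
print(second.parent is first)   # True
```

Note that the file name is stored in the tree, not in the blob. That is why two identically named files with different contents are different blobs, and two differently named files with identical contents can share one.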
Let's create a Git repository and walk through it to see what I'm talking about. On the command line, I'll create a new directory and repository:

    $ mkdir gitfun
    $ cd gitfun
    $ git init

Git responds by saying:

    Initialized empty Git repository in /Users/reuven/Desktop/gitfun/.git/

Great! We now have a new repository!

Um, but what does that mean? It means that Git has configured a few things, including the special ".git" directory, under which things are stored. What is stored there? Well, right now, there's not much to see. Looking at ".git/objects", which is where Git stores things, we'll see two subdirectories, but no actual objects.

    $ ls .git/objects
    info/ pack/

So, let's now create a new file in Git:

    $ cat >> test1.txt
    This is a test.
    And a very good test it is!
    $ git add test1.txt
    $ git commit -m 'Added test1.txt'
    [master (root-commit) 5816544] Added test1.txt
     1 file changed, 2 insertions(+)
     create mode 100644 test1.txt

In the above shell commands, I created a simple text file (typing its two lines, then pressing Ctrl-D to end the input). Then I staged it by using the "add" command -- what, you think that there should be a "stage" command? But that would deprive consultants of business opportunities! -- and then committed it using "git commit". (To be fair, Git does accept "git stage", but only as a synonym for "add".)

The moment I did that, Git created a number of different objects. Each object is identified in Git by its SHA-1 value, a 40-hex-digit hash computed from the object's contents. SHA-1 is a hash function that doesn't guarantee that every file will have a unique hash value, but it's close enough for all practical purposes. If you had a way to deliberately create a file with a given SHA-1, then Git would probably break -- but that's not realistic, so far as I know, so we should be OK.

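Concretely, Git doesn't hash the file on its own: it prefixes the content with a short header giving the object type and the content's size in bytes, followed by a NUL byte, and hashes the result. Here's a minimal Python sketch of that calculation; the function names are my own hypothetical ones, but you can compare the results against Git's "git hash-object" command.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """Return the SHA-1 that Git assigns to an object of this type and content."""
    # Header: type name, a space, the content length in bytes, then a NUL byte.
    header = f"{obj_type} {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


def blob_id(data: bytes) -> str:
    """Object ID of a blob: depends only on the bytes, never on the file name."""
    return git_object_id("blob", data)


print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
print(blob_id(b"This is a test.\nAnd a very good test it is!\n"))
```

Assuming test1.txt contained exactly those two newline-terminated lines, the second value should match the blob ID that its tree points to later in this walkthrough.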
Git reported above that it created a new commit, and even gave us the first few digits of its SHA-1, 5816544. We can see this more clearly, and with a longer name, if we use "git log":

    $ git log
    commit 58165443eca522ef35bad68964fc09ec000449ef
    Author: Reuven Lerner <[email protected]>
    Date:   Mon Jan 9 00:11:41 2017 +0200

        Added test1.txt

We can thus see that our most recent commit has a SHA-1 that starts with 5816544 and continues for a total of 40 hex digits. But we can refer to it by just its first four (or more) hex digits, so long as they're unique in our repository.

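The abbreviation rule is easy to model: a prefix works if it's at least four hex digits long and exactly one object ID in the repository starts with it. Below is a hypothetical helper that applies that rule to a plain list of IDs (here, the three object IDs that appear in this walkthrough); in a real repository, "git rev-parse" does this lookup for you.

```python
from __future__ import annotations


def resolve_prefix(prefix: str, object_ids: list[str]) -> str:
    """Expand an abbreviated object ID to the one full ID it identifies.

    Hypothetical helper that models the rule; it is not Git's implementation.
    """
    prefix = prefix.lower()
    if len(prefix) < 4:
        raise ValueError("an abbreviation needs at least 4 hex digits")
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object ID starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous ({len(matches)} matches)")
    return matches[0]


ids = [
    "58165443eca522ef35bad68964fc09ec000449ef",  # the first commit
    "37675fc023b0863cd8a702041de28282caa17c1d",  # its tree
    "797f7c1809e83fd6122cb4a247d345e7f5de4f5d",  # the test1.txt blob
]
print(resolve_prefix("5816", ids))  # 58165443eca522ef35bad68964fc09ec000449ef
```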
Where did Git store this object? Inside of .git/objects. But because a repository might contain lots of objects, Git doesn't store them all directly inside .git/objects. Rather, Git takes the first two characters of the SHA-1 and uses them as the name of a subdirectory in which to store the object, with the remaining 38 characters as the file name. For example:

    $ ls .git/objects
    37/ 58/ 79/ info/ pack/

Our commit object is inside of the "58" directory:

    $ ls .git/objects/58
    165443eca522ef35bad68964fc09ec000449ef

So as you can see, knowing the SHA-1 of an object allows Git to find it right away in our filesystem. That's one of the reasons why Git is so fast: an object's contents determine its SHA-1, and its SHA-1 tells Git where the object is stored. And when a file changes? Then Git creates a new object, with a new SHA-1 reflecting the hash value of the new contents. And thus, Git stores a separate copy of each version of each file that you might have written. (Objects are compressed on disk, and Git eventually bundles them into space-saving "packfiles", but conceptually each version is a complete, separate object.)

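Putting those two ideas together (content determines the SHA-1, and the SHA-1 determines the path), the sketch below computes where a loose blob would live under .git/objects. The helper name is hypothetical. It only covers loose objects: once objects have been packed (for example, by "git gc"), they live inside files under .git/objects/pack instead, and this path won't exist.

```python
import hashlib
from pathlib import Path


def loose_blob_path(git_dir: Path, content: bytes) -> Path:
    """Return where Git would store `content` as a loose blob object."""
    # Same rule Git uses for blobs: hash "blob <size>\0" followed by the raw bytes.
    header = f"blob {len(content)}\0".encode("ascii")
    object_id = hashlib.sha1(header + content).hexdigest()
    # First two hex digits name the subdirectory; the other 38 name the file.
    return git_dir / "objects" / object_id[:2] / object_id[2:]


print(loose_blob_path(Path(".git"), b"").as_posix())
# .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Change even one byte of the content and you get a different SHA-1, and therefore a different file in a (probably) different subdirectory.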
You might have noticed that Git created two other directories above, "37" and "79". Why are those there?

Well, because Git didn't just create a commit object. It also created a tree object, which sits between the commit and one or more trees and blobs, and a blob object holding the contents of test1.txt. We can use the low-level Git command "cat-file", along with its "-p" (pretty-print) option, to inspect these objects:

    $ git cat-file -p 58165443eca522ef35bad68964fc09ec000449ef
    tree 37675fc023b0863cd8a702041de28282caa17c1d
    author Reuven Lerner <[email protected]> 1483913501 +0200
    committer Reuven Lerner <[email protected]> 1483913501 +0200

    Added test1.txt

In other words, what are the contents of our commit object? It points to a tree object (SHA-1 37675f), and it contains information about the author and committer (who are generally one and the same), and then a comment, better known as the commit message. So the comment is actually part of the commit object, which means that if you modify the comment on a commit, you get a totally new commit object with a new SHA-1.

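Because the message is hashed along with everything else, rewording a commit (for example, with "git commit --amend") produces a new object ID. The sketch below builds the text of a commit object in the layout "git cat-file -p" displays (header lines, a blank line, then the message) and hashes it the same way as any other object. The function name and the sample identity line are hypothetical; real author and committer lines hold a name, e-mail address, Unix timestamp and time-zone offset, as in the transcript above.

```python
from __future__ import annotations

import hashlib
from typing import Optional


def commit_id(tree: str, author: str, committer: str, message: str,
              parent: Optional[str] = None) -> str:
    """Hash a commit object built from its header fields and message."""
    lines = [f"tree {tree}"]
    if parent is not None:  # the root commit has no parent line
        lines.append(f"parent {parent}")
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # Header lines, one blank line, then the message itself.
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


tree = "37675fc023b0863cd8a702041de28282caa17c1d"
who = "A. Developer <dev@example.com> 1483913501 +0200"  # hypothetical identity
original = commit_id(tree, who, who, "Added test1.txt\n")
reworded = commit_id(tree, who, who, "Add test1.txt\n")
print(original != reworded)  # True: same tree, different message, new commit
```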
Where is this tree object stored? Well, it has a SHA-1. And look, its SHA-1 starts with 37! What if we look in that directory? Can you guess what will be there? (I know, it's obvious when I say it...)

    $ ls .git/objects/37
    675fc023b0863cd8a702041de28282caa17c1d

And if we get the contents of our tree object, what do we find?

    $ git cat-file -p 37675fc023b0863cd8a702041de28282caa17c1d
    100644 blob 797f7c1809e83fd6122cb4a247d345e7f5de4f5d    test1.txt

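"git cat-file -p" prints the tree in a friendly text form, but the stored tree is binary: for each entry, the mode, a space, the file name, a NUL byte, and then the 20 raw bytes of the SHA-1, with entries sorted by name. Here's a minimal Python sketch of that encoding, assuming a flat directory of regular (mode 100644) files only. The function name is hypothetical; real Git also handles subdirectories (mode 40000), executables and symlinks, and sorts directory names as though they ended in "/".

```python
from __future__ import annotations

import hashlib


def tree_id(files: dict[str, str]) -> str:
    """Hash a tree whose entries are all regular files.

    `files` maps each file name to its blob's SHA-1 in hex.
    """
    body = b""
    for name in sorted(files, key=lambda n: n.encode("utf-8")):
        # "<mode> <name>\0" followed by the 20-byte binary SHA-1.
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(files[name])
    header = f"tree {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


print(tree_id({}))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (Git's empty tree)
print(tree_id({"test1.txt": "797f7c1809e83fd6122cb4a247d345e7f5de4f5d"}))
```

Since this tree holds nothing but that one entry, the second call should reproduce the tree ID 37675f... shown above, assuming the transcript is accurate.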
See? Our tree object points to a blob. And if we look at the blob:

    $ git cat-file -p 797f7c1809e83fd6122cb4a247d345e7f5de4f5d
    This is a test.
    And a very good test it is!

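Under the hood, each loose object file holds a short header (such as "blob 16" for a 16-byte file, followed by a NUL byte) and then the object's content, all compressed with zlib. The sketch below writes and reads objects in that format, which is roughly what "git cat-file" does for loose objects; the real command also formats -p output and can look inside packfiles, which this sketch cannot. The helper names are hypothetical, and write_loose_object should be pointed at a scratch directory, not a repository you care about.

```python
from __future__ import annotations

import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(git_dir: Path, obj_type: str, data: bytes) -> str:
    """Store an object in Git's loose-object format; return its ID."""
    raw = f"{obj_type} {len(data)}\0".encode("ascii") + data
    object_id = hashlib.sha1(raw).hexdigest()
    path = git_dir / "objects" / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))  # header + content, zlib-compressed
    return object_id


def read_loose_object(git_dir: Path, object_id: str) -> tuple[str, bytes]:
    """Return (type, content) for a loose object; packed objects aren't handled."""
    path = git_dir / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("object size in header doesn't match its content")
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    oid = write_loose_object(Path(tmp), "blob", b"This is a test.\n")
    print(read_loose_object(Path(tmp), oid))  # ('blob', b'This is a test.\n')
```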
Now, what happens when I modify test1.txt, and then commit it? The answer: None of the existing objects are affected. They stay precisely the way they were before. Instead, Git creates new objects, and the new commit becomes the tip of our current branch ("master"), which is what HEAD refers to; that commit is the basis for any new commits we make. But the existing commits remain around... well, basically forever.

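In a repository like this one, HEAD usually doesn't hold a commit ID at all: .git/HEAD contains a line such as "ref: refs/heads/master", and the branch file it names holds the ID of the latest commit. Committing writes a new ID into that branch file, which is how the new commit "becomes" HEAD. A minimal sketch of that lookup follows. The function name is hypothetical, and it ignores branches that Git has moved into .git/packed-refs, which "git rev-parse HEAD" handles for you.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Return the commit ID that HEAD currently refers to (loose refs only)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Normal case: HEAD names a branch, e.g. "refs/heads/master".
        branch_ref = head[len("ref: "):]
        return (git_dir / branch_ref).read_text().strip()
    # "Detached HEAD": the file holds a commit ID directly.
    return head


# Demonstration on a fake, hand-built .git directory (placeholder commit ID).
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text("1" * 40 + "\n")
    print(resolve_head(git_dir))  # prints the placeholder ID
```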
For example:

    $ cat >> test1.txt
    Still a great file, right?
    $ git add test1.txt
    $ git commit -m 'Added amazing brilliance to our text file'
    [master b6c4ec9] Added amazing brilliance to our text file
     1 file changed, 1 insertion(+)

Notice that the SHA-1 returned by Git is different from the previous one. If we look at it:

    $ git cat-file -p b6c4ec9
    tree eeca41cd12f46cd4c237f28c78b7e11762a0b22b
    parent 58165443eca522ef35bad68964fc09ec000449ef
    author Reuven Lerner <[email protected]> 1483914372 +0200
    committer Reuven Lerner <[email protected]> 1483914372 +0200

    Added amazing brilliance to our text file

Notice that our commit, since it isn't the first one in the repository (the "root" commit), has a "parent" field, pointing back to the commit from which it came. But we still have a tree -- a different tree object (eeca41) -- and the other standard stuff. Following that tree to the blob it now lists for test1.txt, we see:

    $ git cat-file -p 909f2de7c8a572d91f06b188790416a2c195f0ed
    This is a test.
    And a very good test it is!
    Still a great file, right?

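Those "parent" lines are all Git needs to reconstruct history: start at the newest commit and keep following parents until you reach a commit that has none. The sketch below does that over commit texts in the layout "git cat-file -p" prints, using the two commits from this walkthrough (author and committer lines omitted for brevity). The helper names are hypothetical; it follows only the first parent of a merge and ignores multi-line headers such as signatures, so treat it as a rough model of "git log", not a replacement.

```python
from __future__ import annotations

from typing import Optional


def parse_commit(text: str) -> tuple[dict[str, list[str]], str]:
    """Split commit text into header fields and the message."""
    header_block, _, message = text.partition("\n\n")
    headers: dict[str, list[str]] = {}
    for line in header_block.splitlines():
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)  # "parent" may repeat in merges
    return headers, message


def history(start: str, commits: dict[str, str]) -> list[str]:
    """Follow first-parent links from `start` back to the root commit."""
    order: list[str] = []
    current: Optional[str] = start
    while current is not None:
        order.append(current)
        parents = parse_commit(commits[current])[0].get("parent", [])
        current = parents[0] if parents else None
    return order


commits = {
    # The second commit is keyed by the abbreviated ID Git printed for it.
    "b6c4ec9": (
        "tree eeca41cd12f46cd4c237f28c78b7e11762a0b22b\n"
        "parent 58165443eca522ef35bad68964fc09ec000449ef\n"
        "\n"
        "Added amazing brilliance to our text file\n"
    ),
    "58165443eca522ef35bad68964fc09ec000449ef": (
        "tree 37675fc023b0863cd8a702041de28282caa17c1d\n"
        "\n"
        "Added test1.txt\n"
    ),
}
print(history("b6c4ec9", commits))
# ['b6c4ec9', '58165443eca522ef35bad68964fc09ec000449ef']
```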
But what if I'm nostalgic for the old version of the file? Is it gone? Definitely not; Git holds onto it forever. I can even look at it:

    $ git cat-file -p 797f7c1809e83fd6122cb4a247d345e7f5de4f5d
    This is a test.
    And a very good test it is!

Now, cat-file isn't the sort of thing you use every day with Git. But it does let you see that Git manages to do a lot with just a few objects.

Next time, I'll talk about branches in Git, and how they're far simpler than you might think. (Unless you already think that they're simple!) And of course, if you have questions (about Git or anything else!) that you would like me to address, please respond to this message. I've been overwhelmed with suggestions and ideas, so it'll take a while to get to all of them, but I promise that I will.

Until next week,

Reuven

Sign up for newsletter at: https://lerner.co.il/newsletter/