Created February 22, 2009 17:32

# terminology

### general VCS terminology

A **project** is the minimum set of source code (and related files) that needs
to be kept together to **build** the software. Example: Linux

* each project will have one **repository**
* one **team** can work on multiple projects, so there could be multiple
  repositories on each desktop

A **branch** in a project is an active line of development.

* **master** is the conventional name for the main development tree of a
  project
* other conventional branches are **next** (for code that is ready to go
  into *master*), and various maintenance branches like **v1.3** or
  **v2.6.4** to designate released versions
* these are only conventions, not rules, but they seem to work well in
  general

A **feature** is a part of a project that is large and complex enough that
its day-to-day commits would be too noisy to include in the main project.
Example: the disk subsystem, the networking subsystem, etc., in Linux

* a **feature branch** is a branch for a feature, and is usually long-lived.
  This means it regularly acquires changes made in the main line and, at
  stable points in its development cycle, merges its changes back into the
  main line
* small projects may not have any feature branches. That doesn't mean they
  don't have any features :-)

### git-specific terminology

#### branch, master, head, HEAD, etc

* a branch is "an active line of development"
* *master*: default branch in a project (main development tree), by
  convention
* *head*: tip of a branch
* a repository can track many branches, but the working tree is associated
  with only one branch at a time
* *HEAD*: tip of the branch associated with the working tree; this is where
  commits go. Normally. There is also something called a 'detached HEAD'
  that you should be aware of. See the section on the [detached head and
  the malloc analogy](#dhma) below for more.
* *index*: a staging area for the next commit; when you commit, the current
  index is turned into a real commit object
* *origin*: when you clone an existing project in order to start working on
  it, the project you cloned *from* is traditionally called the "origin" of
  your clone

#### what is a "bare" repository?

A bare repository is a concept that is more or less unique to a distributed
VCS like git (and, I presume, to other DVCSs such as Hg and Bzr).

A normal git repository is a directory that contains

* project directories and files (the "working tree" mentioned above)
* a single directory called `.git` containing all of git's
  administrative and control files; we'll call it the **magic** directory
  because git can't do any magic without it :-)

When you do a `git status` inside such a directory, git looks inside the
"magic" directory, compares your current working tree with the "current
branch" as recorded in the magic directory, and tells you which files have
changed, etc etc.

A "bare" repo, as the git
[glossary](http://www.kernel.org/pub/software/scm/git/docs/gitglossary.html)
says, is a repository that does not contain a "working tree" at all. It
doesn't contain the special `.git` sub-directory either; instead, it
contains all the contents of the `.git` subdirectory right in the main
directory itself.

##### yeah yeah, but **why** do I need a bare repo?

ok; demo time...

Let's try creating a small repo, adding a file, committing, and checking
`git status`:

    mkdir a; cd a; git init
    echo hi > a; git add a; git commit -m a
    git status

This should respond with

    # On branch master
    # nothing to commit (working directory clean)

So far so good. Now someone clones our repository, adds a new file, commits,
and pushes his changes back to our repository:

    cd ..; git clone a b
    cd b; echo there >> b; git add b; git commit -m b
    git push

The `git push` above sends the new commit to the "origin" repository. More
specifically, it updates the "magic" directory on repo "a" with this new
commit. (Git 1.7.0 and later refuse this push by default, precisely because
it updates the branch checked out in "a"; the `receive.denyCurrentBranch`
setting controls this. Older versions allowed it, which leads to the
following.)

Now you go back to the main repo and check `git status`:

    cd ../a
    git status

which responds

    # On branch master
    # Changes to be committed:
    #   (use "git reset HEAD <file>..." to unstage)
    #
    #       deleted:    b

Whoa! What happened here? We **added** a file called `b` in the cloned
repository and pushed to the "origin". But your origin now claims you
**deleted** that file...?

To understand this, you need to realise that **the "magic" directory is always
assumed to be correct**; it is the "standard" against which your working tree
is compared to determine what changes you made.

So when you asked for a status, git first looked inside the magic directory.
The magic directory said you should have two files, "a" and "b", but your
working tree has only file "a". So `git status` concludes that you have
deleted the file "b"!

In other words, when someone changes the "magic" directory **behind your
back**, your locally checked out copy (your working tree) appears to have the
**opposite** changes, as if you had made them yourself.
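
To make the "magic directory is always right" rule concrete, here is a toy C model of just the "deleted" half of that comparison. It is a sketch under big simplifying assumptions (file names only, no file contents, no index), and `contains` and `report_deleted` are hypothetical helpers, not anything from git's source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model, NOT git's real algorithm: file names only, no contents,
 * no index. The magic directory's list is treated as the truth. */

/* Hypothetical helper: is `name` among the first `n` entries of `list`? */
static int contains(const char *const *list, int n, const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: print every file the magic directory expects but
 * the working tree lacks, and return how many such files there were. */
int report_deleted(const char *const *recorded, int nrec,
                   const char *const *worktree, int nwt)
{
    int deleted = 0;
    for (int i = 0; i < nrec; i++) {
        if (!contains(worktree, nwt, recorded[i])) {
            printf("#       deleted:    %s\n", recorded[i]);
            deleted++;
        }
    }
    return deleted;
}
```

With `recorded = {"a", "b"}` (what the push left in repo "a") and `worktree = {"a"}`, `report_deleted` prints one line for `b`, which is the same confusing status as in the demo.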

All this confusion can (and *should*) be avoided by using a "bare"
repository to clone and pull from, and push to. It doesn't have a checked out
tree, so it just does what the "server" notionally does in a centralised VCS:
records commits, branches, etc when you push to it, and gives you the
latest versions when you clone or pull from it. (You can create one with
`git init --bare`, or make one from an existing repository with
`git clone --bare`.)

#### <a name="dhma"></a>detached head and the malloc analogy

If you're a developer looking to understand branches, the commit tree,
detached head, reflog, etc., this section might help you. I assume you
understand the basics of the standard malloc(3) library function: you ask it
to allocate some memory, and malloc returns a pointer to that memory so you
can do what you want with it in your program.

To understand what a detached head is, you have to think of a **commit tree**
as a one-directional linked list. Each **branch** (like 'master', 'next',
etc) is a global/static pointer variable, pointing to the top of one such
list. A **commit** is one node in the linked list: a struct containing some
information plus links to its parent commit struct(s). There are no forward
links; you cannot go from an earlier commit to a later one.

Making a new commit involves the classic "insert a new node at the head of the
linked list" code that anyone who's done a basic data structures course knows.
In pseudo-code:

    HEAD = malloc(...);     // ask for some memory for the struct
    HEAD.body = ...;        // fill in the commit info
    HEAD.parent = master;   // fill in the backlink
    master = HEAD;          // move list head to new node
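
The pseudo-code above maps almost directly onto real C. Here is a minimal sketch, assuming a single parent per commit and a string standing in for the commit info; `struct commit` and `commit_on_master` are illustrative names for the analogy, not git internals.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy commit node: real commits may have several parents. */
struct commit {
    const char *body;          /* stands in for the commit info */
    struct commit *parent;     /* backlink; there is no forward link */
};

/* Branches (and HEAD) are global pointers to a node. */
struct commit *master = NULL;
struct commit *HEAD = NULL;

/* "git commit" with master checked out: insert a node at the list head. */
struct commit *commit_on_master(const char *body)
{
    struct commit *c = malloc(sizeof *c);  /* ask for memory for the struct */
    assert(c != NULL);                     /* a real program would handle OOM */
    c->body = body;                        /* fill in the commit info */
    c->parent = master;                    /* fill in the backlink */
    master = c;                            /* move list head to the new node */
    HEAD = master;                         /* HEAD follows the checked-out branch */
    return c;
}
```

Calling `commit_on_master("a")` and then `commit_on_master("b")` leaves `master` (and `HEAD`) pointing at the "b" node, whose `parent` is the "a" node.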

A **reset** of a branch is like assigning to the corresponding global
variable:

    git checkout master         # HEAD = master
    git reset --hard HEAD~      # master = HEAD = HEAD.parent

The old value of HEAD is now gone; nothing is pointing to that node now, and
it will be garbage collected soon. [Well, not immediately; see reflog
below...]
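
The same toy model shows why a reset orphans a commit: after `reset_hard_to_parent`, no branch variable can reach the old tip by following parent links. The names are hypothetical, and the nodes are static so nothing is actually freed here; what the sketch shows is reachability, which is what decides whether git may eventually prune a commit.

```c
#include <assert.h>
#include <stddef.h>

struct commit {
    const char *body;
    struct commit *parent;
};

/* Three commits on master; static, so nothing needs freeing. */
struct commit c1 = { "first",  NULL };
struct commit c2 = { "second", &c1 };
struct commit c3 = { "third",  &c2 };

struct commit *master = &c3;
struct commit *HEAD = &c3;          /* git checkout master */

/* git reset --hard HEAD~  is like  master = HEAD = HEAD.parent */
void reset_hard_to_parent(void)
{
    assert(HEAD->parent != NULL);   /* HEAD~ must exist */
    master = HEAD = HEAD->parent;
}

/* Can `target` be reached from `tip` by following parent links? */
int reachable(const struct commit *tip, const struct commit *target)
{
    for (; tip != NULL; tip = tip->parent)
        if (tip == target)
            return 1;
    return 0;
}
```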

A **checkout** is different from a reset. In a checkout, you're only changing
HEAD, not the global variable representing your branch:

    git checkout master         # HEAD = master
    git checkout HEAD~          # HEAD = HEAD.parent // master is unchanged

And this is what is called a **detached head**. The HEAD variable is pointing
to something that *none* of your global variables directly points to. So

    git checkout master
    git checkout HEAD~3
    # together, these two are equivalent to 'git checkout master~3'

which checks out the files of the commit three parent links back from the
current one, is like saying

    HEAD = master;
    HEAD = HEAD.parent;
    HEAD = HEAD.parent;
    HEAD = HEAD.parent; // 3 times
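
A sketch of checkout in the same toy model: only `HEAD` moves, and the hypothetical `head_is_detached` applies this section's definition (HEAD points somewhere no branch variable points). That definition is the analogy's; real git records whether HEAD names a branch or a bare commit ID, so checking out a commit by its SHA1 detaches HEAD even when a branch points at that same commit.

```c
#include <assert.h>
#include <stddef.h>

struct commit {
    const char *body;
    struct commit *parent;
};

struct commit c1 = { "c1", NULL };
struct commit c2 = { "c2", &c1 };
struct commit c3 = { "c3", &c2 };
struct commit c4 = { "c4", &c3 };

struct commit *master = &c4;   /* branch: moves only when you move it */
struct commit *HEAD = NULL;

/* git checkout <rev>~n : start at rev and follow n parent links.
 * Only HEAD changes; the branch variable is left alone. */
void checkout_ancestor(struct commit *rev, int n)
{
    HEAD = rev;
    while (n-- > 0) {
        assert(HEAD->parent != NULL);  /* no such ancestor */
        HEAD = HEAD->parent;
    }
}

/* The analogy's definition of "detached"; with only one branch in this
 * toy model it is a single pointer comparison. */
int head_is_detached(void)
{
    return HEAD != master;
}
```

So `checkout_ancestor(master, 3)` corresponds to `git checkout master~3`: `HEAD` ends up on `c1` while `master` still points at `c4`.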

**It is important to realise** that 'HEAD' is not a global variable, and its
value gets lost when you do something else (like another checkout of a
different branch perhaps).

I think I'd have said that HEAD is indeed a global variable, but
its value gets changed automatically, as opposed to branches, which
are also global variables, but whose values generally only change when
you explicitly change them. (Except for the one that HEAD is
currently pointing to! OK, that's why you're writing this, and not
me.)

So let's say you now did

    git checkout otherbranch

This is like

    HEAD = otherbranch

...and the old 'HEAD' is lost.

Now this wouldn't be so bad if it were merely the result of a previous
`git checkout master~3`, because that commit is still reachable from master.
But what if you made a commit on a detached head?

    git checkout master~3
    git add blah; git commit -m 'detached commit'

This is like

    HEAD = master.parent.parent.parent  // as before
    temp = malloc(...)
    temp.body = ...
    temp.parent = HEAD
    HEAD = temp

At this point, HEAD represents something that nothing else points to, so it is
truly lost (short of the reflog, below) if you overwrite it. And it *will* get
overwritten if you check out a different branch:

    git checkout otherbranch
    # HEAD = otherbranch

Of course, you can always make a detached head *permanent* before it gets
overwritten:

    git branch newbranch    # makes a branch out of the current HEAD

which is like

    static struct ... *newbranch
    // yeah I know C doesn't let you create a new global variable
    // dynamically, but let's pretend...
    newbranch = HEAD

And now you can afford to lose your HEAD without any ill effects, because
you've saved a pointer to that node in 'newbranch'.
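
Putting the last few snippets together: a commit on a detached HEAD, then `git branch newbranch`, then a checkout of another branch. This is again a toy C model with hypothetical function names (`checkout_master_3`, `commit_detached`, `branch_newbranch`, `checkout_otherbranch`); the point is that `newbranch` keeps the new node reachable after `HEAD` is overwritten.

```c
#include <assert.h>
#include <stdlib.h>

struct commit {
    const char *body;
    struct commit *parent;
};

/* master points at c3, so master~3 is c0. */
struct commit c0 = { "c0", NULL };
struct commit c1 = { "c1", &c0 };
struct commit c2 = { "c2", &c1 };
struct commit c3 = { "c3", &c2 };
struct commit other = { "other", NULL };

struct commit *master = &c3;
struct commit *otherbranch = &other;
struct commit *newbranch = NULL;   /* not created yet */
struct commit *HEAD = NULL;

/* git checkout master~3 */
void checkout_master_3(void)
{
    HEAD = master->parent->parent->parent;
}

/* git commit on a detached HEAD: the usual list insert, but no branch moves. */
struct commit *commit_detached(const char *body)
{
    struct commit *temp = malloc(sizeof *temp);
    assert(temp != NULL);     /* a real program would handle OOM */
    temp->body = body;
    temp->parent = HEAD;
    HEAD = temp;              /* only HEAD knows about the new node */
    return temp;
}

/* git branch newbranch: copy HEAD's current value into a branch variable. */
void branch_newbranch(void)
{
    newbranch = HEAD;
}

/* git checkout otherbranch: HEAD's old value is simply overwritten. */
void checkout_otherbranch(void)
{
    HEAD = otherbranch;
}
```

Skip the `branch_newbranch()` call and, after `checkout_otherbranch()`, nothing in the program points at the detached commit any more.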

##### one last gasp: the reflog

With all this background, the reflog is easy. Pretend your malloc has a
wrapper around it that saves away the return values each time it is called, as
well as what command called it and when, and it keeps this information for 30
days. Even if you lost all your pointers, you could check this saved list and
the caller/time information to jog your memory of which one it was, and
actually use the pointer value to assign to a global variable.

    git reflog show HEAD@{now} -10
    dcd215b... HEAD@{5 minutes ago}: commit (amend): 0-terminology: the malloc analogy added, plus
    5ce8bfe... HEAD@{11 minutes ago}: commit: 0-terminology: the malloc analogy added, plus
    3d93420... HEAD@{11 minutes ago}: rebase -i (pick): updating HEAD
    7fdae94... HEAD@{11 minutes ago}: checkout: moving from master to 7fdae94815d6c676742c9984132b7b9e71a57f98
    3d93420... HEAD@{13 minutes ago}: rebase -i (squash): updating HEAD
    c55900c... HEAD@{13 minutes ago}: rebase -i (pick): updating HEAD
    7fdae94... HEAD@{13 minutes ago}: checkout: moving from master to 7fdae94815d6c676742c9984132b7b9e71a57f98
    e9955c8... HEAD@{14 minutes ago}: commit: s
    97ab644... HEAD@{20 minutes ago}: commit: autogen
    c55900c... HEAD@{23 minutes ago}: commit (amend): 0-terminology: the malloc analogy added, plus

Now you look at this, decide which one you want, and grab it:

    git branch thank_God_its_safe 7fdae94
    # like 'thank_God_its_safe = 0x7fdae94815d6c676742c9984132b7b9e71a57f98'

So, the final part of our analogy, if you haven't figured it out yet, is that
the **SHA1** is like the pointer value returned by malloc :-)
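
And the reflog as the "malloc wrapper" described above, in the same toy style. This is only the analogy in code: `logged_new_commit`, `reflog_find`, and the fixed-size in-memory array are hypothetical. Real git keeps reflogs as files under `.git/logs/`, records object names rather than addresses, and expires old entries after periods configurable via `gc.reflogExpire` and `gc.reflogExpireUnreachable`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct commit {
    const char *body;
    struct commit *parent;
};

#define REFLOG_MAX 64

/* One saved return value, plus who asked for it and when. */
struct reflog_entry {
    struct commit *value;   /* the "pointer value"; in git, the SHA1 */
    const char *command;    /* e.g. "commit: detached commit" */
    time_t when;
};

struct reflog_entry reflog[REFLOG_MAX];
int reflog_len = 0;

/* Wrapper around malloc that remembers every pointer it hands out. */
struct commit *logged_new_commit(const char *body, struct commit *parent,
                                 const char *command)
{
    struct commit *c = malloc(sizeof *c);
    assert(c != NULL);               /* a real program would handle OOM */
    c->body = body;
    c->parent = parent;
    if (reflog_len < REFLOG_MAX) {   /* toy log: silently stops when full */
        reflog[reflog_len].value = c;
        reflog[reflog_len].command = command;
        reflog[reflog_len].when = time(NULL);
        reflog_len++;
    }
    return c;
}

/* Search newest-first for the entry logged by `command`; NULL if none. */
struct commit *reflog_find(const char *command)
{
    for (int i = reflog_len - 1; i >= 0; i--)
        if (strcmp(reflog[i].command, command) == 0)
            return reflog[i].value;
    return NULL;
}
```

Even after every other pointer to the detached commit is gone, `reflog_find("commit: detached commit")` hands it back, much as `git branch thank_God_its_safe 7fdae94` does in the real reflog example.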