@offby1
Created February 22, 2009 17:32
# terminology
^^[[TOC]]^^
### general VCS terminology
A **project** is the minimum set of source code (and related files) that need
to be kept together to **build** the software. Example: Linux
* each project will have one **repository**
* one **team** can work on multiple projects, so there could be multiple
repositories on each desktop
A **branch** in a project is an active line of development
* **master** is the conventional name for the main development tree of a
project
* other conventional branches are **next** (for code that is ready to come
into *master*), and various maintenance branches like **v1.3** or
**v2.6.4** to designate released versions
* these are only conventions, not rules, but they seem to work well in
general
A **feature** is a part of a project that is large and complex enough that
its day-to-day commits would be too noisy to include in the main project.
Example: the disk subsystem, the networking subsystem, etc., in Linux
* a **feature branch** is a branch for a feature, and is usually long-lived.
This means it regularly acquires changes made in the main line, and -- at
stable points in its development cycle -- merges its changes back into the
main line
* small projects may not have any feature branches. That doesn't mean they
don't have any features :-)
### git-specific terminology
#### branch, master, head, HEAD, etc
* a branch is "an active line of development"
* *master*: default branch in a project (main development tree), by
convention
* *head*: tip of a branch
* a repository can track many branches, but the working tree is associated
with only one branch at a time
* *HEAD*: tip of the branch associated with the working tree; this is where
commits go. Normally. There is also something called a 'detached HEAD'
that you should be aware of. See the section on the [detached head and
the malloc analogy](#dhma) below for more.
* *index*: a staging area for the next commit; when you commit, the current
index is turned into a real commit object
* *origin*: when you clone an existing project in order to start working on
it, the project you cloned *from* is traditionally called the "origin" of
your clone
#### what is a "bare" repository?
A bare repository is a concept that is more or less specific to a Distributed
VCS like git (and, I presume, to other DVCSs such as Hg and Bzr).
A normal git repository is a directory that contains
* project directories and files (the "working tree" mentioned above)
* a single directory called `.git` containing all of git's
administrative and control files; we'll call it the **magic** directory
because git can't do any magic without it :-)
When you do a `git status` inside such a directory, git looks inside the
"magic" directory, compares your current working tree with the "current
branch" as recorded in the magic directory, and tells you what files have
changed, etc etc.
A "bare" repo, as the git
[glossary](http://www.kernel.org/pub/software/scm/git/docs/gitglossary.html)
says, is a repository that does not contain a "working tree" at all. It
doesn't contain the special `.git` sub-directory either; instead, it
contains all the contents of the `.git` subdirectory right in the main
directory itself.
##### yeah yeah, but **why** do I need a bare repo?
ok; demo time...
Let's try creating a small repo, adding a file, committing, and checking
`git status`:

    mkdir a; cd a; git init
    echo hi > a; git add a; git commit -m a
    git status

This should respond

    # On branch master
    # nothing to commit (working directory clean)

So far so good. Now someone clones our repository, adds a new file, commits,
and pushes his changes back to our repository:

    cd ..; git clone a b
    cd b; echo there >> b; git add b; git commit -m b
    git push

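The `git push` above sends your new commits to the "origin" repository. More
specifically, it updates the "magic" directory on repo "a" with this new
commit. (Since version 1.7.0, git refuses by default to push to the
checked-out branch of a non-bare repository, precisely because of the
confusion shown next.)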
Now you go back to the main repo and check `git status`

    cd ../a
    git status

which responds

    # On branch master
    # Changes to be committed:
    #   (use "git reset HEAD <file>..." to unstage)
    #
    #       deleted:    b

Whoa! What happened here? We **added** a file called `b` in the cloned
repository and pushed to the "origin". But your origin now claims you
**deleted** that file...?
To understand this, you need to realise that **the "magic" directory is always
assumed to be correct**; it is the "standard" against which your working tree
is compared to determine what changes you made in your working tree.
So when you asked for a status, git first looked inside the magic directory.
The magic directory said you should have two files, "a" and "b", but your work
tree has only file "a". So `git status` concludes that you have deleted
the file "b"!
In other words, when someone changes the "magic" directory **behind your
back**, your locally checked out copy (your working tree) appears to have the
**opposite** changes made by you.
All this confusion can (and *should*) be avoided by using a "bare"
repository to clone and pull from, and push to. It doesn't have a checked out
tree, so it just does what the "server" notionally does in a centralised VCS
-- records commits, branches, etc when you push to it, and gives you the
latest versions when you clone or pull from it.
#### #dhma detached head and the malloc analogy
If you're a developer looking to understand branches, the commit tree,
detached head, reflog, etc., this section might help you. I assume you
understand the basics of the standard malloc(3) library function: you make a
request to allocate some memory, and malloc returns you a pointer to the
memory so you can do what you want with it in your program.
To understand what a detached head is, you have to think of a **commit tree**
as a one-directional linked list. Each **branch** (like 'master', 'next',
etc) is a global/static pointer variable, pointing to the top of one such
list. A **commit** is one node in the linked list; a struct containing some
information plus links to parent commit struct(s). There are no forward links
-- you cannot go from an earlier commit to a later one.
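
As a rough C sketch of the structure just described: the names `struct commit`, `history_length` and `is_ancestor` are made up for this illustration, and a real git commit can have more than one parent, which this toy version ignores. The point is only that every walk follows backlinks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a commit: some payload plus one backlink. */
struct commit {
    const char *body;             /* stands in for the commit info */
    const struct commit *parent;  /* backlink; NULL for the first commit */
};

/* Count the commits reachable from `tip` by following backlinks. */
size_t history_length(const struct commit *tip)
{
    size_t n = 0;
    for (; tip != NULL; tip = tip->parent)
        n++;
    return n;
}

/* Return 1 if `old` is `tip` itself or can be reached from it, else 0. */
int is_ancestor(const struct commit *old, const struct commit *tip)
{
    assert(old != NULL);
    for (; tip != NULL; tip = tip->parent)
        if (tip == old)
            return 1;
    return 0;
}
```

Given a chain `c1 <- c2 <- c3`, `is_ancestor(&c1, &c3)` is true, but nothing stored in `c1` leads to `c3`; that is the "no forward links" point.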
Making a new commit involves the classic "insert a new node at the head of the
linked list" code that anyone who's done a basic data structures course knows.
In pseudo-code:

    HEAD = malloc(...);     // ask for some memory for the struct
    HEAD.body = ...;        // fill in the commit info
    HEAD.parent = master;   // fill in the backlink
    master = HEAD;          // move list head to new node

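
For anyone who wants to compile the four steps, here is one way they might look in real C. This is only a sketch of the analogy, not of git's internals; `struct commit` and `commit_on_master` are names invented here, and HEAD is assumed to be on master.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical commit node: payload plus backlink. */
struct commit {
    const char *body;
    struct commit *parent;
};

/* The branch is a global pointer to the newest node of its list. */
struct commit *master = NULL;
/* HEAD follows whatever is checked out; here that is master. */
struct commit *HEAD = NULL;

/* Same four steps as the pseudo-code: allocate, fill, link, advance. */
struct commit *commit_on_master(const char *body)
{
    HEAD = malloc(sizeof *HEAD);  /* ask for some memory for the struct */
    assert(HEAD != NULL);
    HEAD->body = body;            /* fill in the commit info */
    HEAD->parent = master;        /* fill in the backlink */
    master = HEAD;                /* move list head to new node */
    return HEAD;
}
```

After two calls, `master` points at the second node and that node's `parent` points at the first.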
A **reset** of a branch is like assigning to the corresponding global
variable:

    git checkout master         # HEAD = master
    git reset --hard HEAD~      # master = HEAD = HEAD.parent

The old value of HEAD is now gone; nothing is pointing to that node now, and
it will be garbage collected soon. [Well, not immediately; see reflog
below...]
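
The same reset, sketched in C under the assumptions of the analogy (all names here, including `reset_hard_to_parent`, are hypothetical). The function hands back the dropped node only so that you can see it still exists but is no longer reachable from `master`.

```c
#include <assert.h>
#include <stdlib.h>

struct commit {
    const char *body;
    struct commit *parent;
};

struct commit *master = NULL;  /* the branch: a global pointer */
struct commit *HEAD = NULL;    /* assumed to be on master throughout */

/* Insert a new node at the head of the list, as in a normal commit. */
struct commit *commit_on_master(const char *body)
{
    HEAD = malloc(sizeof *HEAD);
    assert(HEAD != NULL);
    HEAD->body = body;
    HEAD->parent = master;
    master = HEAD;
    return HEAD;
}

/* Like 'git reset --hard HEAD~': move HEAD *and* the branch back one node.
 * Returns the node that was dropped; no global points to it any more. */
struct commit *reset_hard_to_parent(void)
{
    struct commit *dropped = HEAD;
    assert(HEAD != NULL && HEAD->parent != NULL);
    master = HEAD = HEAD->parent;
    return dropped;
}
```

In a real program the dropped node would be a leak unless something else remembered it, which is roughly the job the reflog does.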
A **checkout** is different from a reset. In a checkout, you're only changing
HEAD, not the global variable representing your branch:

    git checkout master     # HEAD = master
    git checkout HEAD~      # HEAD = HEAD.parent // master is unchanged

And this is what is called a **detached head**. The HEAD variable is pointing
to something that *none* of your global variables directly points to. So

    git checkout master
    git checkout HEAD~3
    # together, these two are equivalent to 'git checkout master~3'

which checks out the files of the commit three steps back from the tip of
master, is like saying

    HEAD = master;
    HEAD = HEAD.parent;
    HEAD = HEAD.parent;
    HEAD = HEAD.parent;     // 3 times

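
The repeated assignment is just a loop. A small sketch, with invented helper names `nth_parent` and `is_detached`, and with "detached" taken to mean "no branch pointer equals HEAD", which is this analogy's definition rather than git's exact one:

```c
#include <assert.h>
#include <stddef.h>

struct commit {
    const char *body;
    struct commit *parent;
};

/* Follow the backlink n times, like the ~n suffix.
 * Returns NULL if the history is shorter than n. */
struct commit *nth_parent(struct commit *c, int n)
{
    while (c != NULL && n-- > 0)
        c = c->parent;
    return c;
}

/* In this model HEAD is detached when no branch pointer equals it. */
int is_detached(const struct commit *head,
                struct commit *const *branches, size_t nbranches)
{
    assert(branches != NULL || nbranches == 0);
    for (size_t i = 0; i < nbranches; i++)
        if (branches[i] == head)
            return 0;
    return 1;
}
```

With four commits on `master`, `HEAD = nth_parent(master, 3)` lands on the first one and leaves `master` untouched, so `is_detached` reports true for `HEAD` and false for `master`.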
**It is important to realise** that 'HEAD' is not a global variable, and
gets lost when you do something else (like another checkout of a different
branch perhaps).
I think I'd have said that HEAD is indeed a global variable, but
its value gets changed automatically, as opposed to branches, which
are also global variables, but whose values generally only change when
you explicitly change them. (Except for the one that HEAD is
currently pointing to! (OK, that's why you're writing this, and not
me.))
So let's say you now did

    git checkout otherbranch

This is like

    HEAD = otherbranch

...and the old 'HEAD' is lost.
Now this wouldn't be so bad if it was merely the result of a previous `git
checkout master~3`. But what if you made a commit on a detached head?

    git checkout master~3
    git add blah; git commit -m 'detached commit'

This is like

    HEAD = master.parent.parent.parent  // as before
    temp = malloc(...)
    temp.body = ...
    temp.parent = HEAD
    HEAD = temp

At this point, HEAD is the only thing pointing to the new commit, so that
commit becomes very hard to find again if you overwrite HEAD (the reflog,
described below, is the safety net). And it *will* get overwritten if you
checkout a different branch:

    git checkout otherbranch
    # HEAD = otherbranch

Of course, you can always make a detached head *permanent* before it gets
overwritten:

    git branch newbranch    # makes a branch out of the current HEAD

which is like

    static struct ... *newbranch;
        // yeah I know C doesn't let you create a new global variable
        // dynamically, but let's pretend...
    newbranch = HEAD;

And now you can afford to lose your HEAD without any ill effects, because
you've saved the value of that node in 'newbranch'.
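
If the "let's pretend" bothers you, one way to stay inside real C is to keep branches in a small name-to-pointer table instead of in separate global variables. This is purely an illustrative sketch; `branch_create`, `branch_lookup` and the fixed table size are assumptions of mine, not anything git does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct commit {
    const char *body;
    struct commit *parent;
};

#define MAX_BRANCHES 16  /* arbitrary limit for the sketch */

struct branch {
    const char *name;
    struct commit *tip;
};

static struct branch branches[MAX_BRANCHES];
static size_t nbranches = 0;

/* Like 'git branch <name>' run at `head`: record a named pointer to it.
 * Returns 0 on success, -1 if the table is full or the name is taken. */
int branch_create(const char *name, struct commit *head)
{
    assert(name != NULL && head != NULL);
    if (nbranches == MAX_BRANCHES)
        return -1;
    for (size_t i = 0; i < nbranches; i++)
        if (strcmp(branches[i].name, name) == 0)
            return -1;
    branches[nbranches].name = name;
    branches[nbranches].tip = head;
    nbranches++;
    return 0;
}

/* Return the commit a branch points to, or NULL if there is no such branch. */
struct commit *branch_lookup(const char *name)
{
    for (size_t i = 0; i < nbranches; i++)
        if (strcmp(branches[i].name, name) == 0)
            return branches[i].tip;
    return NULL;
}
```

Once `branch_create("newbranch", HEAD)` has succeeded, HEAD can be pointed anywhere else and `branch_lookup("newbranch")` still returns the detached commit.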
##### one last gasp -- the reflog
With all this background, the reflog is easy. Pretend your malloc has a
wrapper around it that saves away the return values each time it is called, as
well as what command called it and when, and it keeps this information for a
while (by default git keeps reflog entries for 90 days, or 30 days for entries
that are no longer reachable from the current tip). Even if you lost all your
pointers, you could check this saved list and the caller/time information to
jog your memory of which one it was, and actually use the pointer value to
assign to a global variable.

    git reflog show HEAD@{now} -10
    dcd215b... HEAD@{5 minutes ago}: commit (amend): 0-terminology: the malloc analogy added, plus
    5ce8bfe... HEAD@{11 minutes ago}: commit: 0-terminology: the malloc analogy added, plus
    3d93420... HEAD@{11 minutes ago}: rebase -i (pick): updating HEAD
    7fdae94... HEAD@{11 minutes ago}: checkout: moving from master to 7fdae94815d6c676742c9984132b7b9e71a57f98
    3d93420... HEAD@{13 minutes ago}: rebase -i (squash): updating HEAD
    c55900c... HEAD@{13 minutes ago}: rebase -i (pick): updating HEAD
    7fdae94... HEAD@{13 minutes ago}: checkout: moving from master to 7fdae94815d6c676742c9984132b7b9e71a57f98
    e9955c8... HEAD@{14 minutes ago}: commit: s
    97ab644... HEAD@{20 minutes ago}: commit: autogen
    c55900c... HEAD@{23 minutes ago}: commit (amend): 0-terminology: the malloc analogy added, plus

Now you look at this, decide which one you want, and grab it:

    git branch thank_God_its_safe 7fdae94
    # like 'thank_God_its_safe = 0x7fdae94815d6c676742c9984132b7b9e71a57f98'

So, the final part of our analogy, if you haven't figured it out yet, is that
the **SHA1** is like the pointer value returned by malloc :-)
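
To round off the analogy, here is roughly what the pretend logging wrapper around malloc could look like. Everything in it (`logged_malloc`, `log_lookup`, the fixed-size log) is hypothetical, and it leaves out expiry entirely; the real reflog records movements of HEAD and branch tips, not memory allocations.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define LOG_SIZE 64  /* arbitrary capacity for the sketch */

struct log_entry {
    void *ptr;           /* the value malloc returned (the "SHA1") */
    const char *caller;  /* what asked for it, e.g. "commit: s" */
    time_t when;         /* when it was handed out */
};

static struct log_entry alloc_log[LOG_SIZE];
static size_t log_count = 0;

/* malloc wrapper that remembers each pointer it hands out.
 * Once the log is full, later allocations are simply not recorded. */
void *logged_malloc(size_t size, const char *caller)
{
    assert(caller != NULL);
    void *p = malloc(size);
    if (p != NULL && log_count < LOG_SIZE) {
        alloc_log[log_count].ptr = p;
        alloc_log[log_count].caller = caller;
        alloc_log[log_count].when = time(NULL);
        log_count++;
    }
    return p;
}

/* Return the pointer recorded `back` calls ago (0 = most recent),
 * loosely like HEAD@{n}; NULL if there is no such entry. */
void *log_lookup(size_t back)
{
    if (back >= log_count)
        return NULL;
    return alloc_log[log_count - 1 - back].ptr;
}
```

Even after every variable holding the pointers has been overwritten, `log_lookup(1)` still returns the second-most-recent allocation, which you can then assign to a new "branch" variable.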