
@cube-drone
Created January 11, 2013 01:26

A cube drone 'bout source control.

Source Control.

Okay, so, you're a software developer - I'm going to assume that you use source code control software, because the last programmer stubborn enough not to use source control died in 2008.

However, on the off chance that you are too young or too old to know the gospel, I'm going to give a very brief overview of what you somehow managed to miss.

A project composed of code has three problems that source control is designed to solve.

  • History,
  • Collaboration,
  • Branching.

History

A complex edit to your codebase is going to involve small or large changes to several different files at the same time.

Let's imagine, though, that your changes don't work. You've changed several files, and now you need to change them back to the way that they were before.

Or, the change did work, but there was something important in the code that you deleted and now you need to look at it again.

You could frantically jam on the undo command until your code is the way it was when it last worked. The obvious problem with this is that you have to remember exactly how the code looked when it worked last, and you have to keep all of your code open in the same editor for as long as you're editing it, and you have to trust that your editor has a long, long, LONG undo history.

Or, you could selectively comment out patches of code that you think might be useful again in the future. This is called the "Lava Flow" anti-pattern - so named because you develop a thin stream of live code surrounded by layers and layers of cold, old, inoperable code. It's messy; it's incomplete, and those commented-out blocks reproduce if you leave them alone too long. Soon you have more commented-out code than real code.

So: Source Code Control allows you to set checkpoints whenever you have the code in a working state. And you can go back to the checkpoints any time that you like.
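To make the checkpoint idea concrete, here is a toy, in-memory sketch in Python. The `CheckpointStore` class and its methods are hypothetical illustrations, not any real tool's API: each checkpoint is a full snapshot of every file, and restoring one hands back exactly that snapshot, no matter how many files the broken edit touched.

```python
from copy import deepcopy


class CheckpointStore:
    """Hypothetical toy: keeps a full snapshot of the project at each checkpoint."""

    def __init__(self) -> None:
        self._checkpoints: list = []  # list of (message, snapshot) pairs

    def checkpoint(self, files: dict, message: str) -> int:
        # Copy the files so later edits can't alter the saved snapshot.
        self._checkpoints.append((message, deepcopy(files)))
        return len(self._checkpoints) - 1  # checkpoint id

    def restore(self, checkpoint_id: int) -> dict:
        # Return a copy so the caller can keep editing without touching history.
        _message, snapshot = self._checkpoints[checkpoint_id]
        return deepcopy(snapshot)


store = CheckpointStore()
files = {"main.py": "print('hello')\n", "util.py": "def helper(): pass\n"}
good = store.checkpoint(files, "working version")

# A complex edit touching several files at once... that breaks things.
files["main.py"] = "print(helper())\n"
del files["util.py"]

files = store.restore(good)
print(sorted(files))  # ['main.py', 'util.py']
```

No undo history and no commented-out "lava" required: the old code lives in the checkpoint, not in the working files.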

Collaboration

Once you introduce a second person into your code modification process, you've introduced a raft of potential difficulties. How do you keep updates synced up between computers?

If you happen to be able to work with a persistent internet connection, you could have a few people working on the same network share, implementing a complex system of semaphores to keep multiple people from working on the same file at the same time.

Or you could just make diffs of all of your changes and e-mail them to a central code manager.
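The diff itself isn't the awful part: `svn diff` and `git diff` print the same kind of unified diff. It's the e-mailing and hand-merging around it that hurts. A minimal sketch using Python's standard `difflib` module; the file name `greet.py` and its contents are made up for illustration.

```python
import difflib

old = ["def greet():\n", "    print('hi')\n"]
new = ["def greet(name):\n", "    print('hi', name)\n"]

# unified_diff yields the lines of a patch in the familiar "---/+++/@@" format.
patch = "".join(
    difflib.unified_diff(old, new, fromfile="a/greet.py", tofile="b/greet.py")
)
print(patch)
```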

Okay, at this point you might be getting wise to my rhetorical strategy of sarcastically presenting awful solutions to complex problems. These are awful solutions.

Source Code Control always comes with some kind of usable collaboration strategy. We'll talk more about that later.

Branching

Finally, it's possible that you might want to maintain different states of the codebase. So, for example, you might want to maintain a 'deployed' version of your code, as well as a 'development' version.

Most modern source control solutions support branching - which allows you to maintain separate states of your codebase in parallel.
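One way to picture branching, loosely modeled on how Git does it: each branch is just a name pointing at its newest commit, and each commit points back at its parent. A minimal Python sketch; `Repo`, `commit`, `branch`, and `log` are hypothetical helpers, not a real API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Commit:
    message: str
    parent: Optional["Commit"]


@dataclass
class Repo:
    # Each branch name maps to the newest commit on that branch.
    branches: dict = field(default_factory=dict)

    def commit(self, branch: str, message: str) -> Commit:
        new = Commit(message, self.branches.get(branch))
        self.branches[branch] = new  # advance only this branch
        return new

    def branch(self, new_name: str, from_branch: str) -> None:
        # Creating a branch copies a pointer, not the files.
        self.branches[new_name] = self.branches[from_branch]

    def log(self, branch: str) -> list:
        # Walk parent links from the branch tip back to the first commit.
        messages, c = [], self.branches.get(branch)
        while c is not None:
            messages.append(c.message)
            c = c.parent
        return messages


repo = Repo()
repo.commit("deployed", "initial release")
repo.branch("development", "deployed")
repo.commit("development", "half-finished feature")
repo.commit("deployed", "urgent hotfix")

print(repo.log("deployed"))     # ['urgent hotfix', 'initial release']
print(repo.log("development"))  # ['half-finished feature', 'initial release']
```

The hotfix lands on 'deployed' without dragging the half-finished feature along, which is the whole point of keeping the two states in parallel.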

Backup

I want to make this clear: Version Control software is NOT backup software. A software repository is not a backup. It's something TO back up. However, if your team has all of its code in one place, you at least have something concrete to back up.

Okay, so, History, Collaboration, and Branching. Even if you're just working on your own code, if what you're working on is more complicated than a tiny script, you're probably going to want source control - the history features alone make it worthwhile.

Source Control Options

I'm going to focus mostly on Subversion, Perforce, and Git in this presentation, because they cover a lot of ground in terms of different types of features, and because they're the options that I'm the most familiar with.

Of these, Perforce is proprietary software, and Subversion and Git are free.

Lock & Edit vs. Edit & Merge

Imagine that you have two people, Mark Peters and Captain Magic Elbows, each working on the same repository, and each wanting to edit the same file.

There are two ways that source control traditionally resolves this issue - Lock & Edit, and Edit & Merge.

Lock & Edit - Captain Magic Elbows locks the files that he wants to edit. Nobody else can edit the file until Captain Magic Elbows finishes with it. Perforce can work this way, and is often configured to.
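A toy model of the Lock & Edit rule, in Python. The `LockManager` class and the file name `engine.c` are hypothetical; the sketch keeps only the core rule (one editor per file at a time) and leaves out everything a real server would add around it.

```python
class LockManager:
    """Hypothetical toy lock table: one editor per file at a time."""

    def __init__(self) -> None:
        self._owners: dict = {}  # path -> user holding the lock

    def lock(self, path: str, user: str) -> bool:
        owner = self._owners.get(path)
        if owner is not None and owner != user:
            return False  # someone else holds the lock; wait your turn
        self._owners[path] = user
        return True

    def unlock(self, path: str, user: str) -> None:
        if self._owners.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        del self._owners[path]


locks = LockManager()
print(locks.lock("engine.c", "Captain Magic Elbows"))  # True
print(locks.lock("engine.c", "Mark Peters"))           # False: file is locked
locks.unlock("engine.c", "Captain Magic Elbows")
print(locks.lock("engine.c", "Mark Peters"))           # True
```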

Edit & Merge - Captain Magic Elbows and Mark Peters both edit the same file at the same time. Captain Magic Elbows completes his change. Before Mark Peters can complete HIS change, he has to merge his change with Captain Magic Elbows' change. Usually, this merge is handled automatically by the software. Subversion and Git both work this way.
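The automatic merge usually compares three versions: the common ancestor (base), your copy, and the copy that was committed first. A line changed on only one side takes that side's version; a line changed differently on both sides is a conflict a human must resolve. The `merge3` function below is a hypothetical sketch of that rule under a big simplifying assumption: every version has the same number of lines, so line i always matches line i. Real tools first align the versions with a diff.

```python
def merge3(base: list, ours: list, theirs: list) -> tuple:
    """Toy three-way merge; assumes line-for-line edits only (no added/removed lines)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles line-for-line edits")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:      # both made the same change, or neither changed it
            merged.append(o)
        elif o == b:    # only they changed this line
            merged.append(t)
        elif t == b:    # only we changed this line
            merged.append(o)
        else:           # both changed it differently: a human must decide
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts


base = ["red", "green", "blue"]
elbows = ["red", "GREEN", "blue"]   # Captain Magic Elbows' committed change
mark = ["red", "green", "BLUE"]     # Mark Peters' pending change
print(merge3(base, mark, elbows))   # (['red', 'GREEN', 'BLUE'], [])
```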

Centralized vs. Distributed

Perforce and Subversion are both Centralized repositories. One canonical server contains the entire history of your codebase. You check out a set of files from the repo, work on them for a while, and save your changes back to the repository.

Git is a Distributed repository. You'll probably still have a central server with a master repository - for the sake of clear collaboration - but each individual developer also has their own entire repository on their machine. They check out changes from their personal repository, work on them, save them back to their personal repository, and THEN sync their personal repository with the master repository.
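A toy Python model of the distributed workflow just described: clone the whole history, commit locally as often as you like, then sync once. `Repository`, `clone`, and `push_to` are hypothetical names, and the history here is a simple linear list, whereas real histories are graphs.

```python
from typing import Optional


class Repository:
    """Hypothetical toy repository; history is a linear list of commit messages."""

    def __init__(self, history: Optional[list] = None) -> None:
        self.history = list(history or [])

    def commit(self, message: str) -> None:
        # Purely local: no network round trip to a server.
        self.history.append(message)

    def clone(self) -> "Repository":
        # The developer's copy holds the entire history, not just the latest files.
        return Repository(self.history)

    def push_to(self, master: "Repository") -> None:
        # Refuse if the master has commits this copy has never seen; the developer
        # must pull and merge first. Git rejects such non-fast-forward pushes by default.
        if self.history[: len(master.history)] != master.history:
            raise RuntimeError("master has new commits; pull first")
        master.history = list(self.history)


master = Repository(["initial commit"])
mine = master.clone()
mine.commit("fix typo")      # local and fast, even on a bus
mine.commit("add feature")
mine.push_to(master)         # the one step that needs the network
print(master.history)  # ['initial commit', 'fix typo', 'add feature']
```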

Both schemes have their ups and downs. Centralized repositories are decidedly less complicated - and in the case of very large repositories or very long histories, it can be useful for developers only to have to deal with a small subset at any given time.

However, this means that all repository operations have to go through the network to the central server. If you want to see the differences between your file and that same file, a version ago, that's a network request. Every time you make a major change to the codebase, that's another network request.

I once worked at a big tech company - I don't want to be too specific, but they made a cell phone named after a fruit. Not the fruit you're thinking of - and the entire company did all of its development on one... enormous... Perforce repository. Every product. Every version. Every peripheral binary, all loaded into the same giant repository. And even though that one repo was hosted on a pretty mighty computer, around 9:00 in the morning and 5:00 in the evening, it started to get really, really slow.

With Distributed repositories, operations against the repo are local. And fast. Distributed repositories are extra-great if you happen to be working in an area with network difficulties. Like a bus. Or two buses.

Revision Numbers

One important facet of version control systems is how they identify files over time.

Let's look at a repository containing 5 files. As we change these files, how do we keep track of the different versions of each file over time?

Subversion, Perforce, and Git all handle this differently.

In Perforce, each individual file has its own revision number. Every time you change a group of files, each file's revision number increments - but files that aren't touched don't change.

In Subversion, the entire repository gets a revision number. Every time you change a group of files, the revision number of the entire repository is incremented.
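Using the five-file repository from above, this toy Python sketch contrasts the two numbering schemes. `RevisionCounters` and the file names are hypothetical; in the real tools you'd see these numbers written as `a.c#3` (Perforce) and `r3` (Subversion).

```python
class RevisionCounters:
    """Hypothetical toy tracking both numbering schemes side by side."""

    def __init__(self) -> None:
        self.per_file: dict = {}  # Perforce-style: one counter per file
        self.repo = 0             # Subversion-style: one counter for everything

    def submit(self, changed_files: list) -> None:
        for path in changed_files:
            self.per_file[path] = self.per_file.get(path, 0) + 1  # only touched files move
        self.repo += 1  # the whole repository moves on every change


counters = RevisionCounters()
counters.submit(["a.c", "b.c", "c.c", "d.c", "e.c"])  # initial import of 5 files
counters.submit(["a.c", "b.c"])
counters.submit(["a.c"])

print(counters.per_file)  # {'a.c': 3, 'b.c': 2, 'c.c': 1, 'd.c': 1, 'e.c': 1}
print(counters.repo)      # 3
```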

In Git, because it's distributed, using an incrementing number for versions would be a difficult problem. If Mark Peters and Captain Magic Elbows both make a change to a repository, and later these changes are merged together, who is to decide which one of these changes came first? Instead, Git gives every change a unique hash value.
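Git's IDs are SHA-1 hashes of an object's contents plus a short header. The sketch below reproduces that computation for a file's contents (a "blob"), which is what `git hash-object` reports. A commit's ID is computed the same way over the commit's own text (its tree, its parent commits, its author, and its message), so two developers' changes get different IDs with no central counter. The function name `git_object_id` is hypothetical.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    # Git hashes a header ("<type> <size in bytes>" followed by a NUL byte),
    # then the raw content.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("blob", b"hello world\n"))
```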

Traditionally, the most recent change is known as the HEAD revision - although in Perforce, every file has its own HEAD revision.

In Git, the change preceding the HEAD revision is known as HEAD^. The change preceding that is known as HEAD^^, and so on. This is a convenient shortcut for people who don't want to deal with long hashes every time they deal with revisions - especially since, 90% of the time, HEAD and HEAD^ are the only revisions you're interested in.
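Resolving HEAD^^ just means walking parent links back from HEAD, one step per caret. A minimal Python sketch over a hypothetical three-commit history with made-up IDs `c1` to `c3`; in real Git, `^` follows the first parent of a merge commit, and `HEAD~2` is another way to write `HEAD^^`.

```python
# Hypothetical toy history: each commit id maps to its (first) parent.
parents = {"c3": "c2", "c2": "c1", "c1": None}
head = "c3"


def resolve(rev: str) -> str:
    """Resolve 'HEAD' followed by any number of '^' by walking parent links."""
    if not rev.startswith("HEAD") or set(rev[4:]) - {"^"}:
        raise ValueError(f"this sketch only understands HEAD, HEAD^, HEAD^^, ...: {rev}")
    commit = head
    for _ in range(len(rev) - 4):  # one step back per caret
        commit = parents[commit]
        if commit is None:
            raise ValueError(f"{rev} goes back past the first commit")
    return commit


print(resolve("HEAD"), resolve("HEAD^"), resolve("HEAD^^"))  # c3 c2 c1
```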

Branches

I don't want to go too deeply into branches except to say that before using Git, I rarely used branches at all - and now, I use them quite a bit more often.

A local repository makes branch operations MUCH faster. That speed makes a lot of difference.

Other VCSs

I've talked a lot about Perforce, Git, and Subversion, but I might as well quickly mention some of the others that you might encounter - keeping in mind that I probably know less about them than you do.

  • You're unlikely to ever encounter SCCS (1972-1982), the first version control system, or RCS (1982-1986), its successor, but you might encounter CVS (1986-2000), Subversion's predecessor, and one of the most popular source control systems for almost 20 years.

  • SourceSafe, a defunct Microsoft technology with a reputation for data corruption and broken hearts.

  • Team Foundation Server, a Microsoft technology, successor to the dubious legacy of Microsoft SourceSafe - it's a version control system with a hefty dose of bug tracker, build server, and project manager packed into one integrated solution.

  • Mercurial (or Hg), a lot like Git, but considered by many to be easier to use, and written in Python, which is a nice bonus.

  • Bazaar, which has good Windows support and focuses on a clean UI and easily understood merges.
