@kvnsmth
Last active March 5, 2023 21:58
A real world usage for git subtrees.

Let's say you have an iOS project, and you want to use some external library, like AFNetworking. How do you integrate it?

With submodules

Add the project to your repo:

git submodule add git@github.com:AFNetworking/AFNetworking.git Vendor/AFNetworking

or something to that effect.

Well, what happens if you find a bug in AFNetworking that you want to fix? With submodules, I'd usually

  1. Fork AFNetworking
  2. Go through the pain of changing my project's submodules to point to my new fork
  3. Make my change and commit it to my fork
  4. Submit a pull request to AFNetworking repo
  5. Wait to see if the pull request is accepted, but keep my fork up-to-date in the meantime
  6. If it is accepted, do the whole dance to switch my submodule back to the official AFNetworking repo
  7. Continue as usual

Okay, that sucks. If you've ever done it, you know how painful it is and how finicky submodules can be.
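
For reference, the URL switch in steps 2 and 6 can be sketched roughly as below. This is a minimal sketch, not the only way to do it: the fork URL is a hypothetical placeholder, and the scratch repository with a hand-written `.gitmodules` entry only stands in for a real project so the script runs without touching the network.

```shell
#!/bin/sh
set -eu

# Assumption: FORK_URL is a hypothetical placeholder; substitute your own fork.
FORK_URL="git@github.com:your-account/AFNetworking.git"
OFFICIAL_URL="git@github.com:AFNetworking/AFNetworking.git"
SUBMODULE_PATH="Vendor/AFNetworking"

# Scratch repository standing in for your project.
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Stand-in for the entry that `git submodule add` would have written.
git config -f .gitmodules "submodule.$SUBMODULE_PATH.path" "$SUBMODULE_PATH"
git config -f .gitmodules "submodule.$SUBMODULE_PATH.url" "$OFFICIAL_URL"

# Step 2: point the recorded URL at the fork ...
git config -f .gitmodules "submodule.$SUBMODULE_PATH.url" "$FORK_URL"
# ... and copy the new URL from .gitmodules into the local configuration.
git submodule sync

# In a real checkout you would follow up with:
#   git submodule update --init --recursive
# Step 6 is the same sequence with OFFICIAL_URL in place of FORK_URL.

recorded_url=$(git config -f .gitmodules --get "submodule.$SUBMODULE_PATH.url")
echo "$recorded_url"
```

Every teammate has to run `git submodule sync` after pulling the changed `.gitmodules`, which is where much of the pain comes from.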

With subtrees

Add the project to your repo:

git subtree add --prefix=Vendor/AFNetworking --squash git@github.com:AFNetworking/AFNetworking.git master

This is pretty similar so far, except that the other members of your team won't have to remember to run git submodule update, because subtrees actually store the source in your repo. Nice.

Now, let's say we have the same bug. What do I do differently now that I'm using subtrees?

I make my change and commit it to my project's repository. Technically, I could stop now if I wanted to, since the fixed code is in my repository. But I want to be a good open source citizen, so what do I do?

Be a good open source citizen

  1. I'll fork AFNetworking into my account on GitHub.
  2. Back in my local repo, I'll split the subtree's history out onto its own branch so I can push changes to my fork:

git subtree split --prefix=Vendor/AFNetworking/ --branch AFNetworking

  3. I'll push my change to my fork, but on a branch to make the pull request more awesome:

git push git@github.com:kvnsmth/AFNetworking.git AFNetworking:critical-bug-fix

  4. I would issue a pull request and hope it gets accepted, but a big difference is that the acceptance of my change doesn't keep me from being able to easily stay in sync with the official AFNetworking repo.

I can still do:

git subtree pull --prefix=Vendor/AFNetworking --squash git@github.com:AFNetworking/AFNetworking.git master

to stay up-to-date with the latest in the official repository.
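
Since the subtree commands repeat the same prefix and URL every time, one option is to register the upstream as a named remote and wrap the commands in small helpers. This is only a sketch: the remote alias `afnetworking` and the two function names are my own hypothetical choices, and the demo at the bottom runs in a scratch repository that only registers the remote, so nothing is fetched. `git subtree push` does roughly what the separate split and push steps above do in one command.

```shell
#!/bin/sh
set -eu

# Assumptions: the remote alias and both helper names are hypothetical;
# the prefix, URL, and branch are the ones used in the commands above.
PREFIX="Vendor/AFNetworking"
UPSTREAM_REMOTE="afnetworking"
UPSTREAM_URL="git@github.com:AFNetworking/AFNetworking.git"
UPSTREAM_BRANCH="master"

# Pull the latest upstream code into the vendored directory as one squashed commit.
vendor_pull() {
    git subtree pull --prefix="$PREFIX" --squash "$UPSTREAM_REMOTE" "$UPSTREAM_BRANCH"
}

# Split out the vendored history and push it to a branch on a fork.
# $1 = fork URL, $2 = branch name to create on the fork.
vendor_push_to_fork() {
    git subtree push --prefix="$PREFIX" "$1" "$2"
}

# Demo in a scratch repository: only the remote is registered here.
# In your real project you would run `vendor_pull` after adding the remote.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git remote add "$UPSTREAM_REMOTE" "$UPSTREAM_URL"

registered_url=$(git config --get "remote.$UPSTREAM_REMOTE.url")
echo "$UPSTREAM_REMOTE -> $registered_url"
```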

Now, I think that is much better than using submodules and a lot less invasive to my repo.

@florianbuerger

The same workflow you described for subtree is valid for submodules as well. You can simply add the original remote to your fork to pull in the changes, no need to switch the remotes every time.
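
A rough sketch of that setup, assuming the submodule already points at a fork: the fork URL is a hypothetical placeholder, the remote name `upstream` is just a common convention, and the scratch repository only stands in for the checkout at Vendor/AFNetworking so the script runs offline.

```shell
#!/bin/sh
set -eu

# Assumption: FORK_URL is a hypothetical placeholder for your own fork.
FORK_URL="git@github.com:your-account/AFNetworking.git"
OFFICIAL_URL="git@github.com:AFNetworking/AFNetworking.git"

# Scratch repository standing in for the submodule checkout.
submodule_checkout=$(mktemp -d)
cd "$submodule_checkout"
git init -q .

# origin is the fork that the parent project's .gitmodules points at;
# upstream is the official repository, added as a second remote.
git remote add origin "$FORK_URL"
git remote add upstream "$OFFICIAL_URL"

# In a real checkout, syncing the fork with the official repo would then be:
#   git fetch upstream
#   git merge upstream/master
#   git push origin HEAD:master

upstream_url=$(git config --get remote.upstream.url)
origin_url=$(git config --get remote.origin.url)
echo "origin=$origin_url upstream=$upstream_url"
```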

@diederich

Thanks for the reminder about git subtrees. Here are 2 more thoughts regarding this issue:

  • Code-Sharing
    With subtrees, the patched external code lives only in the app's repository.
    If you want to share the external library with more than one project, it might be easier to have a dedicated repository to pull from.
  • With subtrees it's easier to not be a good citizen :-)
    While it can certainly be done with subtree, IMHO the barrier of pushing the changed code somewhere upstream is higher.
    On the other hand, once you have the submodule setup, you actually have to push the changes somewhere and sending a pull-request is only one click away (thanks github :-)).
  • I feel the pain regarding the switch of a submodule's upstream URL. git submodule sync can be easily forgotten and you get a somewhat obscure error (ref not found -> "Did he forget to push again" :-)).
    But this happens only once, as @florianbuerger says, it shouldn't be necessary to switch back to the upstream repository.

@kvnsmth
Author

kvnsmth commented Feb 1, 2013

@florianbuerger @diederich Thanks for the comments, guys! Valid points. I'm still investigating subtrees, but I really like what I'm seeing so far. I've always had weird issues with git submodule sync, but I could just be doing it wrong. :)

@diederich

@kvnsmth IIRC git submodule sync should only be needed after a change in .gitmodules, e.g. when you change the upstream URL of a submodule.

@icosahedron

This doc seems to be corroborated by Subtrees in the Pro Git book

Given the two approaches, subtrees seem the more sane.

@kvnsmth
Author

kvnsmth commented Feb 19, 2013

If you use rebase, you'll probably want to use the --preserve-merges flag as discussed in this SO post.

@kvnsmth
Author

kvnsmth commented Feb 19, 2013

More on the relationship between subtrees and submodules from this post by Junio:

After all, subtree merge was invented merely as a short-term hack to serve as a stop gap measure until submodule support becomes mature.

@schwa

schwa commented Apr 30, 2013

So @kvnsmth, what's the verdict after a few months?

@kvnsmth
Author

kvnsmth commented Aug 26, 2013

@schwa Still love the idea. But, the command line interface is horrible (not that git shines here anyhow…) and it seems like the git core team are working on a real solution. So, deciding to hold off for now. Submodules are sufficient most of the time.

@rcdailey

2 years later... any feedback on subtrees? How are you enjoying them? I understand subtrees were deprecated in favor of submodules, but I still feel that submodules have various drawbacks:

  • Pull requests are not unified, you need to manage dependencies which hurts team productivity (at companies)
  • Highly cohesive projects (app and library) that are linked together via submodule introduce a maintenance/administration layer. Submodules need to be constantly kept in sync.
  • Branching workflows are complex (submodules are not branched when I branch the parent repo)

@funkytaco

Curious about everyone's current thoughts as well as I'm researching git submodules versus subtrees.

@luca-ing

@funkytaco I'm curious as well.

The thing that I find offputting about subtrees is exactly what people praise about them: they hide the existence of other repos a little too well for my taste.

If I clone a repo that contains subtrees, I will not really notice. Whatever changes I make may not find their way back out to the original repo that created the subtree. I'm surprised that nobody else seems to mind this at all.
I fear that while it is robust on my own repo (it will not break, as easily happens with submodules), the cohesion between the linked repos weakens. This is bad if you're developing e.g. a library in one repo and an application that uses your library in a separate repo (which would make a lot of sense to me).
You have to manually split out your changes and apply them to the remote repo.
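
One partial mitigation: `git subtree add` and `git subtree pull` record a `git-subtree-dir:` line in the commit messages they create, so a fresh clone can at least be searched for them. The sketch below assumes those messages have not been rewritten; the function name is hypothetical, and the demo commit is a hand-written stand-in for what a squashed `git subtree add` would produce, so the script runs without the subtree command or a network connection.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: list every directory that commit messages
# record as a subtree prefix.
list_subtree_prefixes() {
    git log --all --grep='git-subtree-dir: ' --format='%b' |
        sed -n 's/^git-subtree-dir: //p' |
        sort -u
}

# Demo: a scratch repository with one stand-in commit whose message
# mimics the one a squashed subtree add writes.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
printf '%s\n' \
    "Squashed 'Vendor/AFNetworking/' content from commit 0123456" \
    "" \
    "git-subtree-dir: Vendor/AFNetworking" \
    "git-subtree-split: 0123456789abcdef0123456789abcdef01234567" |
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -F -

found=$(list_subtree_prefixes)
echo "$found"
```

This only tells you that a subtree exists and where it lives. It does not tell you which remote it came from, so that still has to be documented somewhere in the project.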

OTOH submodules are very brittle, so I'm reluctant to use them as well.

I have no good answer right now.

@gayanpathirage

Could someone comment about this from enterprise perspective, e.g. company managing more than 100 repos developed and shared between 100s of developers. (NOTE: Each repo is linked as dependencies e.g. Libraries)

@munjeli

munjeli commented Apr 12, 2016

@gayanpathirage
I'm doing enterprise infrastructure development for a consultancy and can speak to scale a bit in the context of DevOps, if not dev. I'm preparing a conference talk on 'Metaprogramming in Metarepositories'. I've gone to subtrees rather than submodules because of the difficulty for users, but I really liked programming with submodules to expose the service dependencies when I'm programming across repositories. I love it as an infrastructure dev environment, but am still working to understand where it otherwise makes sense.

Regarding repositories with hundreds of subs: there are use cases where it seems to make sense (one company I worked with used a metarepository to integrate multiple chef-repos), but you need to write tooling to handle it effectively and abstract away the risk of corrupting the repository through mishandling. It can also be slow to pull, obviously, and parallel pulling is essential, or working on a propagated box. I use a small metarepository on a build server to integrate a handful of applications with their deploy code, and it's a bit slow to pull, but once it's there I can do a lot of builds quickly and there's better transparency for failure around the automation, as all the code is in the same place. I have dev push and pull into the individual repositories, and the metarepository is largely for automation.

Generally, devs don't like metarepositories! Too much source control complexity! But metarepos can model actual infrastructure through source dependencies, and that model can go a long way to explaining your infrastructure patterns. I don't like the idea of HUGE metarepositories; the pulling is going to be messy or risky, and it's not going to be efficient generally. It's a way to distribute an entire collection of applications and libraries for exploration or running automation rather than everyday contributions.

@YueLinHo

Could someone comment about this from enterprise perspective

The entire Windows codebase is moving to a single Git repo... and from the Design History:

Submodules
...
In the end, we dropped that approach, because it created nearly as many problems as it fixed. 

For one, we found that we were complicating people’s workflows
...

Second, it’s not really possible to do atomic commits or pushes across multiple repos
...

And third, most developers are not interested in becoming version control experts, 
...

Can't find anything about subtree there, perhaps it's not even an option. :P

Another ref.:

We started down at least 2 failed paths to scale Git.
Probably the most extensive one was to use Git submodules to stitch together lots of repos into a single “super” repo.
I won’t go into details but after 6 months of working on that we realized it wasn’t going to work
– too many edge cases, too much complexity and fragility.
We needed a bulletproof solution that would be well supported by almost all Git tooling.

(From https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/)
