Let's say you have an iOS project, and you want to use some external library, like AFNetworking. How do you integrate it?
Add the project to your repo:
git submodule add git@github.com:AFNetworking/AFNetworking.git Vendor/AFNetworking
or something to that effect.
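Here is a minimal local sketch of that setup and of what a teammate sees after cloning. It makes these assumptions:

- Git 2.28 or newer is installed, because the script uses git init -b.
- Throwaway local repositories stand in for GitHub. They are hypothetical: lib plays AFNetworking and app plays your project.
- The -c protocol.file.allow=always option is needed because recent Git security releases refuse, by default, to clone submodules from local paths.

```shell
#!/bin/sh
# Sketch only: local repos stand in for GitHub; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical stand-in for the AFNetworking repository.
git init -q -b master "$work/lib"
echo "v1" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "Library v1"

# Your project, with the library added as a submodule under Vendor/.
git init -q -b master "$work/app"
git -C "$work/app" commit -q --allow-empty -m "Initial commit"
git -C "$work/app" -c protocol.file.allow=always submodule add "$work/lib" Vendor/Lib
git -C "$work/app" commit -q -m "Add Lib as a submodule"

# A teammate clones the project: Vendor/Lib exists but is still empty...
git clone -q "$work/app" "$work/teammate"
ls -A "$work/teammate/Vendor/Lib"

# ...until they remember to initialize and update the submodule.
git -C "$work/teammate" -c protocol.file.allow=always submodule update --init
cat "$work/teammate/Vendor/Lib/lib.txt"
```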
Well, what happens if you find a bug in AFNetworking that you want to fix? With submodules, I'd usually:
- Fork AFNetworking
- Go through the pain of changing my project's submodules to point to my new fork
- Make my change and commit it to my fork
- Submit a pull request to AFNetworking repo
- Wait to see if the pull request is accepted, but keep my fork up-to-date in the meantime
- If it is accepted, do the whole dance to switch my submodule back to the official AFNetworking repo
- Continue as usual
Okay, that sucks. If you've ever done it, you know how painful it is and how finicky submodules can be.
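The fork dance above can be sketched with plain Git commands. This is a hedged local simulation, not a recipe. It makes these assumptions:

- Bare repositories in a temp directory play the official repo (upstream.git) and your fork (fork.git). All names are hypothetical.
- Git 2.28 or newer is installed, for git init -b. That version also covers git submodule set-url, which was added in 2.25.
- The -c protocol.file.allow=always option lets recent Git clone a submodule from a local path.

On older Git, instead of git submodule set-url, you would edit the URL in .gitmodules and run git submodule sync.

```shell
#!/bin/sh
# Sketch only: bare repos in a temp dir stand in for GitHub; names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q -b master "$work/src"
echo "v1" > "$work/src/lib.txt"
git -C "$work/src" add lib.txt
git -C "$work/src" commit -q -m "Library v1"
git clone -q --bare "$work/src" "$work/upstream.git"      # the official repo
git clone -q --bare "$work/upstream.git" "$work/fork.git" # step 1: fork it

git init -q -b master "$work/app"
git -C "$work/app" commit -q --allow-empty -m "Initial commit"
git -C "$work/app" -c protocol.file.allow=always submodule add "$work/upstream.git" Vendor/Lib
git -C "$work/app" commit -q -m "Add Lib as a submodule"

# Step 2: repoint the submodule at the fork (set-url also syncs the remote URL).
git -C "$work/app" submodule set-url Vendor/Lib "$work/fork.git"

# Steps 3-4: commit the fix inside the submodule, push a topic branch to the fork.
sub="$work/app/Vendor/Lib"
echo "fix" >> "$sub/lib.txt"
git -C "$sub" commit -q -am "Fix critical bug"
git -C "$sub" push -q origin HEAD:refs/heads/critical-bug-fix
git -C "$work/app" commit -q -am "Use my Lib fork with the bug fix"

# Simulate the maintainers merging the pull request into the official master.
git -C "$work/upstream.git" fetch -q "$work/fork.git" critical-bug-fix:master

# Step 6: switch the submodule back to the official repository.
git -C "$work/app" submodule set-url Vendor/Lib "$work/upstream.git"
git -C "$work/app" commit -q -am "Point Lib back at the official repo"
```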
Now let's try subtrees instead. Add the project to your repo:
git subtree add --prefix=Vendor/AFNetworking --squash git@github.com:AFNetworking/AFNetworking.git master
This is pretty similar so far, except that the other members of your team won't have to remember to run git submodule update, because subtrees actually store the library's source in your repo. Nice.
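Here is a minimal local sketch of the subtree version. It assumes Git 2.28 or newer with the git subtree command available. That command lives in Git's contrib directory and is bundled by most Git packages. A throwaway local repository serves as a hypothetical stand-in for AFNetworking. The point to notice is that a teammate's plain git clone already contains the library's files.

```shell
#!/bin/sh
# Sketch only: a local repo stands in for GitHub; names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical stand-in for the AFNetworking repository.
git init -q -b master "$work/lib"
echo "v1" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "Library v1"

# Give the project an initial commit for subtree add to merge into.
git init -q -b master "$work/app"
git -C "$work/app" commit -q --allow-empty -m "Initial commit"
git -C "$work/app" subtree add --prefix=Vendor/Lib --squash "$work/lib" master

# A teammate's ordinary clone already contains the library's files.
git clone -q "$work/app" "$work/teammate"
cat "$work/teammate/Vendor/Lib/lib.txt"
```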
Now, let's say we have the same bug. What do I do differently now that I'm using subtrees?
I make my change and commit it to my project's repository. Technically, I could stop now if I wanted to, since the fixed code is in my repository. But I want to be a good open source citizen, so what do I do?
- I'll fork AFNetworking into my account on GitHub.
- Back in my local repo:
git subtree split --prefix=Vendor/AFNetworking/ --branch AFNetworking
to split the subtree's history, including my fix, onto a local branch named AFNetworking that I can push to my fork.
- I'll push my change to my fork, but on a branch to make the pull request more awesome.
git push git@github.com:kvnsmth/AFNetworking.git AFNetworking:critical-bug-fix
- I'll issue a pull request and hope it gets accepted. The big difference is that, accepted or not, my change doesn't keep me from easily staying in sync with the official AFNetworking repo.
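The steps above can be sketched end to end as a hedged local simulation. It makes these assumptions:

- Local bare repositories stand in for the official AFNetworking repo (upstream.git) and your fork (fork.git). All names are hypothetical.
- Git 2.28 or newer is installed, with git subtree available.

The fix is an ordinary commit in the project. git subtree split then rewrites the history under Vendor/Lib into a standalone branch with the library's files at the root, which is the layout the fork expects.

```shell
#!/bin/sh
# Sketch only: bare repos in a temp dir stand in for GitHub; names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q -b master "$work/src"
echo "v1" > "$work/src/lib.txt"
git -C "$work/src" add lib.txt
git -C "$work/src" commit -q -m "Library v1"
git clone -q --bare "$work/src" "$work/upstream.git"      # the official repo
git clone -q --bare "$work/upstream.git" "$work/fork.git" # step 1: fork it

git init -q -b master "$work/app"
git -C "$work/app" commit -q --allow-empty -m "Initial commit"
git -C "$work/app" subtree add --prefix=Vendor/Lib --squash "$work/upstream.git" master

# The fix is just a normal commit in the project repository.
echo "fix" >> "$work/app/Vendor/Lib/lib.txt"
git -C "$work/app" commit -q -am "Fix critical bug in Lib"

# Step 2: extract Lib's history (including the fix) onto a local branch.
git -C "$work/app" subtree split --prefix=Vendor/Lib --branch Lib

# Step 3: push that branch to a topic branch on the fork.
git -C "$work/app" push -q "$work/fork.git" Lib:critical-bug-fix
git -C "$work/fork.git" log --oneline critical-bug-fix
```

Inspect the fork's branch with git log before opening the pull request to confirm that it contains only the commits you expect.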
Meanwhile, I can still do:
git subtree pull --prefix=Vendor/AFNetworking --squash git@github.com:AFNetworking/AFNetworking.git master
to stay up-to-date with the latest in the official repository.
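Staying in sync can be sketched the same way. Upstream gains a commit, and git subtree pull brings it into Vendor/Lib as a new squashed merge. This assumes Git 2.28 or newer with git subtree available, and uses a throwaway local repository as a hypothetical stand-in for the official repo.

```shell
#!/bin/sh
# Sketch only: a local repo stands in for the official repo; names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Keep the git merge that subtree pull runs internally from opening an editor.
export GIT_MERGE_AUTOEDIT=no
work=$(mktemp -d)

git init -q -b master "$work/upstream"
echo "v1" > "$work/upstream/lib.txt"
git -C "$work/upstream" add lib.txt
git -C "$work/upstream" commit -q -m "Library v1"

git init -q -b master "$work/app"
git -C "$work/app" commit -q --allow-empty -m "Initial commit"
git -C "$work/app" subtree add --prefix=Vendor/Lib --squash "$work/upstream" master

# The official repository moves on.
echo "v2" > "$work/upstream/lib.txt"
git -C "$work/upstream" commit -q -am "Library v2"

# Pull the new upstream state into the subtree.
git -C "$work/app" subtree pull --prefix=Vendor/Lib --squash "$work/upstream" master
cat "$work/app/Vendor/Lib/lib.txt"
```

With --squash, each pull adds one squash commit plus a merge to the project's history rather than every upstream commit.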
Now, I think that is much better than using submodules and a lot less invasive to my repo.
@gayanpathirage
I'm doing enterprise infrastructure development for a consultancy and can speak to scale a bit in the context of DevOps, if not dev. (I'm preparing a conference talk on 'Metaprogramming in Metarepositories'.) I've moved to subtrees rather than submodules because submodules are difficult for users. Still, I really liked programming with submodules to expose service dependencies when working across repositories. I love them as an infrastructure dev environment, but I'm still working to understand where else they make sense.
Regarding repositories with hundreds of subs: there are use cases where that seems to make sense. One company I worked with used a metarepository to integrate multiple chef-repos. But you need to write tooling to handle it effectively and to abstract away the risk of corrupting the repository through mishandling. It can also be slow to pull, obviously, so parallel pulling (or working on a propagated box) is essential.

I use a small metarepository on a build server to integrate a handful of applications with their deploy code. It's a bit slow to pull, but once it's there I can do a lot of builds quickly. Failures in the automation are also more transparent, because all the code is in the same place. I have devs push to and pull from the individual repositories; the metarepository is largely for automation.
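The comment doesn't say what tooling was used. As one hedged illustration with stock Git (2.9 or newer has --jobs), a build server can clone a metarepository of submodules and fetch them concurrently. This sketch makes these assumptions:

- The repository names are hypothetical, and local repositories stand in for real remotes.
- Git 2.28 or newer is installed, for git init -b.
- The -c protocol.file.allow=always option lets recent Git clone submodules from local paths.

```shell
#!/bin/sh
# Sketch only: local repos stand in for real remotes; names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A metarepository that pins several component repositories as submodules.
git init -q -b master "$work/meta"
git -C "$work/meta" commit -q --allow-empty -m "Initial commit"
for name in app-a app-b deploy; do
  git init -q -b master "$work/$name"
  echo "$name" > "$work/$name/README"
  git -C "$work/$name" add README
  git -C "$work/$name" commit -q -m "Initial $name"
  git -C "$work/meta" -c protocol.file.allow=always submodule add "$work/$name" "$name"
done
git -C "$work/meta" commit -q -m "Pin component repositories"

# On the build server: clone, fetching up to 4 submodules at a time.
git -c protocol.file.allow=always clone -q --recurse-submodules --jobs 4 "$work/meta" "$work/buildbox"

# Later refreshes can also fetch in parallel, tracking each submodule's remote branch.
git -C "$work/buildbox" -c protocol.file.allow=always submodule update --remote --jobs 4
```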
Generally, devs don't like metarepositories: too much source control complexity! But metarepos can model actual infrastructure through source dependencies, and that model can go a long way toward explaining your infrastructure patterns. I don't like the idea of HUGE metarepositories, because pulling them is going to be messy or risky, and generally inefficient. A metarepository is a way to distribute an entire collection of applications and libraries for exploration or for running automation, not for everyday contributions.