@i-e-b
Created April 12, 2012 02:51
Notes on Git and Hg

Problems with Git (and Hg) and some solution ideas

  1. Storing changes to regenerated binaries (i.e. output from another project) is inefficient and causes repo bloat.

    I want to be able to store the generated output (dlls etc) from one project in another, as each should stand on its own for CI & deployment purposes. I don't care about their history, as they can be cheaply regenerated.

    I should be able to mark part of a repo as shallow-only. It should keep only the current and one previous version of each file in the repo. It should not bother to diff the files or do anything clever with them. It should not try to merge incoming updates, just use theirs; I'll regenerate if I need to.
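
A minimal sketch of the shallow-only rules above, assuming whole-file snapshots held in memory. `ShallowStore` and its methods are hypothetical names for illustration, not part of Git or Hg.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ShallowStore:
    """Hypothetical store for a shallow-only part of a repo.

    Keeps whole-file snapshots (no diffing) and at most `keep` versions per path.
    """
    keep: int = 2  # current + 1 previous; assumed to be >= 1
    versions: Dict[str, List[bytes]] = field(default_factory=dict)

    def commit(self, path: str, content: bytes) -> None:
        history = self.versions.setdefault(path, [])
        history.append(content)
        # Drop anything older than the retention window.
        if len(history) > self.keep:
            del history[:len(history) - self.keep]

    def current(self, path: str) -> Optional[bytes]:
        history = self.versions.get(path)
        return history[-1] if history else None

    def previous(self, path: str) -> Optional[bytes]:
        history = self.versions.get(path, [])
        return history[-2] if len(history) > 1 else None

    def merge_incoming(self, path: str, theirs: bytes) -> None:
        # No merge attempt: the incoming version always wins.
        self.commit(path, theirs)
```

Committing three versions of the same dll in a row would leave only the last two in the store, which is the whole point: the bloat is bounded no matter how often the binary is regenerated.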

  2. Sub-modules are flaky and can be problematic.

    I want to be able to keep a master structure and be able to check out either the whole lot or just a part -- and work on either equally well.

    I should be able to include, pull & push sub-directories as easily as svn up and svn commit on sub-folders. It can be limited to named, specific folders. It should be able to import from anywhere (à la Hg subrepos, git submodules). It should be able to commit & push from the master container and have that push to all sub-repos. It should pull all sub-repos on a pull to the master container. It should be able to nest at multiple levels.
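
A rough sketch of how pull and push could walk nested sub-repos. The `Repo` class is hypothetical, and the ordering (container first on pull, sub-repos first on push) is my assumption rather than anything Git or Hg does; the operations are only recorded as strings here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repo:
    """Hypothetical repo node; `children` are nested sub-repos."""
    name: str
    children: List["Repo"] = field(default_factory=list)

    def pull(self) -> List[str]:
        # Update the container first, then each sub-repo at any depth.
        done = [f"pull {self.name}"]
        for child in self.children:
            done.extend(child.pull())
        return done

    def push(self) -> List[str]:
        # Push sub-repos first so the container never points at unpushed work.
        done: List[str] = []
        for child in self.children:
            done.extend(child.push())
        done.append(f"push {self.name}")
        return done
```

Because every node is a full `Repo`, working on just one part is the same call on a smaller tree, e.g. `Repo("web").push()`.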

  3. No-one does SVN:Externals correctly -- not even SVN!

    I want to be able to handle contracts solutions for my design-by-contract CQRS-ish systems.

    I should be able to have sub-repos which are read-only: they can't be pushed from the master container. They should have separate repo addresses for read-only and read-write. They should always use the read-only version for sub-repos, without me needing to do so explicitly; and should check out files marked as read-only on the OS as far as possible -- to avoid frustrating errors.
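
A small sketch of the two pieces above: picking the read-only address automatically, and marking checked-out files read-only on disk. `SubRepoRef` and `mark_tree_read_only` are hypothetical names; only the `os` and `stat` calls are real standard-library APIs. On Windows, `os.chmod` can only toggle the read-only flag, which is all this needs.

```python
import os
import stat
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubRepoRef:
    """Hypothetical sub-repo reference with separate read and write addresses."""
    read_url: str
    write_url: Optional[str] = None

    def url_for(self, as_subrepo: bool) -> str:
        # When included as a sub-repo, always use the read-only address.
        if as_subrepo or self.write_url is None:
            return self.read_url
        return self.write_url


def mark_tree_read_only(root: str) -> int:
    """Clear the write bits on every file under `root`; return the file count."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    count = 0
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~write_bits)
            count += 1
    return count
```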

  4. Known binary file types are still dumb-merged.

    I want to be able to 3-way-merge images if possible, and to be able to see diffs in a more sensible way.

    It should be possible to plug in diff-match-patch strategies (tricky for distributed SCM software -- it needs a transportable plug-in).

  5. Diff-match-patch software is poor.

    I want to use a smart, fast system -- like Neil Fraser's DMP.

    It should be language-rules aware if possible. There is no point diffing non-critical whitespace, or intra-word changes, in a lot of code. Word-by-word is better than line-by-line for a lot of code, but not all. It should be possible to try both sides of a merge (by compiling, unit testing, etc.), so I should be able to swap between [local|incoming|merge candidate] before settling. It should be possible to go back to a file's merge and try a different merge later on -- merges should be well-defined things in the repo history, not just their outcome saved in the diff.

    Patches should come in as a set of merge conflicts which can be handled in this 'good' way and will show up well in the history.

    Pull-requests (like on GitHub) are a good way to handle patches from one group of developers to another, where neither group can trust the other and where the two development trees may be far-separated. This could be handled as a pull-into (from one repo's revision ID into another's head), creating a set of merge-operation-candidates -- which can then be edited and accepted/rejected and show up nicely in the repo history.

    Pulls that would cause local conflicts can be handled without 'stashes', and can be picked as keep-mine-current or use-theirs. Maybe non-resolved merges can be committed, but not pushed?
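
To illustrate the word-by-word point in this item, here is a rough sketch that diffs tokens instead of lines and can ignore whitespace-only changes. It uses Python's standard `difflib.SequenceMatcher` as a stand-in, not Neil Fraser's DMP, and the tokenizer is a crude assumption rather than real language awareness. `tokens` and `word_diff` are hypothetical names.

```python
import difflib
import re
from typing import List, Tuple


def tokens(text: str) -> List[str]:
    # Words, runs of whitespace, and single punctuation characters.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def word_diff(old: str, new: str,
              ignore_whitespace: bool = True) -> List[Tuple[str, str, str]]:
    """Return (operation, old_tokens, new_tokens) for each changed region."""
    a, b = tokens(old), tokens(new)
    if ignore_whitespace:
        # Treat all whitespace as non-critical and drop it before comparing.
        a = [t for t in a if not t.isspace()]
        b = [t for t in b if not t.isspace()]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

For example, `word_diff("int total = count + 1;", "int  total = count + 2;")` reports only that `1` was replaced by `2`, where a line-based diff would flag the whole line and the extra space.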

  6. Working closely with other developers in non-far-flung locales is a pain

    I want to be able to have a persistent, fast-updating connection between two repos, so that saved changes are transported between the two in near real-time.

  7. Tooling is a bit crap. All the tooling seems to be based around complex and powerful commands rather than simpler but more chatty ones. This makes GUIs awkward and the learning curve steep.

    I want a few very simple commands, merge handling that doesn't break anything and no tools or actions that leave repos in a problematic state.

    (It should have tools that are useful for scripting, such as platform builds, with useful and reliable return codes and output. stash push/pop is a good example of crap git behaviour.)

  8. Origin repos have very little authority -- a side effect of the distributed nature, but it makes squashing history very dodgy, makes removing accidentally pushed files dodgy, and causes the "I pushed and removed a big file, now everyone has to pull it with the repo" problem.

    I want to be able to separate an authoritative truly-historical repository from a curated 'official history' version.

    Pulls should take from the 'official' line. Pushes should go to the 'true' line, then trickle to the 'official'. It should be possible to curate the 'official' repo's history (probably as a set of edits to the true history, and a cached pullable set of objects?). It should not be possible to curate the history from a push -- all normal guarantees of history should be in place. Pulls should notice when the curated history is changed and do a fresh checkout (or a local curation?).

    Maybe curation is a whole separate sub-api thing that can be pushed about as well-defined change sets.
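
One way to picture curation as a set of edits on top of the true history, as a very rough sketch. Everything here (`Repository`, `curate_hide`, `needs_fresh_checkout`, the version counter) is hypothetical, and "hiding a commit" is just the simplest curation edit I could think of.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Repository:
    """Hypothetical repo with an append-only true line and a curated view."""
    true_history: List[str] = field(default_factory=list)  # append-only commit ids
    hidden: Set[str] = field(default_factory=set)           # curation edits
    curation_version: int = 0                                # bumped on each curation

    def push(self, commit_id: str) -> None:
        # Pushes only ever append to the true line; they cannot curate.
        self.true_history.append(commit_id)

    def curate_hide(self, commit_id: str) -> None:
        # Curation is a separate operation, recorded as an edit over true history.
        if commit_id not in self.true_history:
            raise ValueError(f"unknown commit: {commit_id}")
        self.hidden.add(commit_id)
        self.curation_version += 1

    def official_history(self) -> List[str]:
        # The 'official' line is the true line with curation edits applied.
        return [c for c in self.true_history if c not in self.hidden]


def needs_fresh_checkout(repo: Repository, last_seen_version: int) -> bool:
    # A puller remembers the curation version it last saw.
    return repo.curation_version != last_seen_version
```

The true line never loses anything, so the usual history guarantees hold there, while a puller that sees a changed curation version knows its local copy of the official line is stale.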

  9. Too focused on massively distributed, untrustworthy groups -- this causes friction in a company department where we're all together and trust each other.

    I want to write my own SCM, partly to solve these issues and partly to learn. I want to write it in Mono-compatible .NET -- just because. It should probably be based off Hg, as it's got the better tooling for importing from SVN & Git, and seems to be a better match for what I want anyway.

    A little bit like centralised, a little bit like distributed. Maybe "DisCent", 'dc' for short -- that's easy to type! dcup, dcpush etc.

That's all I can think of for now.
