Skip to content

Instantly share code, notes, and snippets.

@joliss
Last active August 29, 2015 14:02
Show Gist options
  • Save joliss/73804f5c8efa262c0b7d to your computer and use it in GitHub Desktop.
Save joliss/73804f5c8efa262c0b7d to your computer and use it in GitHub Desktop.
Optimizing merge-trees

Breaking down what merge-trees spends time on, on a set of 500 small files:

walkSync 6 ms
pathManipulation 7 ms      <------- string stuff, doesn't hit file system
mkdir 1 ms
copy 71 ms     <------- clearly unacceptable, and gets much worse with big image files
mergeTotal 86 ms

When we symlink files instead of copying:

walkSync 6 ms
pathManipulation 6 ms
mkdir 1 ms
symlink 28 ms     <-------- still on the slow side actually 
mergeTotal 42 ms

We can do a further very awesome optimization, however, which is to symlink entire subdirectories. That is, when we merge directories foo and bar, then if foo/somedir exists but bar/somedir doesn't exist, we can make destDir/somedir a symlink to foo/somedir. Thus we don't need to recurse into foo/somedir anymore. This should drop the merge time to near-zero.

Even with smart copying/symlinking (i.e. reusing destDir), we'd have to stat each input file at least once (this is in the walkSync timing). It's pretty fast, but goes up linear with # of files. Symlinking entire subdirectories avoids this.

A prerequisite for this optimization is that other plugins will recurse into symlinked directories. We don't currently do this (would need stat instead of lstat in walk-sync here), but I think it would be perfectly sensible to specify that this is expected behavior.

What do you think?

P.S. I pushed the benchmarking code to https://github.com/broccolijs/broccoli-merge-trees/tree/benchmark (2 top commits, run coffee test/benchmark.coffee)

P.P.S. I don't necessarily get notified for comments on this gist, so let's continue the discussion on #broccolijs. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment