Breaking down what merge-trees spends time on, on a set of 500 small files:
walkSync 6 ms
pathManipulation 7 ms <------- string stuff, doesn't hit file system
mkdir 1 ms
copy 71 ms <------- clearly unacceptable, and gets much worse with big image files
mergeTotal 86 ms
When we symlink files instead of copying:
walkSync 6 ms
pathManipulation 6 ms
mkdir 1 ms
symlink 28 ms <-------- still on the slow side actually
mergeTotal 42 ms
We can do a further very awesome optimization, however, which is to symlink entire subdirectories. That is, when we merge directories foo and bar, then if foo/somedir exists but bar/somedir doesn't exist, we can make destDir/somedir a symlink to foo/somedir. Thus we don't need to recurse into foo/somedir anymore. This should drop the merge time to near-zero.
Even with smart copying/symlinking (i.e. reusing destDir), we'd have to stat each input file at least once (this is in the walkSync timing). It's pretty fast, but goes up linear with # of files. Symlinking entire subdirectories avoids this.
A prerequisite for this optimization is that other plugins will recurse into symlinked directories. We don't currently do this (would need stat instead of lstat in walk-sync here), but I think it would be perfectly sensible to specify that this is expected behavior.
What do you think?
P.S. I pushed the benchmarking code to https://github.com/broccolijs/broccoli-merge-trees/tree/benchmark (2 top commits, run coffee test/benchmark.coffee
)
P.P.S. I don't necessarily get notified for comments on this gist, so let's continue the discussion on #broccolijs. :)