- As npm, I want to safely put only valid metadata and tarballs into the npm cache.
- As npm, I want to ensure that multiple npms running concurrently don't download the same things repeatedly.
- single-machine
- no native dependencies
- multi-process serialization for both reads and writes
- uses only operations available on every platform Node supports
- operations are atomic when possible, and designed to be collision-resistant when not (see the write-then-rename sketch after this list)
- no busy-wait locking
- process liveness checking: if a process is holding access to the DB, we can verify that it's still running (see the PID sketch after this list)
- uses only plain (tar|JSON) files for storage
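
To make the atomicity constraint concrete, here is a minimal Node sketch of the standard write-to-temp-then-rename pattern; `writeAtomic` and its temp-file naming scheme are illustrative, not a proposed API.

```js
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

// Write `data` to `dest` atomically: write a uniquely named temp file in
// the same directory (renames are only atomic within one filesystem),
// then rename it into place. Concurrent readers see either the old file
// or the new one, never a partial write.
function writeAtomic (dest, data, cb) {
  const tmp = path.join(
    path.dirname(dest),
    '.' + path.basename(dest) + '.' + crypto.randomBytes(6).toString('hex')
  );
  fs.writeFile(tmp, data, function (err) {
    if (err) return cb(err);
    fs.rename(tmp, dest, function (renameErr) {
      if (!renameErr) return cb(null);
      // Clean up the orphaned temp file before reporting the failure.
      fs.unlink(tmp, function () { cb(renameErr); });
    });
  });
}
```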
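Similarly, a sketch of the PID-based liveness check the last constraint implies, assuming each lock record stores the PID of the process that took it (`isAlive` is a hypothetical helper, not an existing API):

```js
// process.kill(pid, 0) sends no signal; it only asks the OS whether the
// process exists. ESRCH means it is gone; EPERM means it exists but is
// owned by another user. This only works on a single machine (which the
// first constraint already assumes), and PIDs can be recycled, so it is
// a heuristic rather than a guarantee.
function isAlive (pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === 'EPERM';
  }
}
```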
- hash-cache: implements most of this in a single package, but its liveness checking isn't great
- lockfile: doesn't do liveness checking
In discussion with @isaacs today, I learned that `fs.open()` and `fs.unlink()` are atomic operations, along with `fs.rename()`. This is a lot more flexible than I was fearing, and with some care it gives us a way to build an atomic compare-and-swap operation.
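
One concrete building block, sketched here with a hypothetical `lockPath`: opening with the `'wx'` flag maps to `O_CREAT | O_EXCL`, so when several processes race to create the same lock file, exactly one `fs.open()` succeeds and the rest fail with `EEXIST`. That is a test-and-set, the simplest form of the compare-and-swap described above.

```js
const fs = require('fs');

// Try to take an exclusive lock by creating the lock file. 'wx' fails
// with EEXIST if the file already exists, and the create is atomic.
function acquireLock (lockPath, cb) {
  fs.open(lockPath, 'wx', function (err, fd) {
    if (err) return cb(err); // EEXIST: another process holds the lock
    fs.close(fd, function (closeErr) {
      if (closeErr) return cb(closeErr);
      // Releasing is a single atomic unlink.
      cb(null, function release (done) {
        fs.unlink(lockPath, done);
      });
    });
  });
}
```

A lock whose recorded owner fails the liveness check sketched earlier could then be reclaimed with an atomic `fs.unlink()` followed by a retry, though proving that reclamation is race-free is exactly the kind of subtlety this design has to get right.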
In discussion with @seldo, we identified a few more constraints: `READ_COMMITTED` isolation (with transactions locking only individual rows) would yield significant performance gains over `SERIALIZABLE`, at the cost of additional complexity and the concomitant difficulty of proving the implementation correct.

Finally, @seldo pointed out that we still haven't identified the root cause of the staleness and deadlocks in `lockfile`, on Windows or anywhere else. We suspect there are problems with lockfile's implementation (or with npm's use of it), but we don't know what they are. Further investigation and testing are probably called for before committing to the lunacy of writing a new database / locking engine in JavaScript. @seldo, @isaacs, and I will decide how to proceed shortly.