@othiym23
Last active August 29, 2015 14:05

Use cases

  1. As npm, I want to safely put only valid metadata and tarballs into the npm cache.
  2. As npm, I want to ensure that multiple npm processes running concurrently don't download the same things repeatedly.

Requirements

  1. single-machine
  2. no native dependencies
  3. multi-process serialization for both reads and writes
  4. uses operations supported on all supported Node platforms

Desiderata

  1. operations are atomic when possible, and designed to be collision resistant when not
  2. no busy-wait locking
  3. process liveness checking: if something is holding access to the DB, verify that it's actually still running (see the sketch after this list)
  4. uses only plain (tar|JSON) files for storage
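
For the liveness-checking desideratum, here's a minimal sketch of what "is the lock holder still running?" could look like, assuming (hypothetically; neither hash-cache nor lockfile does this) that the lock holder writes its PID into the lock file when it acquires it. Signal 0 probes for process existence without actually delivering a signal:

```js
var fs = require('fs');

// Returns true if the process that wrote this lock file appears to be
// alive. Assumes the lock file's contents are the holder's PID, a
// convention invented for this sketch.
function lockHolderIsAlive(lockPath) {
  var pid;
  try {
    pid = parseInt(fs.readFileSync(lockPath, 'utf8'), 10);
  } catch (er) {
    if (er.code === 'ENOENT') return false; // no lock file, no holder
    throw er;
  }
  if (isNaN(pid)) return false; // garbage in the lock file

  try {
    process.kill(pid, 0); // signal 0: existence check, delivers nothing
    return true;
  } catch (er) {
    // EPERM means the process exists but belongs to another user;
    // ESRCH means it's gone.
    return er.code === 'EPERM';
  }
}
```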

Pieces

  • hash-cache: covers most of this in one package, but its liveness checking isn't great
  • lockfile: doesn't do liveness checking
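
For context on the lockfile point: its staleness handling is time-based rather than liveness-based, so a lock older than the configured stale timeout is simply presumed abandoned, whether or not its holder is still running. A minimal usage sketch (option names as I recall them from lockfile's documented API):

```js
var lockFile = require('lockfile');

lockFile.lock('registry.lock', {
  wait: 10000,  // wait up to 10s for a busy lock to free up
  stale: 60000, // presume locks older than 60s are abandoned
  retries: 3
}, function (er) {
  if (er) return console.error('could not acquire lock:', er);
  // ... do cache work under the lock ...
  lockFile.unlock('registry.lock', function (er) {
    if (er) console.error('could not release lock:', er);
  });
});
```
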
@othiym23 (Author)

In discussion with @isaacs today, I learned that fs.open() and fs.unlink() are atomic operations along with fs.rename(). This is a lot more flexible than I was fearing, and with some care gives us a way to build an atomic compare and swap operation.
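
To make that concrete, here is a minimal sketch of the two primitives those guarantees give us: an atomic publish via write-to-temp-then-rename, and an exclusive "claim" via fs.open() with the 'wx' flag (O_CREAT | O_EXCL), which fails with EEXIST if any other process created the file first. The temp-file naming convention is invented for this sketch:

```js
var fs = require('fs');

// Atomic publish: readers see either the old file or the complete new
// one, never a partial write, because the rename itself is atomic.
function atomicWrite(target, data, cb) {
  var tmp = target + '.' + process.pid + '.tmp'; // hypothetical naming
  fs.writeFile(tmp, data, function (er) {
    if (er) return cb(er);
    fs.rename(tmp, target, cb);
  });
}

// Exclusive claim: 'wx' maps to O_CREAT | O_EXCL, so exactly one
// process can create the file; everyone else gets EEXIST. This is the
// "compare" half of a compare-and-swap: swap in my claim iff nobody
// holds one.
function claim(lockPath, cb) {
  fs.open(lockPath, 'wx', function (er, fd) {
    if (er) return cb(er.code === 'EEXIST' ? null : er, false);
    fs.close(fd, function (er2) { cb(er2, true); });
  });
}
```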

In discussion with @seldo, we identified a few more constraints:

  • In the case of a cold cache and a parallel install process, it is almost certain that the use cases where multiple npm processes are running will also be cases where those processes are in contention to download the same set of files. This means that in both the no-database case (i.e. last write wins with atomic renames, which should be perfectly safe for a content-addressable store) and the case of serialization-level isolation, we're going to see pretty major performance regressions.
  • Using the filesystem as the lock store may lead to volatile / unacceptable latency, as opens may be unacceptably slow, especially when many processes are trying to read from / write to the database. Having one store plus a lockfile (versus the current approach of a lockfile per request) may lead to even more contention and volatility in performance. @seldo mused about using a bind on a known port as the locking operation, but that was dismissed as impractical.
  • Also dismissed as architecture astronautics was the idea of npm spawning some kind of in-memory lock manager process or peer-to-peer semantics for coördinating locking activity. One way or another, the filesystem is the source of truth for determining which process is caching a given package / tarball.
  • READ_COMMITTED isolation (with transactions locking only individual rows) would yield significant performance gains over SERIALIZABLE, at the cost of additional complexity and the concomitant difficulty of proving the implementation correct (see the sketch after this list).
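
To illustrate that last point, a rough sketch of the difference in lock granularity. SERIALIZABLE here corresponds to a single lock gating the whole cache; READ_COMMITTED-style row locking corresponds to one lock per package@version, so operations on unrelated packages never contend. The lock layout and helper names are hypothetical, and claim() is the exclusive-create primitive sketched above:

```js
var fs = require('fs');
var path = require('path');

// Exclusive create, as above: succeeds for exactly one process.
function claim(lockPath, cb) {
  fs.open(lockPath, 'wx', function (er, fd) {
    if (er) return cb(er.code === 'EEXIST' ? null : er, false);
    fs.close(fd, function (er2) { cb(er2, true); });
  });
}

// SERIALIZABLE-ish: every cache operation contends on one lock,
// even operations touching unrelated packages.
function globalLockPath(cacheRoot) {
  return path.join(cacheRoot, 'cache.lock');
}

// READ_COMMITTED-ish: lock only the "row" (a single package@version),
// so concurrent installs of different packages proceed in parallel.
function rowLockPath(cacheRoot, name, version) {
  return path.join(cacheRoot, name + '-' + version + '.lock');
}
```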

Finally, @seldo pointed out that we still haven't identified the root cause of the staleness and deadlocks in lockfile, on Windows or anywhere else. We suspect there are problems with lockfile's implementation (or npm's use of it), but we don't know what they are. Further investigation and testing are probably called for before committing to the lunacy of writing a new database / locking engine in JavaScript. @seldo, @isaacs, and I will decide how to proceed on this shortly.
