Populating npm caches with BitTorrent

NOTE: This concept has probably been covered before, so if anyone can point me in the direction of some prior art, that would be great.

This past June, I attended NodeConf for the first time. While it was an amazing event, and I'd love to go next year, it wasn't without its hiccups. In particular, all the sessions involved some amount of npm install, and with hundreds of programmers sharing a small pipe to the internet, this became tiresome rather quickly.

First of all, cheers to everyone involved in setting up npm registry mirrors at the event (npm, Inc. and others). This worked out reasonably well, but the mirrors were often slightly out of date, which was problematic because many instructors were updating their course material in real time.

This sort of situation seems like it could be fairly common, and it prompted a number of solution ideas. Here is one of them.

The primary thing to do is reduce the amount of data going through the tiny connection. BitTorrent is an obvious choice when one needs to distribute data across a large number of machines, in a decentralized manner. If we can populate npm's cache using BitTorrent before running npm install, we can greatly reduce the amount of data that needs to be retrieved from the registry.

Here is a (simplified) procedure that might make this work:

  1. A project author creates a list of BitTorrent infohashes corresponding to all the npm-available modules required by the project.
  2. The author begins seeding all of the torrents in the list.
  3. The project author publishes this list.
  4. People can then download all the packages in the list over BitTorrent (and hopefully seed them as well), saving the packages into their npm cache.
  5. When they run npm install, the packages will be retrieved from the cache. The only data required from the registry is a single 304 response per package. (A sketch of this flow follows the list.)
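
To make steps 4 and 5 concrete, here's a minimal sketch of the installer/populator. It assumes the `webtorrent` package as the BitTorrent client, the classic `~/.npm/<name>/<version>/package.tgz` cache layout, and a made-up `npm-torrent-list.json` file containing `{name, version, infoHash}` entries:

```js
// populate.js -- hypothetical installer/populator sketch (assumes the `webtorrent` package)
var WebTorrent = require('webtorrent');
var path = require('path');
var fs = require('fs');

var client = new WebTorrent(); // DHT is on by default, so trackerless infohashes can resolve

// Made-up list format: [{ "name": ..., "version": ..., "infoHash": ... }, ...]
var list = JSON.parse(fs.readFileSync('npm-torrent-list.json', 'utf8'));
var cacheDir = process.env.npm_config_cache || path.join(process.env.HOME, '.npm');

var remaining = list.length;
if (remaining === 0) process.exit(0);

list.forEach(function (pkg) {
  var dest = path.join(cacheDir, pkg.name, pkg.version);
  if (fs.existsSync(path.join(dest, 'package.tgz'))) {
    // Already cached; nothing to download.
    if (--remaining === 0) process.exit(0);
    return;
  }
  // Ask the swarm (DHT / local peer discovery) for the torrent by infohash and
  // drop the tarball straight into the npm cache directory.
  client.add(pkg.infoHash, { path: dest }, function (torrent) {
    torrent.on('done', function () {
      console.log('cached %s@%s', pkg.name, pkg.version);
      if (--remaining === 0) process.exit(0); // everything verified in the cache: exit successfully
    });
  });
});
```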

This could work with the following components:

  • Torrent Daemon: This would act as the primary torrent client. On startup, it would examine the npm cache and seed torrents of the cached packages. It would watch the cache directory for changes, adding and seeding new packages as they become available. It would also need an RPC system to initiate downloads and inform RPC clients when downloads have finished. (See the watch-and-seed sketch after this list.)
  • CLI Tool:
    • Installer/Populator: This would take a given list of package infohashes and add them to the torrent daemon if they aren't already cached. Once all packages are verified to be in the cache, this process would exit successfully.
    • List Generator: This would generate a list of dependencies for a given project, along with their infohashes, based on whatever's currently in the npm cache. This list could then be distributed along with the project in the same manner as npm-shrinkwrap.json, as it serves a similar purpose. The list is the input for the installer/populator. (See the sketch after this list.)
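
Here's a rough sketch of the daemon's watch-and-seed loop, under the same assumptions (`webtorrent` and the classic cache layout); the RPC side is left out entirely:

```js
// torrent-daemon.js -- hypothetical watch-and-seed loop (assumes the `webtorrent` package)
var WebTorrent = require('webtorrent');
var path = require('path');
var fs = require('fs');

var client = new WebTorrent();
var cacheDir = process.env.npm_config_cache || path.join(process.env.HOME, '.npm');
var seeding = {};

function seedTarball(tarball) {
  if (seeding[tarball]) return;
  seeding[tarball] = true;
  // Empty announce list to keep the torrent trackerless; creation date pinned to the
  // epoch so the infohash matches the published list (see the notes below).
  client.seed(tarball, { announce: [], creationDate: new Date(0) }, function (torrent) {
    console.log('seeding %s as %s', tarball, torrent.infoHash);
  });
}

function scan() {
  // Walk the <cache>/<name>/<version>/package.tgz layout and seed anything new.
  fs.readdirSync(cacheDir).forEach(function (name) {
    var nameDir = path.join(cacheDir, name);
    if (!fs.statSync(nameDir).isDirectory()) return;
    fs.readdirSync(nameDir).forEach(function (version) {
      var tarball = path.join(nameDir, version, 'package.tgz');
      if (fs.existsSync(tarball)) seedTarball(tarball);
    });
  });
}

scan();                   // seed whatever is already cached on startup
fs.watch(cacheDir, scan); // re-scan on changes (a real daemon would want recursive watching)
```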
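And a sketch of the list generator, assuming the `create-torrent` and `parse-torrent` packages. The output file name is invented, and pinning the creation date to the epoch anticipates the determinism issue discussed in the notes below:

```js
// generate-list.js -- hypothetical list generator (assumes `create-torrent` and `parse-torrent`)
var createTorrent = require('create-torrent');
var parseTorrent = require('parse-torrent');
var path = require('path');
var fs = require('fs');

var cacheDir = process.env.npm_config_cache || path.join(process.env.HOME, '.npm');
var deps = require(path.resolve('package.json')).dependencies || {};

var names = Object.keys(deps);
var list = [];
var pending = names.length;

names.forEach(function (name) {
  // For simplicity, assume exactly one cached version per dependency.
  var version = fs.readdirSync(path.join(cacheDir, name))[0];
  var tarball = path.join(cacheDir, name, version, 'package.tgz');
  // Pin the creation date so everyone computes the same infohash (see the notes below).
  createTorrent(tarball, { creationDate: new Date(0) }, function (err, torrentFile) {
    if (err) throw err;
    list.push({ name: name, version: version, infoHash: parseTorrent(torrentFile).infoHash });
    if (--pending === 0) {
      fs.writeFileSync('npm-torrent-list.json', JSON.stringify(list, null, 2));
    }
  });
});
```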

This could then easily be adapted for distributing base projects as well.

Notes

  • For this system to make the most sense, we'd probably be talking about trackerless torrents, and setting the torrent daemon to prefer local peer discovery.
  • Since using infohashes to install packages implies a very static set of versions, it might make sense to add the infohash values to the npm-shrinkwrap.json file, rather than making a new file to store the package list (a sketch of this follows these notes).
  • Infohashes aren't deterministic, since they're a function of the torrent's creation date. One potential solution is to set the creation date to the UNIX epoch (e.g. new Date(0) in JS) when creating the torrents. This isn't elegant at all, but it will probably get the job done.
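
As a rough illustration of the shrinkwrap idea, here's a sketch that folds the generated list into an existing npm-shrinkwrap.json by adding a made-up `infoHash` field to each top-level entry (npm should simply ignore the unknown key):

```js
// augment-shrinkwrap.js -- hypothetical: merge infohashes into npm-shrinkwrap.json
var fs = require('fs');

var shrinkwrap = JSON.parse(fs.readFileSync('npm-shrinkwrap.json', 'utf8'));
var list = JSON.parse(fs.readFileSync('npm-torrent-list.json', 'utf8')); // from the list generator above

list.forEach(function (pkg) {
  // Top-level dependencies only, for brevity; a real tool would recurse.
  var entry = shrinkwrap.dependencies[pkg.name];
  if (entry && entry.version === pkg.version) {
    entry.infoHash = pkg.infoHash; // made-up field; npm should ignore it
  }
});

fs.writeFileSync('npm-shrinkwrap.json', JSON.stringify(shrinkwrap, null, 2));
```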