@rklaehn
Last active October 30, 2022 16:49
Wikipedia scenario

Scenario

  • Moderately sized dataset (300GB)
  • Too large to be stored entirely on end-user hardware
  • Seeder is not fast enough to serve all clients
  • Every user has a small part of the dataset, but none has all of it
  • Users on consumer hardware want to browse with low latency

This scenario is mostly about content discovery, but it is a hard one, and the Hypercore team had issues with it.

I don't think content discovery and content retrieval can be completely separated while staying efficient. Ideally, the format in which a peer answers a content discovery query is the same format you use to request content from it. Hypercore does this with a compressed bitmap.
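
To make the idea concrete, here is a minimal sketch of one bitmap format serving both as a discovery answer ("which blocks I have") and as a retrieval request ("which blocks I want from you"). It is an illustration only and not Hypercore's actual wire format: plain Python integers stand in for compressed bitmaps, and the names `to_bitmap`, `from_bitmap`, `plan_requests`, and the peers `alice` and `bob` are all hypothetical.

```python
# Hypothetical sketch, NOT Hypercore's wire format.
# A Python int is used as an (uncompressed) bitset: bit i set means "block i".


def to_bitmap(block_indices):
    """Pack a collection of block indices into an int used as a bitset."""
    bitmap = 0
    for i in block_indices:
        bitmap |= 1 << i
    return bitmap


def from_bitmap(bitmap):
    """Unpack a bitset into a sorted list of block indices."""
    indices = []
    i = 0
    while bitmap:
        if bitmap & 1:
            indices.append(i)
        bitmap >>= 1
        i += 1
    return indices


def plan_requests(wanted, peer_haves):
    """Assign each wanted block to the first peer that advertises it.

    wanted:     bitmap of blocks we still need.
    peer_haves: dict of peer name -> bitmap taken from that peer's
                discovery answer.
    Returns (requests, remaining), where requests maps peer name to a
    request bitmap in the same format, and remaining is the bitmap of
    wanted blocks that no peer advertised.
    """
    requests = {}
    remaining = wanted
    for peer, have in peer_haves.items():
        ask = remaining & have  # intersection: what we want and they have
        if ask:
            requests[peer] = ask
            remaining &= ~ask  # do not ask a second peer for the same block
    return requests, remaining


wanted = to_bitmap([0, 1, 2, 5, 9])
peer_haves = {
    "alice": to_bitmap([0, 1, 7]),
    "bob": to_bitmap([1, 2, 5]),
}
requests, remaining = plan_requests(wanted, peer_haves)
```

Because the discovery answer and the request share a format, building a request is a single bitwise intersection, and the peer can serve it without translating between representations. A real system would presumably compress the bitmap (for example with run-length encoding) and pick peers by latency or load rather than by iteration order.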
