Exploration for GraphSync

This is where the actual retrieving of the data happens: https://github.com/ipfs/js-ipfs-unixfs-engine/blob/7992ad9860360da3f7a9fb0639e4a7a67746057f/src/exporter/file.js#L117

Where the Wantlist (the block that IPLD requested) is sent off: https://github.com/ipfs/js-ipfs-bitswap/blob/51f5ce08bad4876c9b709eba27faac533e9c00d4/src/want-manager/index.js#L125

Next step: find where the Wantlist is received. The Decision Engine gets the Wantlist and then sends off the corresponding blocks. Here it gets the blocks it should send from the blockstore: https://github.com/ipfs/js-ipfs-bitswap/blob/51f5ce08bad4876c9b709eba27faac533e9c00d4/src/decision-engine/index.js#L99

The real bottleneck is the message size: https://github.com/ipfs/js-ipfs-bitswap/blob/51f5ce08bad4876c9b709eba27faac533e9c00d4/src/decision-engine/index.js#L21 The larger the messages, the faster the transfer. So the cost is kind of in the back and forth of messages, but GraphSync won't help here.

Here we get the root of the file: the item of https://github.com/ipfs/js-ipfs-unixfs-engine/blob/7992ad9860360da3f7a9fb0639e4a7a67746057f/src/exporter/resolve.js#L37 is the root node, which got specified here: https://github.com/ipfs/js-ipfs-unixfs-engine/blob/7992ad9860360da3f7a9fb0639e4a7a67746057f/src/exporter/index.js#L58

How GraphSync could work

Blocking

The request needs to block until all data of the subtree is retrieved. Currently Bitswap requests one block and can block until it is retrieved, as there is a notification once that block has arrived. This doesn't work with GraphSync, as we don't know beforehand which blocks will arrive.

Questions:

How do you know that the last item was retrieved?
- Possible solution: send a "done" message
How does such a stream of data relate to pubsub?
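
To make the "done" message idea concrete, here is a minimal sketch. The message shapes (`{ type: 'block', ... }` and `{ type: 'done' }`) and the function name `collectUntilDone` are made up for illustration; they are not part of Bitswap or of any GraphSync spec. The receiver keeps collecting blocks and only unblocks once the explicit end marker arrives.

```javascript
// Hypothetical message shapes (assumptions, not an existing API):
//   { type: 'block', cid: string, data: Uint8Array }
//   { type: 'done' }

// Collect blocks from an iterable of messages until the responder
// signals that the whole subtree has been sent.
function collectUntilDone (messages) {
  const blocks = new Map()
  for (const msg of messages) {
    if (msg.type === 'done') {
      // Explicit end marker: the request can unblock now.
      return blocks
    }
    if (msg.type === 'block') {
      blocks.set(msg.cid, msg.data)
    }
  }
  // The stream ended without the marker, so we cannot tell
  // whether the subtree is complete.
  throw new Error('stream ended before "done" message')
}

const received = collectUntilDone([
  { type: 'block', cid: 'cid-root', data: new Uint8Array([1]) },
  { type: 'block', cid: 'cid-leaf', data: new Uint8Array([2]) },
  { type: 'done' }
])
console.log([...received.keys()])
```

A real implementation would read from a network stream rather than an array, but the termination logic would likely look similar.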

Storage/processing

At the moment Bitswap is first storing the retrieved data in its blockstore before it is processed any further.

Questions:

Does this make sense for GraphSync as well?

UnixFS partial requests

In UnixFS you can retrieve a file starting from a certain offset. This can be optimized during graph traversal: the size field is checked/accumulated along the way. This can easily be expressed with code, but it's hard to do in a formal language (it will be code in the end).
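
As a rough illustration of the size accumulation, here is a sketch over a simplified in-memory tree. The node shape (`{ data }` for leaves, `{ links: [{ size, node }] }` for intermediate nodes, with `size` being the number of file bytes below that link) and the name `collectFrom` are assumptions for this example, not the actual UnixFS exporter code. Subtrees that end before the offset are skipped; over the network that would mean they are never requested.

```javascript
// Collect the file's bytes starting at `offset`, skipping every
// subtree that lies completely before the offset.
function collectFrom (node, offset, out = []) {
  if (!node.links || node.links.length === 0) {
    // Leaf: keep only the part at or after the offset.
    out.push(node.data.subarray(offset))
    return out
  }
  let start = 0 // file position where the current child begins
  for (const link of node.links) {
    const end = start + link.size
    if (end > offset) {
      // The child overlaps the requested range, so descend into it
      // with the offset made relative to the child's start.
      collectFrom(link.node, Math.max(0, offset - start), out)
    }
    start = end
  }
  return out
}

// Hypothetical file "abcdefghi" split into three leaves.
const leaf = (text) => ({ data: Buffer.from(text) })
const file = {
  links: [
    {
      size: 6,
      node: {
        links: [
          { size: 3, node: leaf('abc') },
          { size: 3, node: leaf('def') }
        ]
      }
    },
    { size: 3, node: leaf('ghi') }
  ]
}

console.log(Buffer.concat(collectFrom(file, 4)).toString()) // 'efghi'
```

The leaf holding `abc` is never visited for offset 4, which is the optimization the size field makes possible.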

Question:

Should there be support for dynamic code-based traversals in GraphSync?
- (@vmx) I lean towards having pluggable "traversal modes" which are selected by transmitting a certain flag. They will be hard-coded in every implementation of GraphSync. Once we have IPLD M2, that can probably be used instead.
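
One way the pluggable "traversal modes" could look is a registry keyed by the transmitted flag. Everything here is hypothetical: the flag names (`all`, `depth-limit`), the request shape (`{ mode, args }`), and the node shape (`{ id, children }`) are invented for the sketch.

```javascript
// Registry of hard-coded traversal modes, keyed by the flag
// that is sent along with the request.
const traversalModes = new Map()

function registerTraversalMode (flag, traverse) {
  traversalModes.set(flag, traverse)
}

// Depth-first walk that stops descending after `maxDepth` levels.
function walk (node, maxDepth, out = []) {
  out.push(node.id)
  if (maxDepth > 0) {
    for (const child of node.children || []) {
      walk(child, maxDepth - 1, out)
    }
  }
  return out
}

registerTraversalMode('all', (root) => walk(root, Infinity))
registerTraversalMode('depth-limit', (root, args) => walk(root, args.depth))

// Look up the mode named in the request and run it on the root node.
function handleRequest (request, root) {
  const traverse = traversalModes.get(request.mode)
  if (!traverse) {
    throw new Error(`unsupported traversal mode: ${request.mode}`)
  }
  return traverse(root, request.args || {})
}

const tree = {
  id: 'root',
  children: [{ id: 'a', children: [{ id: 'a1' }] }, { id: 'b' }]
}

console.log(handleRequest({ mode: 'all' }, tree))
console.log(handleRequest({ mode: 'depth-limit', args: { depth: 1 } }, tree))
```

A UnixFS offset traversal would then be just another registered mode, and an unknown flag fails loudly instead of silently falling back.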
