https://github.com/filecoin-project/lassie
Lassie is:
-
A universal retrieval client for IPFS & Filecoin.
-
An IPFS implementation that doesn't store or publish data
-
IPLD-native: deals only in IPLD blocks, delivered in CAR format
-
Written in Go and offers
- a command-line tool (lassie fetch)
- a minimal IPFS gateway-like HTTP API
- a Go library interface
-
A work in progress
-
Started as a Filecoin retrieval tool; protocol support has since expanded to:
- Graphsync (Filecoin)
- Bitswap (IPFS, Filecoin, others, including Elastic-IPFS)
- Verified HTTP CAR (work in progress)
-
Back-end support for the Saturn CDN network to fetch IPFS & Filecoin content
-
Very lightweight content retrieval
- No config files
- No local IPFS node
- No persistent storage
- Very fast startup time
- Just-enough functionality to get IPLD data
-
Integrates with the IPFS & Filecoin network indexer service (cid.contact)
-
"Fetch" a CID queries the indexer and:
- Finds Filecoin Storage Providers that have it
- Queries the IPFS DHT to find nodes that have it
-
Begin Graphsync and/or Bitswap sessions to retrieve the data from the peer candidates (HTTP too, soon)
-
Collect a graph from the requested (root) CID depending on request:
- Is there a path? Qmfoobar/path/to/thing
- Fetch the entire DAG under the root / path?
- Fetch just the single block under the root / path?
- Fetch just the UnixFS "entity" under the root / path?
-
Return content in verifiable CAR format
-
Perfect partner for github.com/ipld/go-car:
lassie fetch -o - Qmfoobar/cats.mp4 | car extract - | ffplay -
-
CID + optional Path: Qmfoobar/path/to/thing
- Start at CID, walk the path to the thing according to IPLD pathing rules
- BUT default to UnixFS pathing semantics where possible
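To make the difference between UnixFS pathing and plain IPLD pathing concrete, here is a minimal sketch that resolves a named path against a toy DAG-PB-like structure and prints the equivalent IPLD path. The types PBNode and PBLink, the function ToIPLDPath, and the sample CIDs are hypothetical and exist only for illustration; they are not Lassie or go-unixfsnode APIs, and HAMT-sharded directories are not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PBLink and PBNode are simplified stand-ins for a DAG-PB node's Links list.
// Real DAG-PB nodes also carry Data and Tsize; CIDs are opaque strings here.
type PBLink struct {
	Name string
	Hash string
}

type PBNode struct {
	Links []PBLink
}

// ToIPLDPath walks a UnixFS-style named path through a toy block store and
// returns the equivalent plain IPLD path ("Links/<index>/Hash/...").
// It only handles simple directories, not HAMT-sharded ones.
func ToIPLDPath(store map[string]PBNode, root, unixfsPath string) (string, error) {
	cur := root
	var ipld []string
	for _, name := range strings.Split(unixfsPath, "/") {
		if name == "" {
			continue // tolerate leading, trailing or doubled slashes
		}
		node, ok := store[cur]
		if !ok {
			return "", fmt.Errorf("block %s not found", cur)
		}
		found := false
		for i, l := range node.Links {
			if l.Name == name {
				ipld = append(ipld, "Links", strconv.Itoa(i), "Hash")
				cur = l.Hash
				found = true
				break
			}
		}
		if !found {
			return "", fmt.Errorf("no entry %q under %s", name, cur)
		}
	}
	return strings.Join(ipld, "/"), nil
}

// sampleStore builds a hypothetical three-level directory tree.
func sampleStore() map[string]PBNode {
	return map[string]PBNode{
		"Qmfoobar": {Links: []PBLink{{"a", "QmA"}, {"b", "QmB"}, {"c", "QmC"}, {"path", "QmPath"}}},
		"QmPath":   {Links: []PBLink{{"bar", "QmBar"}, {"foo", "QmFoo"}, {"to", "QmTo"}}},
		"QmTo":     {Links: []PBLink{{"thing", "QmThing"}}},
	}
}

func main() {
	p, err := ToIPLDPath(sampleStore(), "Qmfoobar", "path/to/thing")
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // Links/3/Hash/Links/2/Hash/Links/0/Hash
}
```

The named form is what a user types; the index-based form is what a codec-level traversal sees when no UnixFS interpretation is applied.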
-
Single block fetch: just give me the block at the terminus of Qmfoobar/path/to/thing
-
Entire DAG fetch: give me the entire DAG under Qmfoobar/path/to/thing
-
UnixFS entity fetch: give me the UnixFS entity under Qmfoobar/path/to/thing
- Is it a sharded file? Give me all the blocks for the file
- Is it a sharded directory? Give me all the blocks for the directory but not the leaves
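The three fetch modes can be compared on a toy DAG. The sketch below is only a model of the behaviour described above: the Node type, its Entity label, the Scope constants and Collect are all hypothetical names, and real UnixFS blocks do not carry an explicit entity label (Lassie derives the boundary by interpreting the UnixFS data).

```go
package main

import "fmt"

// Node is a hypothetical in-memory block. Entity names the UnixFS file or
// directory the block is part of; this label is an assumption of the sketch.
type Node struct {
	ID       string
	Entity   string
	Children []*Node
}

// Scope mirrors the three fetch modes: single block, UnixFS entity, entire DAG.
type Scope int

const (
	ScopeBlock Scope = iota
	ScopeEntity
	ScopeAll
)

// Collect returns block IDs in depth-first order, starting at start.
func Collect(start *Node, scope Scope) []string {
	var out []string
	var walk func(cur *Node)
	walk = func(cur *Node) {
		out = append(out, cur.ID)
		if scope == ScopeBlock {
			return // just the one block
		}
		for _, c := range cur.Children {
			if scope == ScopeEntity && c.Entity != start.Entity {
				continue // a different entity, e.g. an entry of a directory
			}
			walk(c)
		}
	}
	walk(start)
	return out
}

// sampleDAG builds a sharded directory (d0, d1, d2) holding a sharded file
// (a0, a1, a2) and a single-block file (b0). It returns the directory root
// and the root block of the sharded file.
func sampleDAG() (*Node, *Node) {
	a0 := &Node{ID: "a0", Entity: "a", Children: []*Node{
		{ID: "a1", Entity: "a"},
		{ID: "a2", Entity: "a"},
	}}
	b0 := &Node{ID: "b0", Entity: "b"}
	d1 := &Node{ID: "d1", Entity: "dir", Children: []*Node{a0}}
	d2 := &Node{ID: "d2", Entity: "dir", Children: []*Node{b0}}
	d0 := &Node{ID: "d0", Entity: "dir", Children: []*Node{d1, d2}}
	return d0, a0
}

func main() {
	dir, file := sampleDAG()
	fmt.Println(Collect(dir, ScopeBlock))   // [d0]
	fmt.Println(Collect(dir, ScopeEntity))  // [d0 d1 d2]
	fmt.Println(Collect(dir, ScopeAll))     // [d0 d1 a0 a1 a2 d2 b0]
	fmt.Println(Collect(file, ScopeEntity)) // [a0 a1 a2]
}
```

An entity fetch of the directory returns its shard blocks without descending into the entries, while an entity fetch of the file returns every block of that file.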
-
About 95% (??) of content stored on IPFS is UnixFS, so let's assume you're fetching UnixFS
-
Not DAG-PB || can't interpret as UnixFS? Default to plain IPLD semantics
-
Pathing is UnixFS: Qmfoobar/path/to/thing as UnixFS vs Qmfoobar/Links/3/Hash/Links/2/Hash/Links/0/Hash as IPLD
-
UnixFS entity fetch: sharding makes fetching complicated
- Files are often bigger than ~safe IPLD block size so are sharded across many blocks
- Directories with hundreds of entries are sharded using a HAMT to create a complex DAG
- A user "fetching" one of these generally doesn't just want the first block, they want the whole thing
- For sharded directories, the whole thing could be very large, so we can do a shallow fetch
-
github.com/ipfs/go-unixfsnode implements UnixFS as an ADL
-
We can traverse with a combination of go-ipld-prime selectors and go-unixfsnode ADLs
-
Selectors: translate a path/to/thing path to a selector with go-unixfsnode that adds in the ADL: unixfs or unixfs-preload (for entity fetch), but with safe fall-back for non-UnixFS
-
Using go-ipld-prime's traversal engine for deterministic DAG generation
-
go-unixfsnode provides deterministic UnixFS traversal
-
Verifiable
- Consider https://ipfs.io/ipfs/Qmfoobar/path/to/thing - how can you verify the gateway gave you what you asked for? (You can't!)
- Lassie's output CARs include the requested (root) CID and every block from that CID to the requested content
- The user (presumably) trusts the original CID, so they can verify each CID:Block match and the inclusion of each additional block in the requested path/DAG
- i.e. root CID is the trust anchor, and the trust is transferable to the entire included DAG
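The trust-anchor argument above can be sketched as a streaming check over (CID, block) pairs. This is a simplified model, not the CAR format or Lassie's verifier: a "CID" here is just the hex SHA-256 of a toy block encoding, and Block, Entry, ID and VerifyStream are hypothetical names. Real CIDs wrap a multihash and a codec, and links are found by decoding the block with that codec.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Block is a toy block: links to children (by ID) plus a payload.
type Block struct {
	Links []string
	Data  []byte
}

// ID hashes a toy encoding of the block: one link per line, a blank line,
// then the data. It stands in for a real CID.
func ID(b Block) string {
	enc := strings.Join(b.Links, "\n") + "\n\n" + string(b.Data)
	sum := sha256.Sum256([]byte(enc))
	return hex.EncodeToString(sum[:])
}

// Entry is one (CID, block) pair as it would arrive in a stream.
type Entry struct {
	CID   string
	Block Block
}

// VerifyStream accepts an entry only if its CID is the trusted root or was
// linked from an already verified block, and if its bytes hash to that CID.
func VerifyStream(root string, entries []Entry) error {
	trusted := map[string]bool{root: true}
	for i, e := range entries {
		if !trusted[e.CID] {
			return fmt.Errorf("entry %d: %s is not reachable from the root", i, e.CID)
		}
		if ID(e.Block) != e.CID {
			return fmt.Errorf("entry %d: bytes do not hash to %s", i, e.CID)
		}
		for _, l := range e.Block.Links {
			trusted[l] = true // trust transfers to everything this block links to
		}
	}
	return nil
}

// sampleEntries builds a three-block chain: root -> mid -> leaf.
func sampleEntries() (string, []Entry) {
	leaf := Block{Data: []byte("cats.mp4 bytes")}
	mid := Block{Links: []string{ID(leaf)}, Data: []byte("path")}
	root := Block{Links: []string{ID(mid)}, Data: []byte("root")}
	return ID(root), []Entry{{ID(root), root}, {ID(mid), mid}, {ID(leaf), leaf}}
}

func main() {
	root, entries := sampleEntries()
	fmt.Println("intact stream:", VerifyStream(root, entries))

	entries[2].Block.Data = []byte("dogs.mp4 bytes") // simulate tampering
	fmt.Println("tampered stream:", VerifyStream(root, entries))
}
```

Because each block is checked as it arrives, a consumer can stop at the first bad entry instead of buffering the whole response.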
-
Deterministic DAG traversal is difficult to properly parallelise
-
Graphsync has in-built parallelism, but it's single peer to single peer: both peers agree on the selector and "sync" blocks using the same traversal
-
Bitswap is multi-peer but has no graph awareness so is harder to parallelise:
- You don't know what links a block contains until you have it
- A deterministic DAG traversal wants blocks in a specific order
-
Pre-fetching to the rescue for Bitswap:
- Run a double-pass of our selector traversal over each block as we load them
- First pass is shallow and just queues all links it encounters for pre-fetching
- Second pass follows links like a normal traversal
- Pre-fetcher runs in parallel with the main traversal, optimistically loading blocks that will eventually be needed
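The double-pass idea can be modelled with a small cache of in-flight loads. This is a sketch under simplifying assumptions, not Lassie's or go-ipld-prime's implementation: Remote, Prefetcher, Load and Traverse are hypothetical names, the "network" is an in-memory map, and there are no budgets, timeouts or cancellation.

```go
package main

import (
	"fmt"
	"sync"
)

// Block is a toy block: an ID plus the IDs it links to, in traversal order.
type Block struct {
	ID    string
	Links []string
}

// Remote stands in for a graph-unaware transport such as Bitswap.
type Remote map[string]Block

// pending tracks one load that is in flight or finished.
type pending struct {
	done  chan struct{}
	block Block
	ok    bool
}

// Prefetcher makes sure each block is requested at most once.
type Prefetcher struct {
	remote Remote
	mu     sync.Mutex
	loads  map[string]*pending
}

func NewPrefetcher(r Remote) *Prefetcher {
	return &Prefetcher{remote: r, loads: map[string]*pending{}}
}

// start begins fetching id in the background unless that already happened.
func (p *Prefetcher) start(id string) *pending {
	p.mu.Lock()
	defer p.mu.Unlock()
	if pd, ok := p.loads[id]; ok {
		return pd
	}
	pd := &pending{done: make(chan struct{})}
	p.loads[id] = pd
	go func() {
		pd.block, pd.ok = p.remote[id] // simulated network fetch
		close(pd.done)
	}()
	return pd
}

// Load waits for a block, then runs the shallow first pass: every link is
// queued for pre-fetching without being followed.
func (p *Prefetcher) Load(id string) (Block, error) {
	pd := p.start(id)
	<-pd.done
	if !pd.ok {
		return Block{}, fmt.Errorf("block %s not found", id)
	}
	for _, l := range pd.block.Links {
		p.start(l)
	}
	return pd.block, nil
}

// Traverse is the second pass: an ordinary depth-first walk whose visiting
// order does not depend on when the pre-fetched blocks arrive.
func Traverse(p *Prefetcher, root string) ([]string, error) {
	var order []string
	var walk func(id string) error
	walk = func(id string) error {
		b, err := p.Load(id)
		if err != nil {
			return err
		}
		order = append(order, b.ID)
		for _, l := range b.Links {
			if err := walk(l); err != nil {
				return err
			}
		}
		return nil
	}
	err := walk(root)
	return order, err
}

func sampleRemote() Remote {
	return Remote{
		"root": {ID: "root", Links: []string{"a", "b"}},
		"a":    {ID: "a", Links: []string{"c"}},
		"b":    {ID: "b"},
		"c":    {ID: "c"},
	}
}

func main() {
	order, err := Traverse(NewPrefetcher(sampleRemote()), "root")
	fmt.Println(order, err) // [root a c b] <nil>
}
```

While the walk is still descending into a and c, the fetch for b has already been started, which is where the parallelism comes from; the output order stays the same either way.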
-
Has some difficulties with traversal node & link "budgets"
-
Available for any traversal with go-ipld-prime, using master, see: https://pkg.go.dev/github.com/ipld/go-ipld-prime@master/traversal#Config (Config.Preloader)
-
car verify to verify CAR format and content (just simple block check)
-
car inspect to provide a summary of the content of the CAR, w/ --full to also verify.
-
car extract to extract UnixFS content:
- Can receive from stdin, lassie can send to stdout with -o -
- Can send output to stdout if it's just a single file
-
More coming as both tools are developed in tandem
(WIP)
-
New graph transport: Verifiable CAR over HTTP
-
Indexer will know if a peer can provide the requested content via HTTP
-
Peers will provide CARs in the same format that Lassie provides them
-
Lassie will verify the CAR as it's passed on