// this is a work in progress summarization doc.
// see github.com/ipld/specs for more complete reference!
In the IPLD project, we're interested in defining a universally compatible data model. With this, we separate serialization from structure.
- Easier to design with -- reason about the structure of the data; worry about the serialization later.
- Easy to switch serialization later -- performance constraints appear, or wire transports change? C'est la vie. No problem.
- Using a binary encoding for performance? Swap in a human-readable one for debugging. They're interchangeable.
- Learn one set of semantics... and apply it to any number of specific/task-optimized serialization formats.
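To make the "interchangeable" point concrete, here is a minimal sketch in Python. Both codecs are hypothetical stand-ins (plain JSON, plus a toy tagged binary format invented for this example); neither is a real IPLD codec, and the function names are ours. The point is only that the same in-memory structure round-trips through either one.

```python
import json
import struct

# Hypothetical codec pair: each is just encode(value) -> bytes and
# decode(bytes) -> value over the same structure (ints, strings, lists, maps).

def json_encode(value):
    return json.dumps(value, sort_keys=True).encode("utf-8")

def json_decode(data):
    return json.loads(data.decode("utf-8"))

def bin_encode(value):
    """Toy binary format: one tag byte, then a length/count or an 8-byte int."""
    if isinstance(value, int):
        return b"i" + struct.pack(">q", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b"s" + struct.pack(">I", len(raw)) + raw
    if isinstance(value, list):
        body = b"".join(bin_encode(v) for v in value)
        return b"l" + struct.pack(">I", len(value)) + body
    if isinstance(value, dict):
        body = b"".join(bin_encode(k) + bin_encode(v) for k, v in value.items())
        return b"m" + struct.pack(">I", len(value)) + body
    raise TypeError(f"unsupported kind: {type(value).__name__}")

def _decode_at(data, pos):
    """Decode one value starting at pos; return (value, next position)."""
    tag = data[pos:pos + 1]
    pos += 1
    if tag == b"i":
        (number,) = struct.unpack_from(">q", data, pos)
        return number, pos + 8
    (length,) = struct.unpack_from(">I", data, pos)
    pos += 4
    if tag == b"s":
        return data[pos:pos + length].decode("utf-8"), pos + length
    if tag == b"l":
        items = []
        for _ in range(length):
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, pos
    if tag == b"m":
        result = {}
        for _ in range(length):
            key, pos = _decode_at(data, pos)
            result[key], pos = _decode_at(data, pos)
        return result, pos
    raise ValueError(f"unknown tag: {tag!r}")

def bin_decode(data):
    value, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after value")
    return value

# The structure is designed once; the codec is picked (or swapped) afterwards.
doc = {"name": "ipld", "size": 3, "tags": ["data", "model"]}
for encode, decode in [(json_encode, json_decode), (bin_encode, bin_decode)]:
    assert decode(encode(doc)) == doc
```

Code that works with `doc` never needs to know which of the two codecs produced it.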
Furthermore, we're building three major groups of functionality on the basis of this unified data model:
- a system of data definition schemas.
- advanced layout systems for handling larger data.
- selectors.
To our knowledge, this is the first time a project has separated serialization from schema with complete rigor.
With IPLD Schemas, you can specify semantics in the Schema; know exactly and unambiguously how it's going to map to the Data Model; and then supply (or swap) the details of a serialization Codec at a later date.
IPLD Schemas have features for flexible representations of the data.
A 'struct' in IPLD Schemas can be transformed into a map at the Data Model layer, and then serialized as such in whatever codec (JSON, CBOR, etc.) is used, so the map keys are present in the serial data and make it self-describing.
Or, the same 'struct' type can be declared with "representation tuple", in which case it maps to a list at the Data Model layer and is serialized as such. The data becomes less self-describing (use this strategy for your entire protocol and you effectively approach protobuf-style semantics), but you also save on repeated serialized keys.
The choice is yours when using IPLD and IPLD Schemas. Remarkably, you can also make that choice later: it's easy to change your mind, or even to switch representations for debugging purposes, without rewriting your data structures.
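The two representation strategies can be sketched roughly as follows. This is a Python illustration only, not the IPLD Schema language or any IPLD library API: the `Person` type and the `represent` / `reify` helpers are hypothetical names made up for this example, and JSON stands in for whatever codec is used.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Person:  # hypothetical struct type with two fields
    name: str
    age: int

def represent(value, strategy):
    """Map a struct to a Data-Model-like node: a map or a list."""
    names = [f.name for f in fields(value)]
    if strategy == "map":
        return {n: getattr(value, n) for n in names}
    if strategy == "tuple":
        return [getattr(value, n) for n in names]
    raise ValueError(f"unknown representation strategy: {strategy}")

def reify(cls, node, strategy):
    """Rebuild the struct from a node produced by represent()."""
    names = [f.name for f in fields(cls)]
    if strategy == "map":
        return cls(**{n: node[n] for n in names})
    if strategy == "tuple":
        if len(node) != len(names):
            raise ValueError("tuple length does not match field count")
        return cls(*node)
    raise ValueError(f"unknown representation strategy: {strategy}")

alice = Person("Alice", 30)

as_map = json.dumps(represent(alice, "map"))      # keys appear in the serial data
as_tuple = json.dumps(represent(alice, "tuple"))  # positional, no repeated keys

# Either serial form yields the same struct; application code is unchanged.
assert reify(Person, json.loads(as_map), "map") == alice
assert reify(Person, json.loads(as_tuple), "tuple") == alice
```

Switching strategies changes only the argument passed to `represent` and `reify`; the `Person` type and everything that uses it stay the same.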
The existence of this data model also enables us to build libraries for generalized traversals, folds, and similar operations over the Data Model structure. Those functions then naturally work for any serial data, whatever the codec.
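As a rough illustration of such a generalized fold, here is a sketch in Python. It assumes Data Model nodes are modelled as plain dicts, lists, and scalars; the `fold` helper is a hypothetical name, not an IPLD library function.

```python
import json

def fold(node, visit, acc):
    """Depth-first fold: apply visit(acc, node) to every node in the tree."""
    acc = visit(acc, node)
    if isinstance(node, dict):
        for child in node.values():
            acc = fold(child, visit, acc)
    elif isinstance(node, list):
        for child in node:
            acc = fold(child, visit, acc)
    return acc

def count_strings(acc, node):
    return acc + 1 if isinstance(node, str) else acc

def count_nodes(acc, node):
    return acc + 1

# JSON is used here, but any codec that decodes to the same structure would do.
data = json.loads('{"a": [1, "x", {"b": "y"}], "c": "z"}')
string_total = fold(data, count_strings, 0)  # counts string values, not map keys
node_total = fold(data, count_nodes, 0)
```

The fold is written once against the structure and never mentions the serialization format.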
In particular, "Selectors" are a system for specifying complex, recursive traversals -- you can think of Selectors as being roughly like a regexp (with matching groups), but for IPLD data (in fact, anything that can be mapped to the IPLD Data Model). Selectors are handy for requesting merkle-proofs containing some deeply nested data.
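To give a feel for the idea, here is a deliberately tiny sketch in Python. It is not the real Selector format: the list-of-segments pattern, the `"*"` wildcard, and the `select` function are all assumptions made for this example, and real Selectors are considerably more expressive (recursion in particular).

```python
def select(node, pattern, path=()):
    """Yield (path, value) for every node that `pattern` leads to.

    `pattern` is a list of path segments; "*" matches any map key or
    list index at that depth. (Toy syntax, invented for this example.)
    """
    if not pattern:
        yield path, node
        return
    head, rest = pattern[0], pattern[1:]
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        return  # scalars have no children to descend into
    for key, child in children:
        if head == "*" or head == key:
            yield from select(child, rest, path + (key,))

blog = {
    "posts": [
        {"title": "first", "author": {"name": "x"}},
        {"title": "second", "author": {"name": "y"}},
    ]
}

# Roughly "the author name of every post", with the path to each match.
matches = list(select(blog, ["posts", "*", "author", "name"]))
```

The returned paths play a role loosely analogous to regexp matching groups: they say where in the data each match was found.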
The essence of this data model roughly resembles JSON. (It turns out most things "roughly resemble JSON", these days. Msgpack, YAML, CBOR, and many more, can be considered to "resemble JSON".) So if JSON is familiar, the IPLD Data Model will be familiar. Strings are strings; maps are maps; lists are lists; etc.
"Advanced layouts" are an extension mechanism by which we can access large amounts of data, potentially split across many serial blocks, in the same Data Model we use for every other piece of individual serial data. This means we can use our same functional goodness (Selectors, etc) over large datasets transparently.
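A toy version of the idea, in Python: one logical list whose items are split across several content-addressed blocks, read back through an ordinary list-like interface. Everything here is an assumption made for illustration (the in-memory `store` dict, SHA-256 hex keys, JSON blocks, and the `ShardedList` name); real advanced layouts are specified elsewhere and differ in detail.

```python
import hashlib
import json

def put(store, node):
    """Serialize a node as one block and store it under its hash."""
    data = json.dumps(node, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key

def get(store, key):
    return json.loads(store[key])

def build_sharded_list(store, items, chunk_size):
    """Split items into chunk blocks plus a root block linking to them."""
    links = [
        put(store, items[i:i + chunk_size])
        for i in range(0, len(items), chunk_size)
    ]
    root = {"chunk_size": chunk_size, "length": len(items), "chunks": links}
    return put(store, root)

class ShardedList:
    """Presents many blocks as a single list; chunks are loaded on access."""

    def __init__(self, store, root_key):
        self.store = store
        self.root = get(store, root_key)

    def __len__(self):
        return self.root["length"]

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        size = self.root["chunk_size"]
        chunk = get(self.store, self.root["chunks"][index // size])
        return chunk[index % size]

    def __iter__(self):
        for index in range(len(self)):
            yield self[index]

store = {}
root_key = build_sharded_list(store, list(range(10)), chunk_size=4)
view = ShardedList(store, root_key)
```

Code that only iterates or indexes `view` cannot tell that the data lives in four separate blocks (three chunks plus the root), which is the property that lets generic tools run over large datasets unchanged.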
// this needs more writing than I can complete today :)
See https://media.ccc.de/v/gpn19-105-foundations-for-decentralization-data-with-ipld for more info on advanced layouts, and their relationship to the big picture.