This is a reverse-spec, of a currently existing, working production pipeline. It deliberately does not discuss history, nor better options, as the cost of modifying the existing system is insurmountably high.
The main purpose of this document is to allow other parties to determine whether a newly specced BAO-like structure can represent this existing construction.
The document is complete, but written horribly in a hurry. Shame upon the author's house.
The current Filecoin Data Commitment "hash" is a (standardization pending !!!) concatenation of
-
an integer payload length
-
254 bit root-digest of a binary merkle tree
Note: the concatenation of these two items is deliberately not passed through a compression round: multiple trees exists corresponding to the same digest, with the payload length parameter differentiating them.
There is no differentiation between leaves and intermediate nodes unlike other hash constructions (e.g. blake3)
The compression function is standard sha2-256, truncated to 254 bits after every round
Every node in the tree, including the leaves is expected to have a size of 254 bits. Because of this data undergoes a linear "extrusion" step, such that for every 127bytes of payload, the tree compression expects 128bytes/4leaves (127*8
== LCM(254,8)
)
-
Take input buffer of size
L
-
If
L
is not exactly127*2^N
, pad with\x00
until the resulting buffer is of size127*2^N
, whereN
is an integer -
Extrude the padded result from above step, so that every 127 bytes end up being represented as a group of 4 "leaves" of 254 bits each. This results in buffer of size
128*2^N
-
The resulting buffer is now a collection of
2^(N+2)
254 bit (32 buffer-bytes) leaves. Treat as binary tree, pairwise-reducing to a single root using the 254-truncated-sha2-256 function. -
Return resulting digest:
struct {
PayloadLength int64
RootDigest [32]byte
}