Self-Framing Packed Binary Lengths

We describe a variable-width binary encoding format for unsigned integer lengths that handles values up to 1073741823 $(2^{30} - 1)$ and does not require complex bit manipulation to pack and unpack.

In summary: The input is encoded as little-endian, with the excess length of the encoding packed into the lowest-order 2 bits of the value. This makes the encoding self-framing, as the decoder can consume the first byte and immediately know how many additional bytes must be read to recover the original value. The only operations required are integer addition, multiplication, and division/remainder by powers of 2 (specificially, 4 and 256).

The algorithm below is described for 32-bit unsigned integers, but it can be extended trivially to 64-bit integers by using the lowest-order 3 bits instead of 2, or contracted to 16-bit integers using the lowest-order 1 bit.

A Go implementation

Encoding

Input: An unsigned 32-bit integer $0 ≤ n < 1073741824$.
Output: An array of 1 to 4 bytes encoding $n$.

Procedure:

Let $v_1 = n\cdot 4$.
Let $v_2 = n + s$ where
- $s = 0$ if $n < 64$, else
- $s = 1$ if $n < 16384$, else
- $s = 2$ if $n < 4194304$, else
- $s = 3$.
Encode $v_2$ into an array $B$ of 4 bytes in little-endian order (least-significant byte first).
Return he first $s+1$ bytes of $B$ as output.

Notes: The value in (1) cannot overflow because $n < 2^{30}$. Step (2) can also be used in isolation to precompute the encoded size of $n$ which is $s+1$ bytes.

Decoding

Input: An array $B$ of bytes.
Output: The value $n$ encoded by the prefix of $B$ and the length of the encoding; or else an error.

Procedure:

If $|B| = 0$ ($B$ is empty), report an error ("empty input").
Let $s = B[0]\ \%\ 4$.
If $|B| < s+1$, report an error ("truncated encoding").
Let $v = 0$
For each $i \in (s, s-1, \ldots, 0)$, set $v \leftarrow (v\cdot 256) + B[i]$.
Return $n = v\ //\ 4$ (the encoded value) and $nb = s+1$ (the length of the encoding).

Notes: The symbol $\%$ denotes the integer remainder operation, and $//$ denotes integer division discarding remainder, so that $a\ //\ b \equiv \lfloor a / b \rfloor$.

creachadair/packed-length.md

Self-Framing Packed Binary Lengths

Encoding

Decoding