
@paniq
Last active May 26, 2024 07:01
The Arcmem Binary Interchange Format


The Arcmem heap can be partially serialized to a streamable format, here called .arc (spoken as "dot-arc"). This format is not a file format per se, but can be used as the payload of one.

Each .arc stream describes heap objects in arbitrary order, with the requirement that every object must be declared, either in full or as a forward declaration, before it can be referenced. An object may be fully declared only once, and may be forward declared at most once, and only before its full declaration appears.
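The ordering rules can be checked incrementally while a stream is read. The following is a minimal Python sketch, not part of any Arcmem API: `DeclarationTracker` and `DeclarationOrderError` are hypothetical names, and the sketch assumes the caller has already mapped each reference to the objectid of the object it points into.

```python
class DeclarationOrderError(Exception):
    """Raised when a stream violates the .arc declaration-order rules."""


class DeclarationTracker:
    """Hypothetical helper enforcing the declaration-order rules.

    Objects are identified by objectid; callers are assumed to have
    already mapped each reference to the objectid it points into.
    """

    def __init__(self):
        self.forward = set()  # forward declared, full declaration still pending
        self.full = set()     # fully declared

    def declare(self, objectid, is_forward_decl):
        if objectid in self.full:
            # Covers both a second full declaration and a forward
            # declaration that appears after the full one.
            raise DeclarationOrderError(f"{objectid:#x} is already fully declared")
        if is_forward_decl:
            if objectid in self.forward:
                raise DeclarationOrderError(f"{objectid:#x} is forward declared twice")
            self.forward.add(objectid)
        else:
            self.forward.discard(objectid)
            self.full.add(objectid)

    def check_reference(self, objectid):
        # Either kind of declaration makes an object referenceable.
        if objectid not in self.forward and objectid not in self.full:
            raise DeclarationOrderError(f"{objectid:#x} is referenced before being declared")

    def finish(self):
        # At end of stream, every forward declaration must have been completed.
        if self.forward:
            raise DeclarationOrderError(f"{len(self.forward)} object(s) never fully declared")
```

Keeping two sets is enough here because the rules only ever ask whether an object is pending, complete, or unknown.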

By convention, the first element of a stream is considered to be the root object, from which all other objects are reachable.

Each heap object declaration starts with a 64-bit header laid out as follows:

{ objectid + size - 1 : u48, unused : u15, is_forward_decl : u1 } : u64

This header reserves a uniquely owned object address range, from objectid (inclusive) to objectid + size (exclusive). No two object address ranges may overlap. Because at least 5 offset bits are always used, objectid must be aligned to 2^max(5, ceil(log2(size))). Bits 37 through 41 of objectid (read as (p >> 37) & 31 in the decoding formulas below) encode the number of offset bits minus 5: 0 maps to the minimum of 5 bits, and 31 maps to 36 bits, so size is at most 2^36 bytes. By convention, bit 42 is always set to 1.
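The size-class arithmetic can be sketched in Python as follows. `offset_bits` and `is_valid_objectid` restate the constraints above, computing ceil(log2(size)) with `int.bit_length`. `make_objectid` is a hypothetical allocator helper, and its `slot` parameter is illustrative only and not part of the format.

```python
MIN_OFFSET_BITS = 5   # size class 0
MAX_OFFSET_BITS = 36  # size class 31
CLASS_SHIFT = 37      # bits 37..41 hold the size class
MARKER_BIT = 1 << 42  # set to 1 by convention


def offset_bits(size):
    """Offset bits for an object of `size` bytes: ceil(log2(size)), at least 5."""
    if not 1 <= size <= 1 << MAX_OFFSET_BITS:
        raise ValueError(f"size {size} out of range")
    return max(MIN_OFFSET_BITS, (size - 1).bit_length())


def make_objectid(size, slot):
    """Hypothetical allocator: the slot-th aligned range in the size class for `size`."""
    k = offset_bits(size)
    if not 0 <= slot < 1 << (CLASS_SHIFT - k):
        raise ValueError("slot does not fit below bit 37")
    return MARKER_BIT | ((k - MIN_OFFSET_BITS) << CLASS_SHIFT) | (slot << k)


def is_valid_objectid(objectid, size):
    k = offset_bits(size)
    return (
        (objectid >> 48) == 0                                          # fits in u48
        and (objectid & ((1 << k) - 1)) == 0                           # aligned to 2**k
        and ((objectid >> CLASS_SHIFT) & 31) == k - MIN_OFFSET_BITS    # class matches size
        and (objectid & MARKER_BIT) != 0                               # conventional marker
    )
```

A consequence worth noting, under the sketch's reading of the rules: since the size class is part of the address and offsets stay below bit 37, valid ranges from different size classes cannot overlap, and within one class distinct aligned objectids are disjoint, so the no-overlap rule seems to reduce to "no objectid is used twice".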

When encoding, size - 1 is OR-ed into the low zero bits of objectid, which produces the objectid + size - 1 field shown above. Calling that field p, objectid and size can be decoded as follows:

objectid := p & -(32:u64 << ((p >> 37) & 31))
size := p - objectid + 1
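A minimal Python sketch of encoding and decoding, under two assumptions the text leaves implicit: the decoding formulas apply to the u48 field only, so is_forward_decl (bit 63) is masked off first (otherwise it would survive the mask and leak into objectid), and the objectid passed to the encoder already satisfies the alignment and size-class constraints. `encode_header` and `decode_header` are hypothetical names.

```python
U48_MASK = (1 << 48) - 1
FORWARD_DECL_BIT = 1 << 63


def encode_header(objectid, size, is_forward_decl):
    """Pack a declaration header; objectid is assumed to be valid for size."""
    p = objectid | (size - 1)  # OR size - 1 into the zero offset bits
    return p | (FORWARD_DECL_BIT if is_forward_decl else 0)


def decode_header(header):
    """Return (objectid, size, is_forward_decl) from a 64-bit header."""
    is_forward_decl = bool(header & FORWARD_DECL_BIT)
    p = header & U48_MASK  # assumption: decode the u48 field only
    # Python ints act as infinite two's complement, so this mirrors
    # p & -(32:u64 << ((p >> 37) & 31)) directly.
    objectid = p & -(32 << ((p >> 37) & 31))
    size = p - objectid + 1
    return objectid, size, is_forward_decl
```

For example, with objectid = (1 << 42) | (2 << 37) | (3 << 7) and size 100, the header is objectid + 99, and decoding it returns the original pair.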

If is_forward_decl is 1, no further data follows for this object, and its full declaration must appear later in the stream. Otherwise, the object's unaligned data follows:

pointer_bitmap : u8[(size // 8 + 7) // 8]
data : u8[size]
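Reading one declaration from a byte stream might look like the Python sketch below. It assumes a little-endian u64 header (the byte order is not specified here) and a pointer_bitmap of one bit per 64-bit word of data, rounded up to whole bytes, to match the 1-to-64 ratio described below. `read_declaration` and `read_exact` are hypothetical names.

```python
import struct

U48_MASK = (1 << 48) - 1
FORWARD_DECL_BIT = 1 << 63


def read_exact(stream, n):
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError("truncated .arc stream")
    return buf


def read_declaration(stream):
    """Read one declaration from a binary file-like object.

    Returns (objectid, size, bitmap, data); bitmap and data are None for
    a forward declaration. Returns None at a clean end of stream.
    """
    raw = stream.read(8)
    if not raw:
        return None
    if len(raw) != 8:
        raise EOFError("truncated header")
    (header,) = struct.unpack("<Q", raw)  # assumption: little-endian u64
    p = header & U48_MASK
    objectid = p & -(32 << ((p >> 37) & 31))
    size = p - objectid + 1
    if header & FORWARD_DECL_BIT:
        return objectid, size, None, None  # no payload follows
    words = size // 8  # 64-bit words covered by the bitmap
    bitmap = read_exact(stream, (words + 7) // 8)  # assumption: padded to whole bytes
    data = read_exact(stream, size)
    return objectid, size, bitmap, data
```

Calling it in a loop until it returns None walks the whole stream, declaration by declaration.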

pointer_bitmap maps linearly onto data at a ratio of one bit per 64 bits: each bit tags the corresponding 64-bit aligned word in data either as a pointer to another object (tag set to 1) or as plain binary data (tag set to 0).
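Finding the pointer-tagged words can be sketched in Python as follows. The bit order within each bitmap byte (least significant bit first) and the little-endian word encoding are assumptions, since neither is fixed above. `pointer_slots` is a hypothetical name.

```python
import struct


def pointer_slots(bitmap, data):
    """Yield (byte_offset, value) for every 64-bit word tagged as a pointer.

    Assumptions: bit i of the bitmap, counted LSB-first within each byte,
    tags the word at data[8*i : 8*i + 8], and words are little-endian.
    """
    for i in range(len(data) // 8):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            (value,) = struct.unpack_from("<Q", data, 8 * i)
            yield 8 * i, value
```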

Values tagged as pointers must either be 0:u64 or lie within the address range of another previously declared object. If bit 63 is set, the value is to be considered a weak reference rather than a strong one.
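A Python sketch of validating one pointer-tagged value, assuming bit 63 is stripped before the range check and that `declared` maps objectid to size for every object declared so far (fully or forward). Whether the object currently being read counts as "previously declared" is left to the caller. Because offsets never reach bit 37, the header decoding formula appears to recover the base objectid from any interior address as well, which avoids searching all objects. `resolve_pointer` is a hypothetical name.

```python
WEAK_BIT = 1 << 63


def resolve_pointer(value, declared):
    """Check a pointer-tagged value against previously declared objects.

    declared maps objectid -> size. Returns None for a null pointer,
    otherwise (objectid, offset, is_weak). Assumption: bit 63 (weak flag)
    is removed before the range check.
    """
    if value == 0:
        return None
    is_weak = bool(value & WEAK_BIT)
    addr = value & ~WEAK_BIT
    if addr >> 48:
        raise ValueError(f"pointer {value:#x} has bits set above the u48 range")
    # Offsets stay below bit 37, so the size class read from the address
    # is the object's own, and masking yields its base objectid.
    base = addr & -(32 << ((addr >> 37) & 31))
    size = declared.get(base)
    if size is None or addr >= base + size:
        raise ValueError(f"pointer {value:#x} is outside every declared object")
    return base, addr - base, is_weak
```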
