@jandubois
Created February 24, 2026 04:48
APFS In-Place UID/GID Patching: Implementation Plan

Goal

Write a minimal APFS tool (pkg/apfs/chown.go) that modifies file ownership (UID/GID) directly on a raw APFS disk image, bypassing the kernel VFS. This eliminates the sudo chown root:wheel requirement from macOS guest disk patching.

Why this works

APFS inode records store owner (UID) and group (GID) as plain little-endian uint32 fields at fixed offsets within the inode value (j_inode_val_t), which lives in a filesystem B-tree leaf node. Each metadata block is protected by a single Fletcher-64 checksum. Modifying UID/GID therefore requires changing 8 bytes and recomputing one checksum per affected block. No parent pointer updates, no B-tree restructuring, no superblock changes.

Validated by Goebel 2019 (Universität der Bundeswehr): raw APFS block modification with checksum recalculation produces a valid filesystem.

License constraints

All open-source APFS implementations are GPL (v2 or v3). Lima is Apache 2.0. We can read GPL code to understand the on-disk format (facts are not copyrightable) but must write a clean implementation. Apple's APFS Reference PDF is the primary specification.

On-disk structures (from Apple APFS Reference)

Object header — obj_phys_t (32 bytes)

Offset  Size  Field      Type    Notes
0       8     o_cksum    uint64  Fletcher-64 checksum
8       8     o_oid      uint64  object identifier
16      8     o_xid      uint64  transaction identifier
24      4     o_type     uint32  type + storage class
28      4     o_subtype  uint32
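
A minimal sketch of decoding this header in Go, using the offsets from the table. The names objHeader and parseObjHeader are illustrative, not part of the plan, and the 0xFFFF mask used in main to isolate the object type from the storage class bits is an assumption to confirm against Apple's reference.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// objHeader mirrors the 32-byte obj_phys_t layout (hypothetical Go name).
type objHeader struct {
	Cksum   uint64 // o_cksum
	OID     uint64 // o_oid
	XID     uint64 // o_xid
	Type    uint32 // o_type: object type plus storage class bits
	Subtype uint32 // o_subtype
}

const objHeaderSize = 32

// parseObjHeader decodes the header at the start of a metadata block.
func parseObjHeader(block []byte) (objHeader, error) {
	if len(block) < objHeaderSize {
		return objHeader{}, errors.New("block shorter than obj_phys_t")
	}
	le := binary.LittleEndian
	return objHeader{
		Cksum:   le.Uint64(block[0:8]),
		OID:     le.Uint64(block[8:16]),
		XID:     le.Uint64(block[16:24]),
		Type:    le.Uint32(block[24:28]),
		Subtype: le.Uint32(block[28:32]),
	}, nil
}

func main() {
	block := make([]byte, 4096)
	binary.LittleEndian.PutUint64(block[16:24], 7)          // o_xid
	binary.LittleEndian.PutUint32(block[24:28], 0x80000001) // OBJ_EPHEMERAL | NX superblock
	h, err := parseObjHeader(block)
	// Assumed mask: the low 16 bits carry the object type.
	fmt.Println(h.XID, h.Type&0xFFFF, err)
}
```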

Container superblock — nx_superblock_t (starts at block 0)

Key fields (offsets from block start):

Offset Field Notes
32 nx_magic Must be 0x4253584E ("NXSB")
36 nx_block_size Typically 4096
104 nx_xp_desc_blocks Checkpoint descriptor area size (mask 0x7FFFFFFF)
112 nx_xp_desc_base Checkpoint descriptor area start (physical block)
136 nx_xp_desc_index Current checkpoint start index
140 nx_xp_desc_len Current checkpoint length
160 nx_omap_oid Container object map (physical address)
184 nx_fs_oid[100] Volume OIDs (8 bytes each, 800 bytes total)
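
The offsets above translate fairly directly into a parser. The following is a sketch only: nxSuperblock and parseNXSuperblock are hypothetical names, it reads just the fields this plan uses, and it does not verify the checksum.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const nxMagic = 0x4253584E // "NXSB"

// nxSuperblock holds only the container superblock fields this plan needs.
type nxSuperblock struct {
	BlockSize    uint32
	XPDescBlocks uint32 // nx_xp_desc_blocks with the high bit masked off
	XPDescBase   uint64
	OmapOID      uint64
	FSOIDs       []uint64 // non-zero entries of nx_fs_oid[]
}

func parseNXSuperblock(b []byte) (*nxSuperblock, error) {
	const fsOIDOff, maxFS = 184, 100
	if len(b) < fsOIDOff+8*maxFS {
		return nil, errors.New("block too short for nx_superblock_t")
	}
	le := binary.LittleEndian
	if le.Uint32(b[32:]) != nxMagic {
		return nil, errors.New("bad nx_magic")
	}
	sb := &nxSuperblock{
		BlockSize:    le.Uint32(b[36:]),
		XPDescBlocks: le.Uint32(b[104:]) & 0x7FFFFFFF,
		XPDescBase:   le.Uint64(b[112:]),
		OmapOID:      le.Uint64(b[160:]),
	}
	for i := 0; i < maxFS; i++ {
		if oid := le.Uint64(b[fsOIDOff+8*i:]); oid != 0 {
			sb.FSOIDs = append(sb.FSOIDs, oid)
		}
	}
	return sb, nil
}

func main() {
	// Synthetic block for demonstration; a real one comes from the disk image.
	b := make([]byte, 4096)
	le := binary.LittleEndian
	le.PutUint32(b[32:], nxMagic)
	le.PutUint32(b[36:], 4096)
	le.PutUint32(b[104:], 8)
	le.PutUint64(b[112:], 1)
	le.PutUint64(b[160:], 0x123)
	le.PutUint64(b[184:], 0x402)
	sb, err := parseNXSuperblock(b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *sb)
}
```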

Volume superblock — apfs_superblock_t

Offset Field Notes
32 apfs_magic Must be 0x42535041 ("APSB")
128 apfs_omap_oid Volume object map (physical address)
136 apfs_root_tree_oid Filesystem B-tree root (virtual OID)
704 apfs_volname Volume name (256 bytes, UTF-8)
964 apfs_role Volume role (Data = 0x0040, aka APFS_VOL_ROLE_DATA)

Object map — omap_phys_t

Offset Field Notes
48 om_tree_oid Physical address of omap B-tree root

Omap B-tree keys/values

  • omap_key_t: ok_oid (8) + ok_xid (8) = 16 bytes
  • omap_val_t: ov_flags (4) + ov_size (4) + ov_paddr (8) = 16 bytes

B-tree node — btree_node_phys_t

Offset  Size  Field              Notes
0       32    btn_o              obj_phys_t header
32      2     btn_flags
34      2     btn_level          0 = leaf
36      4     btn_nkeys
40      4     btn_table_space    nloc_t: off u16 + len u16
44      4     btn_free_space     nloc_t
48      4     btn_key_free_list  nloc_t
52      4     btn_val_free_list  nloc_t
56      var   btn_data[]         ToC, keys, values

If btn_flags & BTNODE_ROOT, a btree_info_t (40 bytes) sits at the end of the block. This reduces the value area by 40 bytes.

Table of Contents entries

  • kvloc_t (8 bytes): key nloc (off u16 + len u16) + value nloc (off u16 + len u16). Used when BTNODE_FIXED_KV_SIZE is NOT set.
  • kvoff_t (4 bytes): key offset u16 + value offset u16. Used when BTNODE_FIXED_KV_SIZE IS set.

ToC starts at: btn_data + btn_table_space.off
Key area starts at: btn_data + btn_table_space.off + btn_table_space.len
Key offsets in the ToC are relative to the key area start. Value offsets are relative to:

  • End of block (for non-root nodes)
  • End of block minus sizeof(btree_info_t) = 40 (for root nodes)

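Value offsets are measured backward from the end of the value area: offset 0 is the end of the area, and a value's offset is the distance from that end back to the value's first byte.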

Filesystem B-tree keys

j_key_t (8 bytes): obj_id_and_type — upper 4 bits = type, lower 60 bits = CNID.

Type     Value  Key struct
INODE    3      j_inode_key_t: just the 8-byte j_key_t
DIR_REC  9      j_drec_key_t: j_key_t + name_len(2) + name(var), or
                j_drec_hashed_key_t: j_key_t + name_len_and_hash(4) + name(var)

For hashed keys: name_len_and_hash lower 10 bits = name length, upper 22 = hash.
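
The bit packing described above can be sketched as a few helpers. The function names are illustrative; the masks and shifts follow the text (4 type bits over a 60-bit CNID, 10 length bits under a 22-bit hash).

```go
package main

import "fmt"

const (
	objIDMask     = 0x0FFFFFFFFFFFFFFF // lower 60 bits: CNID
	objTypeShift  = 60                 // upper 4 bits: record type
	typeInode     = 3
	typeDirRec    = 9
	drecLenMask   = 0x3FF // lower 10 bits of name_len_and_hash
	drecHashShift = 10    // upper 22 bits hold the hash
)

// makeKey packs a CNID and record type into obj_id_and_type.
func makeKey(cnid uint64, typ uint8) uint64 {
	return cnid&objIDMask | uint64(typ)<<objTypeShift
}

// splitKey unpacks obj_id_and_type into CNID and record type.
func splitKey(objIDAndType uint64) (cnid uint64, typ uint8) {
	return objIDAndType & objIDMask, uint8(objIDAndType >> objTypeShift)
}

// splitNameLenAndHash unpacks the 4-byte field of a hashed dir rec key.
func splitNameLenAndHash(v uint32) (nameLen, hash uint32) {
	return v & drecLenMask, v >> drecHashShift
}

func main() {
	k := makeKey(2, typeDirRec) // dir rec key prefix for the root directory
	cnid, typ := splitKey(k)
	fmt.Printf("%#x cnid=%d type=%d\n", k, cnid, typ)
	// prints: 0x9000000000000002 cnid=2 type=9
	fmt.Printf("%#x\n", makeKey(2, typeInode))
	// prints: 0x3000000000000002
}
```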

Filesystem B-tree values

j_inode_val_t (92+ bytes):

Offset Size Field
0 8 parent_id
8 8 private_id
16–47 32 timestamps (create, mod, change, access)
48 8 internal_flags
56 4 nchildren/nlink
60 4 default_protection_class
64 4 write_generation_counter
68 4 bsd_flags
72 4 owner (UID)
76 4 group (GID)
80 2 mode
82 2 pad1
84 8 uncompressed_size
92 var xfields

j_drec_val_t: file_id (8) + date_added (8) + flags (2) + xfields.
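
Given the j_inode_val_t layout above, the patch itself is two 4-byte writes. A sketch, with patchOwner as a hypothetical helper name; the mode check is one possible form of the sanity validation the plan calls for and is an assumption, not something the format requires.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	inodeOwnerOff  = 72
	inodeGroupOff  = 76
	inodeModeOff   = 80
	inodeFixedSize = 92 // size of j_inode_val_t before xfields
)

// patchOwner overwrites owner and group inside a j_inode_val_t slice in place
// and returns the previous values. The caller must recompute the block
// checksum afterwards.
func patchOwner(val []byte, uid, gid uint32) (oldUID, oldGID uint32, err error) {
	if len(val) < inodeFixedSize {
		return 0, 0, errors.New("value too short for j_inode_val_t")
	}
	le := binary.LittleEndian
	// Assumed sanity check: a real inode has non-zero file type bits in mode.
	if le.Uint16(val[inodeModeOff:])&0xF000 == 0 {
		return 0, 0, errors.New("implausible mode; probably not an inode value")
	}
	oldUID = le.Uint32(val[inodeOwnerOff:])
	oldGID = le.Uint32(val[inodeGroupOff:])
	le.PutUint32(val[inodeOwnerOff:], uid)
	le.PutUint32(val[inodeGroupOff:], gid)
	return oldUID, oldGID, nil
}

func main() {
	// Synthetic inode value: uid 501, gid 20, mode 0100644.
	val := make([]byte, inodeFixedSize)
	binary.LittleEndian.PutUint32(val[inodeOwnerOff:], 501)
	binary.LittleEndian.PutUint32(val[inodeGroupOff:], 20)
	binary.LittleEndian.PutUint16(val[inodeModeOff:], 0o100644)
	oldUID, oldGID, err := patchOwner(val, 0, 0)
	fmt.Println(oldUID, oldGID, err)
}
```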

Key constants

NX_MAGIC            = 0x4253584E
APFS_MAGIC          = 0x42535041
APFS_VOL_ROLE_DATA  = 0x0040  (1 << 6 in the Apple spec's role encoding)

OBJ_PHYSICAL        = 0x40000000
OBJ_EPHEMERAL       = 0x80000000
OBJECT_TYPE_NX_SUPERBLOCK  = 0x01
OBJECT_TYPE_BTREE_NODE     = 0x03
OBJECT_TYPE_OMAP           = 0x0B
OBJECT_TYPE_CHECKPOINT_MAP = 0x0C
OBJECT_TYPE_FS             = 0x0D

BTNODE_ROOT          = 0x0001
BTNODE_LEAF          = 0x0002
BTNODE_FIXED_KV_SIZE = 0x0004

APFS_TYPE_INODE   = 3
APFS_TYPE_DIR_REC = 9
OBJ_ID_MASK       = 0x0FFFFFFFFFFFFFFF
OBJ_TYPE_SHIFT    = 60

ROOT_DIR_INODE_NUM = 2   (root directory "/"'s inode number)

Fletcher-64 checksum algorithm

Input: block bytes [8..blockSize), treated as little-endian uint32 words.

MOD = 0xFFFFFFFF
sum1, sum2 = 0, 0
for each uint32 word w:
    sum1 = (sum1 + uint64(w)) % MOD
    sum2 = (sum2 + sum1) % MOD
ck_low  = MOD - ((sum1 + sum2) % MOD)
ck_high = MOD - ((sum1 + ck_low) % MOD)
checksum = (ck_high << 32) | ck_low

Store in o_cksum (bytes 0–7 of the block).
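
A direct Go transcription of the pseudocode above, offered as a sketch for fletcher64.go. The helper names setChecksum and verifyChecksum are illustrative. It assumes the block length is a multiple of 4, which holds for APFS block sizes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const fletcherMod = 0xFFFFFFFF

// fletcher64 computes the checksum over block[8:], skipping the o_cksum field.
func fletcher64(block []byte) uint64 {
	var sum1, sum2 uint64
	for off := 8; off+4 <= len(block); off += 4 {
		w := binary.LittleEndian.Uint32(block[off:])
		sum1 = (sum1 + uint64(w)) % fletcherMod
		sum2 = (sum2 + sum1) % fletcherMod
	}
	ckLow := fletcherMod - (sum1+sum2)%fletcherMod
	ckHigh := fletcherMod - (sum1+ckLow)%fletcherMod
	return ckHigh<<32 | ckLow
}

// setChecksum stores the recomputed checksum in bytes 0-7 of the block.
func setChecksum(block []byte) {
	binary.LittleEndian.PutUint64(block[0:8], fletcher64(block))
}

// verifyChecksum reports whether the stored checksum matches the contents.
func verifyChecksum(block []byte) bool {
	return binary.LittleEndian.Uint64(block[0:8]) == fletcher64(block)
}

func main() {
	block := make([]byte, 4096)
	block[100] = 0x2A
	setChecksum(block)
	fmt.Println(verifyChecksum(block)) // true
	block[72]++                        // simulate a patch without recomputing
	fmt.Println(verifyChecksum(block)) // false
}
```

Reducing modulo 0xFFFFFFFF on every step, as the pseudocode does, keeps both sums well inside uint64 for any block size.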

Navigation path: disk image to inode

  1. Read block 0 → container superblock. Validate magic, checksum.
  2. Scan checkpoint descriptor area for superblock with highest valid o_xid. Area = nx_xp_desc_blocks blocks starting at nx_xp_desc_base. For each block: if type is NX superblock, validate checksum + magic, keep highest xid. Use this latest superblock, not block 0, for every following step.
  3. Read container omap at nx_omap_oid (physical), taken from the latest superblock. Get om_tree_oid.
  4. For each volume in nx_fs_oid[] (skip zeros): Look up virtual OID in container omap B-tree → physical address. Read volume superblock. Check apfs_volname or apfs_role to find the Data volume.
  5. Read volume omap at apfs_omap_oid. Get om_tree_oid.
  6. Resolve filesystem root by looking up apfs_root_tree_oid (virtual) in volume omap → physical address of fs B-tree root.
  7. Resolve path component by component:
    • Start with root dir CNID = 2
    • For each path component, search fs B-tree for dir rec key: (parent_cnid | (APFS_TYPE_DIR_REC << 60)) + name
    • Get file_id from j_drec_val_t → next CNID
  8. Find target inode: search fs B-tree for key (target_cnid | (APFS_TYPE_INODE << 60))
  9. Modify owner and group at offsets +72 and +76 in the j_inode_val_t.
  10. Recompute Fletcher-64 for the modified block, write back.
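
Step 2 is the least obvious part of the path, so here is a sketch of the selection logic. latestSuperblock is a hypothetical helper that takes the descriptor area as already-read blocks and a checksum callback, so the example stays independent of the checksum code. The 0xFFFF type mask is an assumption to confirm against Apple's reference.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	nxMagic             = 0x4253584E
	objTypeMask         = 0xFFFF // assumed: low 16 bits of o_type
	objTypeNXSuperblock = 0x01
)

// latestSuperblock returns the index of the valid NX superblock copy with the
// highest o_xid in the checkpoint descriptor area, or -1 if none is valid.
func latestSuperblock(descArea [][]byte, checksumOK func([]byte) bool) int {
	le := binary.LittleEndian
	best, bestXID := -1, uint64(0)
	for i, b := range descArea {
		if len(b) < 40 {
			continue
		}
		// The area also holds checkpoint map blocks; skip anything else.
		if le.Uint32(b[24:])&objTypeMask != objTypeNXSuperblock {
			continue
		}
		if le.Uint32(b[32:]) != nxMagic || !checksumOK(b) {
			continue
		}
		if xid := le.Uint64(b[16:]); best < 0 || xid > bestXID {
			best, bestXID = i, xid
		}
	}
	return best
}

func main() {
	// Synthetic descriptor area: one checkpoint map, two superblock copies.
	mk := func(typ uint32, xid uint64) []byte {
		b := make([]byte, 4096)
		binary.LittleEndian.PutUint64(b[16:], xid)
		binary.LittleEndian.PutUint32(b[24:], typ)
		binary.LittleEndian.PutUint32(b[32:], nxMagic)
		return b
	}
	area := [][]byte{mk(0x0C, 9), mk(0x80000001, 5), mk(0x80000001, 9)}
	fmt.Println(latestSuperblock(area, func([]byte) bool { return true }))
}
```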

B-tree search algorithm

For a given B-tree root (physical block address):

func search(rootBlock, targetKey):
    node = readBlock(rootBlock)
    while true:
        entries = readToC(node)  // kvoff_t or kvloc_t depending on BTNODE_FIXED_KV_SIZE
        if node.btn_level == 0:  // leaf
            for each entry in entries:
                key = readKey(node, entry)
                if key matches targetKey:
                    return readValue(node, entry), node
            return not found
        else:  // internal node
            // Values in internal nodes are child pointers (uint64 OIDs)
            // Find the last key <= targetKey
            childOID = binary search entries for greatest key <= targetKey
            // Resolve OID to physical address via omap (for virtual trees)
            // or use directly (for physical trees)
            node = readBlock(resolve(childOID))

Key comparison

Omap B-tree (fixed-size keys): compare ok_oid first, then ok_xid. For lookups, find the entry with matching ok_oid and highest ok_xid ≤ our target xid.

Filesystem B-tree (variable-size keys): compare obj_id_and_type as uint64 first. For dir rec keys with same obj_id_and_type, compare by name (or name hash for case-insensitive volumes).
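
For the omap case, the comparison and the "highest ok_xid ≤ target" rule can be sketched as follows. It works on an in-memory sorted slice of keys rather than on real node bytes, and omapKey, compareOmapKeys, and findOmapEntry are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

// omapKey mirrors omap_key_t: ok_oid followed by ok_xid.
type omapKey struct{ OID, XID uint64 }

// compareOmapKeys orders by OID first, then XID. Returns -1, 0, or 1.
func compareOmapKeys(a, b omapKey) int {
	switch {
	case a.OID != b.OID && a.OID < b.OID, a.OID == b.OID && a.XID < b.XID:
		return -1
	case a == b:
		return 0
	}
	return 1
}

// findOmapEntry returns the index of the entry with the given OID and the
// highest XID not exceeding maxXID, or -1. keys must be sorted ascending.
func findOmapEntry(keys []omapKey, oid, maxXID uint64) int {
	target := omapKey{oid, maxXID}
	// i is the first index whose key sorts strictly after the target.
	i := sort.Search(len(keys), func(i int) bool {
		return compareOmapKeys(keys[i], target) > 0
	})
	if i == 0 || keys[i-1].OID != oid {
		return -1
	}
	return i - 1
}

func main() {
	keys := []omapKey{{1, 1}, {2, 3}, {2, 7}, {5, 2}}
	fmt.Println(findOmapEntry(keys, 2, 5)) // 1: OID 2 at XID 3, since XID 7 is too new
	fmt.Println(findOmapEntry(keys, 2, 2)) // -1: no mapping that old
}
```

The same "greatest key ≤ target" step is what the internal-node descent in the search pseudocode needs, so one helper could plausibly serve both.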

Reading keys/values from a node

tocStart = 56 + btn_table_space.off    // 56 = sizeof(btree_node_phys_t fixed part)
keyAreaStart = tocStart + btn_table_space.len

if BTNODE_FIXED_KV_SIZE set:
    toc[i] = kvoff_t at tocStart + i*4
    key = block[keyAreaStart + toc[i].k ..]
    // value offset relative to value area end
else:
    toc[i] = kvloc_t at tocStart + i*8
    key = block[keyAreaStart + toc[i].k.off .. +toc[i].k.len]
    // value offset and length from toc[i].v

valueAreaEnd = blockSize                    // for non-root
             = blockSize - 40              // for root (minus btree_info_t)

// For fixed-size values (kvoff_t): toc[i].v is the distance from valueAreaEnd
// back to the first byte of the value.
value = block[valueAreaEnd - toc[i].v .. valueAreaEnd - toc[i].v + valueSize]
// For variable-size values (kvloc_t):
value = block[valueAreaEnd - toc[i].v.off .. valueAreaEnd - toc[i].v.off + toc[i].v.len]

Value offset convention: the reference point is the END of the value area, and v.off is measured backward from that point to the first byte of the value. The value then runs forward for v.len bytes:

valueStart = valueAreaEnd - v.off
valueEnd   = valueStart + v.len

NOTE: This is the reading used by the open-source reference implementations. Confirm it against the Apple spec during implementation; a wrong convention shows up immediately as implausible inode fields.
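
A sketch of the value lookup under one convention: the ToC offset is counted backward from the end of the value area to the first byte of the value. That convention is the assumption here and still needs checking against Apple's reference. valueRange is a hypothetical helper; the bounds check catches most mistakes early.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	btreeInfoSize  = 40 // btree_info_t at the end of root nodes
	nodeHeaderSize = 56 // fixed part of btree_node_phys_t
)

// valueRange returns the [start, end) byte range of a value inside a node,
// assuming vOff is the backward distance from the value area end to the
// value's first byte.
func valueRange(blockSize int, isRoot bool, vOff, vLen uint16) (start, end int, err error) {
	areaEnd := blockSize
	if isRoot {
		areaEnd -= btreeInfoSize
	}
	start = areaEnd - int(vOff)
	end = start + int(vLen)
	if start < nodeHeaderSize || end > areaEnd {
		return 0, 0, errors.New("value range falls outside the node's data area")
	}
	return start, end, nil
}

func main() {
	// A 92-byte inode value stored last in a non-root 4096-byte leaf.
	fmt.Println(valueRange(4096, false, 92, 92)) // 4004 4096 <nil>
	// A 16-byte omap value in a root node, just before btree_info_t.
	fmt.Println(valueRange(4096, true, 16, 16)) // 4040 4056 <nil>
}
```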

Implementation structure

Package: pkg/apfs/

Files:

  • chown.go — public API: Chown(diskPath string, volumeName string, filePath string, uid, gid uint32) error
  • structures.go — Go struct definitions for all on-disk types
  • fletcher64.go — checksum computation
  • btree.go — B-tree node reading and search
  • container.go — container superblock + checkpoint scanning
  • volume.go — volume superblock + omap resolution
  • chown_test.go — tests (see below)

Public API

// Chown changes the owner and group of a file on an unmounted APFS disk image.
// The path is relative to the volume root (e.g., "Library/LaunchDaemons/foo.plist").
// The disk image must not be mounted.
func Chown(diskPath string, volumeName string, filePath string, uid, gid uint32) error

Integration with macOS guest patching

In pkg/guestpatch/macos/macos_darwin.go, after writing the LaunchDaemon plist via noowners mount, call:

apfs.Chown(disk, "Data", "Library/LaunchDaemons/io.lima-vm.lima-macos-init.plist", 0, 0)

This replaces the entire privileged phase (sudo chown root:wheel).

Testing strategy

  1. Unit tests for Fletcher-64 (use known test vectors from Apple spec or from a real APFS block).
  2. Integration test: Create a small APFS disk image using diskutil on macOS, write a test file with known UID, run Chown, mount the image, verify stat shows the new UID/GID.
  3. End-to-end: limactl start creates a macOS VM without sudo prompt.

Risks

  1. Value offset convention: The exact meaning of value offsets in B-tree nodes needs verification during implementation. If we get this wrong, we'll read garbage. Mitigated by: validating parsed inode fields (timestamps should be reasonable, mode should be a valid file mode, etc.).
  2. Hashed vs non-hashed directory records: macOS volumes may use either format depending on case sensitivity settings. We need to handle both j_drec_key_t and j_drec_hashed_key_t.
  3. Multiple volumes: A macOS disk has System and Data volumes. We need to target the Data volume specifically (by name or role).
  4. Checkpoint descriptor area scanning: Block 0 may not be the latest superblock. Must scan the checkpoint area.