Write a minimal APFS tool (pkg/apfs/chown.go) that modifies file ownership
(UID/GID) directly on a raw APFS disk image, bypassing the kernel VFS. This
eliminates the sudo chown root:wheel requirement from macOS guest disk patching.
APFS inode records store owner (UID) and group (GID) as plain little-endian
uint32 fields at fixed offsets within the inode value (j_inode_val_t), which
lives in a B-tree leaf node. Each metadata block is protected by a single
Fletcher-64 checksum. Modifying UID/GID requires changing 8 bytes and
recomputing one checksum per affected block. It needs no parent pointer updates,
no B-tree restructuring, and no superblock changes.
Validated by Goebel 2019 (Universität der Bundeswehr): raw APFS block modification with checksum recalculation produces a valid filesystem.
All open-source APFS implementations are GPL (v2 or v3). Lima is Apache 2.0. We can read GPL code to understand the on-disk format (facts are not copyrightable) but must write a clean implementation. Apple's APFS Reference PDF is the primary specification.
| Offset | Size | Field | Type |
|---|---|---|---|
| 0 | 8 | o_cksum | uint64 (Fletcher-64 checksum) |
| 8 | 8 | o_oid | uint64 (object identifier) |
| 16 | 8 | o_xid | uint64 (transaction identifier) |
| 24 | 4 | o_type | uint32 (type + storage class) |
| 28 | 4 | o_subtype | uint32 |
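The header layout above can be decoded with a few fixed-offset reads. The following is a minimal sketch, not the final structures.go: the `ObjHeader` type, `parseObjHeader`, and `ObjectType` are hypothetical names, and the `0x0000FFFF` type mask is an assumption taken from Apple's reference that should be confirmed during implementation.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// objHeaderSize is the size of obj_phys_t per the table above.
const objHeaderSize = 32

// Masks and flags for o_type. The mask value is an assumption to verify
// against the Apple spec; the flag value matches the constants in this plan.
const (
	objectTypeMask = 0x0000FFFF // low 16 bits: object type
	objPhysical    = 0x40000000 // storage class flag
)

// ObjHeader is a hypothetical Go view of obj_phys_t.
type ObjHeader struct {
	Cksum   uint64
	OID     uint64
	XID     uint64
	Type    uint32
	Subtype uint32
}

// parseObjHeader decodes the 32-byte header at the start of a block.
func parseObjHeader(block []byte) (ObjHeader, error) {
	if len(block) < objHeaderSize {
		return ObjHeader{}, errors.New("block shorter than obj_phys_t header")
	}
	le := binary.LittleEndian
	return ObjHeader{
		Cksum:   le.Uint64(block[0:8]),
		OID:     le.Uint64(block[8:16]),
		XID:     le.Uint64(block[16:24]),
		Type:    le.Uint32(block[24:28]),
		Subtype: le.Uint32(block[28:32]),
	}, nil
}

// ObjectType strips the storage-class flags from o_type.
func (h ObjHeader) ObjectType() uint32 { return h.Type & objectTypeMask }

func main() {
	block := make([]byte, 4096)
	binary.LittleEndian.PutUint64(block[16:24], 7)          // o_xid
	binary.LittleEndian.PutUint32(block[24:28], 0x40000003) // physical B-tree node
	h, err := parseObjHeader(block)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("xid=%d type=%#x physical=%v\n", h.XID, h.ObjectType(), h.Type&objPhysical != 0)
}
```

Reading fields by explicit offset, rather than casting a struct over the bytes, keeps the decoder independent of Go struct padding and host endianness.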
Key fields (offsets from block start):
| Offset | Field | Notes |
|---|---|---|
| 32 | nx_magic | Must be 0x4253584E ("NXSB") |
| 36 | nx_block_size | Typically 4096 |
| 104 | nx_xp_desc_blocks | Checkpoint descriptor area size (mask 0x7FFFFFFF) |
| 112 | nx_xp_desc_base | Checkpoint descriptor area start (physical block) |
| 136 | nx_xp_desc_index | Current checkpoint start index |
| 140 | nx_xp_desc_len | Current checkpoint length |
| 160 | nx_omap_oid | Container object map (physical address) |
| 184 | nx_fs_oid[100] | Volume OIDs (8 bytes each, 800 bytes total) |
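As an illustration of how these offsets might be consumed, here is a hedged sketch of a container superblock reader. The `NXSuperblock` type and `parseNXSuperblock` function are hypothetical names for this example only; the sketch reads just the fields in the table and does not validate the checksum.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	nxMagic    = 0x4253584E // bytes "NXSB" on disk (little-endian)
	nxMaxFS    = 100        // entries in nx_fs_oid[]
	nxFSOIDOff = 184        // offset of nx_fs_oid[0]
	nxMinLen   = nxFSOIDOff + nxMaxFS*8
)

// NXSuperblock holds only the fields listed in the table above.
type NXSuperblock struct {
	BlockSize    uint32
	XPDescBlocks uint32 // already masked with 0x7FFFFFFF
	XPDescBase   uint64
	XPDescIndex  uint32
	XPDescLen    uint32
	OmapOID      uint64
	FSOIDs       []uint64 // non-zero entries of nx_fs_oid[]
}

// parseNXSuperblock checks the magic and extracts the key fields.
func parseNXSuperblock(b []byte) (*NXSuperblock, error) {
	if len(b) < nxMinLen {
		return nil, errors.New("block too short for container superblock")
	}
	le := binary.LittleEndian
	if le.Uint32(b[32:36]) != nxMagic {
		return nil, errors.New("bad nx_magic")
	}
	sb := &NXSuperblock{
		BlockSize:    le.Uint32(b[36:40]),
		XPDescBlocks: le.Uint32(b[104:108]) & 0x7FFFFFFF,
		XPDescBase:   le.Uint64(b[112:120]),
		XPDescIndex:  le.Uint32(b[136:140]),
		XPDescLen:    le.Uint32(b[140:144]),
		OmapOID:      le.Uint64(b[160:168]),
	}
	for i := 0; i < nxMaxFS; i++ {
		off := nxFSOIDOff + i*8
		if oid := le.Uint64(b[off : off+8]); oid != 0 {
			sb.FSOIDs = append(sb.FSOIDs, oid)
		}
	}
	return sb, nil
}

func main() {
	b := make([]byte, 4096)
	copy(b[32:], "NXSB")
	binary.LittleEndian.PutUint32(b[36:], 4096)
	binary.LittleEndian.PutUint32(b[104:], 8)
	binary.LittleEndian.PutUint64(b[112:], 1)
	binary.LittleEndian.PutUint64(b[160:], 0x20)
	binary.LittleEndian.PutUint64(b[184:], 0x402)
	sb, err := parseNXSuperblock(b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *sb)
}
```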
| Offset | Field | Notes |
|---|---|---|
| 32 | apfs_magic | Must be 0x42535041 ("APSB") |
| 128 | apfs_omap_oid | Volume object map (physical address) |
| 136 | apfs_root_tree_oid | Filesystem B-tree root (virtual OID) |
| 704 | apfs_volname | Volume name (256 bytes, UTF-8) |
| 964 | apfs_role | Volume role (Data = 0x0040, aka APFS_VOL_ROLE_DATA) |
| Offset | Field | Notes |
|---|---|---|
| 48 | om_tree_oid | Physical address of omap B-tree root |

- omap_key_t: ok_oid(8) + ok_xid(8) = 16 bytes
- omap_val_t: ov_flags(4) + ov_size(4) + ov_paddr(8) = 16 bytes
| Offset | Size | Field |
|---|---|---|
| 0 | 32 | obj_phys_t header |
| 32 | 2 | btn_flags |
| 34 | 2 | btn_level (0 = leaf) |
| 36 | 4 | btn_nkeys |
| 40 | 4 | btn_table_space (nloc_t: off u16 + len u16) |
| 44 | 4 | btn_free_space (nloc_t) |
| 48 | 4 | btn_key_free_list (nloc_t) |
| 52 | 4 | btn_val_free_list (nloc_t) |
| 56 | var | btn_data[] — ToC, keys, values |
If btn_flags & BTNODE_ROOT, a btree_info_t (40 bytes) sits at the end of the
block. This reduces the value area by 40 bytes.
- kvloc_t (8 bytes): key nloc (off u16 + len u16) + value nloc (off u16 + len u16). Used when BTNODE_FIXED_KV_SIZE is NOT set.
- kvoff_t (4 bytes): key offset u16 + value offset u16. Used when BTNODE_FIXED_KV_SIZE IS set.
ToC starts at: btn_data + btn_table_space.off
Key area starts at: btn_data + btn_table_space.off + btn_table_space.len
Key offsets in ToC are relative to key area start.
Value offsets are relative to:
- End of block (for non-root nodes)
- End of block minus sizeof(btree_info_t) = 40 (for root nodes)
Values grow backward: each value offset is a distance measured back from the end of the value area.
j_key_t (8 bytes): obj_id_and_type — upper 4 bits = type, lower 60 bits = CNID.
| Type | Value | Key struct |
|---|---|---|
| INODE | 3 | j_inode_key_t: just the 8-byte j_key_t |
| DIR_REC | 9 | j_drec_key_t: j_key_t + name_len(2) + name(var), or j_drec_hashed_key_t: j_key_t + name_len_and_hash(4) + name(var) |
For hashed keys: name_len_and_hash lower 10 bits = name length, upper 22 = hash.
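The bit packing described above is easy to get subtly wrong, so a small sketch may help. The function names (`makeJKey`, `splitJKey`, `splitNameLenAndHash`) are hypothetical; the shifts and masks follow directly from the field descriptions in this section.

```go
package main

import "fmt"

const (
	objIDMask      = 0x0FFFFFFFFFFFFFFF // lower 60 bits: CNID
	objTypeShift   = 60                 // upper 4 bits: record type
	apfsTypeInode  = 3
	apfsTypeDirRec = 9
	drecLenMask    = 0x000003FF // lower 10 bits of name_len_and_hash
	drecHashShift  = 10         // upper 22 bits hold the hash
)

// makeJKey packs a CNID and a record type into obj_id_and_type.
func makeJKey(cnid uint64, recType uint8) uint64 {
	return (cnid & objIDMask) | uint64(recType)<<objTypeShift
}

// splitJKey unpacks obj_id_and_type into CNID and record type.
func splitJKey(k uint64) (cnid uint64, recType uint8) {
	return k & objIDMask, uint8(k >> objTypeShift)
}

// splitNameLenAndHash unpacks the hashed directory record header.
func splitNameLenAndHash(v uint32) (nameLen, hash uint32) {
	return v & drecLenMask, v >> drecHashShift
}

func main() {
	// Key used to look up directory entries of the root directory (CNID 2).
	k := makeJKey(2, apfsTypeDirRec)
	cnid, typ := splitJKey(k)
	fmt.Printf("key=%#x cnid=%d type=%d\n", k, cnid, typ)

	// Inode key for the same CNID.
	fmt.Printf("inode key=%#x\n", makeJKey(2, apfsTypeInode))
}
```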
j_inode_val_t (92+ bytes):
| Offset | Size | Field |
|---|---|---|
| 0 | 8 | parent_id |
| 8 | 8 | private_id |
| 16–47 | 32 | timestamps (create, mod, change, access) |
| 48 | 8 | internal_flags |
| 56 | 4 | nchildren/nlink |
| 60 | 4 | default_protection_class |
| 64 | 4 | write_generation_counter |
| 68 | 4 | bsd_flags |
| 72 | 4 | owner (UID) |
| 76 | 4 | group (GID) |
| 80 | 2 | mode |
| 82 | 2 | pad1 |
| 84 | 8 | uncompressed_size |
| 92 | var | xfields |
j_drec_val_t: file_id (8) + date_added (8) + flags (2) + xfields.
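Given a byte slice holding a j_inode_val_t, the ownership change itself is two 4-byte writes. The sketch below assumes the offsets in the table above; `setInodeOwner` is a hypothetical helper, and the mode check (file-type bits must be non-zero) is only one possible sanity test to catch a misparsed record.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	inodeValMinSize = 92 // fixed part of j_inode_val_t
	inodeOwnerOff   = 72
	inodeGroupOff   = 76
	inodeModeOff    = 80
)

// setInodeOwner overwrites owner and group in place and returns the old values.
func setInodeOwner(val []byte, uid, gid uint32) (oldUID, oldGID uint32, err error) {
	if len(val) < inodeValMinSize {
		return 0, 0, errors.New("value shorter than j_inode_val_t")
	}
	le := binary.LittleEndian
	// Sanity check: a real inode has non-zero file-type bits (S_IFMT) in mode.
	if le.Uint16(val[inodeModeOff:])&0xF000 == 0 {
		return 0, 0, errors.New("mode has no file-type bits; value is probably misparsed")
	}
	oldUID = le.Uint32(val[inodeOwnerOff:])
	oldGID = le.Uint32(val[inodeGroupOff:])
	le.PutUint32(val[inodeOwnerOff:], uid)
	le.PutUint32(val[inodeGroupOff:], gid)
	return oldUID, oldGID, nil
}

func main() {
	val := make([]byte, inodeValMinSize)
	binary.LittleEndian.PutUint32(val[inodeOwnerOff:], 501)   // typical first macOS user
	binary.LittleEndian.PutUint32(val[inodeGroupOff:], 20)    // staff
	binary.LittleEndian.PutUint16(val[inodeModeOff:], 0x81A4) // regular file, 0644

	oldUID, oldGID, err := setInodeOwner(val, 0, 0) // root:wheel
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("was %d:%d, now %d:%d\n", oldUID, oldGID,
		binary.LittleEndian.Uint32(val[inodeOwnerOff:]),
		binary.LittleEndian.Uint32(val[inodeGroupOff:]))
}
```

After this write, the containing block's checksum is stale and must be recomputed before the block is written back.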
NX_MAGIC = 0x4253584E
APFS_MAGIC = 0x42535041
APFS_VOL_ROLE_DATA = 0x0040 (1 << 6; confirm the role encoding against the Apple spec)
OBJ_PHYSICAL = 0x40000000
OBJ_EPHEMERAL = 0x80000000
OBJECT_TYPE_NX_SUPERBLOCK = 0x01
OBJECT_TYPE_BTREE_NODE = 0x03
OBJECT_TYPE_OMAP = 0x0B
OBJECT_TYPE_CHECKPOINT_MAP = 0x0C
OBJECT_TYPE_FS = 0x0D
BTNODE_ROOT = 0x0001
BTNODE_LEAF = 0x0002
BTNODE_FIXED_KV_SIZE = 0x0004
APFS_TYPE_INODE = 3
APFS_TYPE_DIR_REC = 9
OBJ_ID_MASK = 0x0FFFFFFFFFFFFFFF
OBJ_TYPE_SHIFT = 60
ROOT_DIR_INODE_NUM = 2 (root directory "/"'s inode number)
Input: block bytes [8..blockSize), treated as little-endian uint32 words.
MOD = 0xFFFFFFFF
sum1, sum2 = 0, 0
for each uint32 word w:
    sum1 = (sum1 + uint64(w)) % MOD
    sum2 = (sum2 + sum1) % MOD
ck_low = MOD - ((sum1 + sum2) % MOD)
ck_high = MOD - ((sum1 + ck_low) % MOD)
checksum = (ck_high << 32) | ck_low
Store in o_cksum (bytes 0–7 of the block).
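The pseudocode above translates almost line for line into Go. This is a sketch of what fletcher64.go might contain; `fletcher64`, `sealBlock`, and `verifyBlock` are hypothetical names, and the block length is assumed to be a multiple of 4.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const fletcherMod = 0xFFFFFFFF

// fletcher64 computes the checksum over block[8:], skipping the o_cksum field.
func fletcher64(block []byte) uint64 {
	var sum1, sum2 uint64
	for off := 8; off+4 <= len(block); off += 4 {
		w := binary.LittleEndian.Uint32(block[off : off+4])
		sum1 = (sum1 + uint64(w)) % fletcherMod
		sum2 = (sum2 + sum1) % fletcherMod
	}
	ckLow := fletcherMod - (sum1+sum2)%fletcherMod
	ckHigh := fletcherMod - (sum1+ckLow)%fletcherMod
	return ckHigh<<32 | ckLow
}

// sealBlock recomputes the checksum and stores it in bytes 0-7.
func sealBlock(block []byte) {
	binary.LittleEndian.PutUint64(block[0:8], fletcher64(block))
}

// verifyBlock reports whether the stored checksum matches the contents.
func verifyBlock(block []byte) bool {
	return binary.LittleEndian.Uint64(block[0:8]) == fletcher64(block)
}

func main() {
	block := make([]byte, 4096)
	block[72] = 0xF5 // pretend some payload lives here
	sealBlock(block)
	fmt.Println("valid after seal:", verifyBlock(block))

	block[72] = 0x00 // modify payload without resealing
	fmt.Println("valid after edit:", verifyBlock(block))

	sealBlock(block)
	fmt.Println("valid after reseal:", verifyBlock(block))
}
```

Reducing modulo 0xFFFFFFFF on every word is slower than batching the reduction, but it mirrors the pseudocode and cannot overflow a uint64.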
- Read block 0 → container superblock. Validate magic, checksum.
- Scan checkpoint descriptor area for superblock with highest valid o_xid. Area = nx_xp_desc_blocks blocks starting at nx_xp_desc_base. For each block: if type is NX superblock, validate checksum + magic, keep highest xid.
- Read container omap at nx_omap_oid (physical). Get om_tree_oid.
- For each volume in nx_fs_oid[] (skip zeros): Look up virtual OID in container omap B-tree → physical address. Read volume superblock. Check apfs_volname or apfs_role to find the Data volume.
- Read volume omap at apfs_omap_oid. Get om_tree_oid.
- Resolve filesystem root by looking up apfs_root_tree_oid (virtual) in volume omap → physical address of fs B-tree root.
- Resolve path component by component:
  - Start with root dir CNID = 2
  - For each path component, search fs B-tree for dir rec key: (parent_cnid | (APFS_TYPE_DIR_REC << 60)) + name
  - Get file_id from j_drec_val_t → next CNID
- Find target inode: search fs B-tree for key (target_cnid | (APFS_TYPE_INODE << 60))
- Modify owner and group at offsets +72 and +76 in the j_inode_val_t.
- Recompute Fletcher-64 for the modified block, write back.
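The checkpoint scan in the second step is the part most likely to be skipped by accident, so here is a sketch of it over an in-memory image. It assumes the descriptor area is contiguous and the block size is at least 36 bytes; `latestSuperblock` and the demo helper `writeSuperblock` are hypothetical names, and the `0x0000FFFF` type mask is an assumption to confirm against the Apple spec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	nxMagic                = 0x4253584E
	objectTypeMask         = 0x0000FFFF // assumed mask for the type bits of o_type
	objectTypeNXSuperblock = 0x01
	fletcherMod            = 0xFFFFFFFF
)

// fletcher64 computes the checksum over block[8:].
func fletcher64(block []byte) uint64 {
	var sum1, sum2 uint64
	for off := 8; off+4 <= len(block); off += 4 {
		sum1 = (sum1 + uint64(binary.LittleEndian.Uint32(block[off:]))) % fletcherMod
		sum2 = (sum2 + sum1) % fletcherMod
	}
	ckLow := fletcherMod - (sum1+sum2)%fletcherMod
	ckHigh := fletcherMod - (sum1+ckLow)%fletcherMod
	return ckHigh<<32 | ckLow
}

// latestSuperblock scans descBlocks blocks starting at descBase and returns
// the block number and xid of the valid superblock with the highest o_xid.
func latestSuperblock(img []byte, blockSize, descBase, descBlocks uint64) (blockNo, xid uint64, ok bool) {
	le := binary.LittleEndian
	for i := uint64(0); i < descBlocks; i++ {
		start := (descBase + i) * blockSize
		if start+blockSize > uint64(len(img)) {
			break // descriptor area runs past the image
		}
		b := img[start : start+blockSize]
		if le.Uint32(b[24:])&objectTypeMask != objectTypeNXSuperblock ||
			le.Uint32(b[32:]) != nxMagic ||
			le.Uint64(b) != fletcher64(b) {
			continue // not a superblock, or corrupt
		}
		if x := le.Uint64(b[16:]); !ok || x > xid {
			blockNo, xid, ok = descBase+i, x, true
		}
	}
	return blockNo, xid, ok
}

// writeSuperblock is a demo helper that fabricates a minimal checksummed superblock.
func writeSuperblock(img []byte, blockSize, blockNo, xid uint64) {
	b := img[blockNo*blockSize : (blockNo+1)*blockSize]
	le := binary.LittleEndian
	le.PutUint64(b[16:], xid)
	le.PutUint32(b[24:], 0x80000001) // ephemeral NX superblock
	le.PutUint32(b[32:], nxMagic)
	le.PutUint64(b, fletcher64(b))
}

func main() {
	const bs = 4096
	img := make([]byte, 4*bs)
	writeSuperblock(img, bs, 1, 5)
	writeSuperblock(img, bs, 2, 9)
	writeSuperblock(img, bs, 3, 12)
	img[3*bs+100] ^= 0xFF // corrupt the newest copy so its checksum fails
	blockNo, xid, ok := latestSuperblock(img, bs, 1, 3)
	fmt.Println(blockNo, xid, ok)
}
```

In the demo, the copy with the highest xid is deliberately corrupted, so the scan falls back to the newest copy that still verifies.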
For a given B-tree root (physical block address):
func search(rootBlock, targetKey):
    node = readBlock(rootBlock)
    while true:
        entries = readToC(node) // kvoff_t or kvloc_t depending on BTNODE_FIXED_KV_SIZE
        if node.btn_level == 0: // leaf
            for each entry in entries:
                key = readKey(node, entry)
                if key matches targetKey:
                    return readValue(node, entry), node
            return not found
        else: // internal node
            // Values in internal nodes are child pointers (uint64 OIDs)
            // Find the last key <= targetKey
            childOID = binary search entries for greatest key <= targetKey
            // Resolve OID to physical address via omap (for virtual trees)
            // or use directly (for physical trees)
            node = readBlock(resolve(childOID))
Omap B-tree (fixed-size keys): compare ok_oid first, then ok_xid.
For lookups, find the entry with matching ok_oid and highest ok_xid ≤
our target xid.
Filesystem B-tree (variable-size keys): compare obj_id_and_type as
uint64 first. For dir rec keys with same obj_id_and_type, compare by name
(or name hash for case-insensitive volumes).
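The omap rule (match ok_oid, take the highest ok_xid not exceeding the target) can be sketched over a sorted slice of already-decoded leaf entries. The `omapKey` and `omapEntry` types and both functions are hypothetical; a real lookup would walk the B-tree nodes rather than a flat slice.

```go
package main

import (
	"fmt"
	"sort"
)

// omapKey mirrors omap_key_t; omapEntry pairs it with ov_paddr.
type omapKey struct{ OID, XID uint64 }

type omapEntry struct {
	Key   omapKey
	Paddr uint64
}

// compareOmapKeys orders keys by ok_oid first, then ok_xid.
func compareOmapKeys(a, b omapKey) int {
	switch {
	case a.OID != b.OID && a.OID < b.OID:
		return -1
	case a.OID != b.OID:
		return 1
	case a.XID < b.XID:
		return -1
	case a.XID > b.XID:
		return 1
	}
	return 0
}

// lookupOmap returns the physical address for oid with the highest xid <= maxXID.
// entries must be sorted by compareOmapKeys.
func lookupOmap(entries []omapEntry, oid, maxXID uint64) (uint64, bool) {
	target := omapKey{OID: oid, XID: maxXID}
	// First index whose key is greater than target; the entry before it is
	// the greatest key <= target.
	i := sort.Search(len(entries), func(i int) bool {
		return compareOmapKeys(entries[i].Key, target) > 0
	})
	if i == 0 || entries[i-1].Key.OID != oid {
		return 0, false
	}
	return entries[i-1].Paddr, true
}

func main() {
	entries := []omapEntry{
		{omapKey{1, 3}, 100},
		{omapKey{2, 1}, 200},
		{omapKey{2, 5}, 201},
		{omapKey{2, 9}, 202},
		{omapKey{4, 2}, 400},
	}
	paddr, ok := lookupOmap(entries, 2, 7)
	fmt.Println(paddr, ok)
}
```

The same "greatest key less than or equal to the target" search also selects the child pointer in internal nodes.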
tocStart = 56 + btn_table_space.off // 56 = sizeof(btree_node_phys_t fixed part)
keyAreaStart = tocStart + btn_table_space.len
if BTNODE_FIXED_KV_SIZE set:
    toc[i] = kvoff_t at tocStart + i*4
    key = block[keyAreaStart + toc[i].k ..]
    // value offset relative to value area end
else:
    toc[i] = kvloc_t at tocStart + i*8
    key = block[keyAreaStart + toc[i].k.off .. +toc[i].k.len]
    // value offset and length from toc[i].v
valueAreaEnd = blockSize // for non-root
             = blockSize - 40 // for root (minus btree_info_t)
// The value offset is a distance counted backward from valueAreaEnd to the
// first byte of the value; the value then runs forward for its length.
// For fixed-size values:
value = block[valueAreaEnd - toc[i].v .. valueAreaEnd - toc[i].v + valueSize]
// For variable-size values:
value = block[valueAreaEnd - toc[i].v.off .. valueAreaEnd - toc[i].v.off + toc[i].v.len]
NOTE: Verify this offset convention against the Apple spec during
implementation. The key detail is that v.off locates the first byte of the
value, not the last: reading it the other way shifts every value by its own
length and yields garbage.
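To pin the arithmetic down in one place, the following sketch computes a value's byte range inside a node. It assumes the value offset is the distance from the end of the value area back to the first byte of the value, an assumption that still needs to be confirmed against the Apple spec. `valueRange` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	btreeInfoSize  = 40 // sizeof(btree_info_t), present only in root nodes
	nodeHeaderSize = 56 // fixed part of btree_node_phys_t
)

// valueRange returns the half-open byte range [start, end) of a value.
// ASSUMPTION: vOff counts backward from the end of the value area to the
// first byte of the value.
func valueRange(blockSize int, isRoot bool, vOff, vLen uint16) (start, end int, err error) {
	areaEnd := blockSize
	if isRoot {
		areaEnd -= btreeInfoSize
	}
	start = areaEnd - int(vOff)
	end = start + int(vLen)
	if start < nodeHeaderSize || end > areaEnd {
		return 0, 0, errors.New("value range falls outside the node's data area")
	}
	return start, end, nil
}

func main() {
	// A 16-byte omap value that is the last value in a non-root node.
	s, e, err := valueRange(4096, false, 16, 16)
	fmt.Println(s, e, err)

	// The same entry in a root node sits 40 bytes earlier.
	s, e, err = valueRange(4096, true, 16, 16)
	fmt.Println(s, e, err)
}
```

The bounds check doubles as a cheap detector for a wrong convention: with the opposite reading, many entries would produce ranges that spill past the value area.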
Files:
- chown.go: public API, Chown(diskPath string, volumeName string, filePath string, uid, gid uint32) error
- structures.go: Go struct definitions for all on-disk types
- fletcher64.go: checksum computation
- btree.go: B-tree node reading and search
- container.go: container superblock + checkpoint scanning
- volume.go: volume superblock + omap resolution
- chown_test.go: tests (see below)
// Chown changes the owner and group of a file on an unmounted APFS disk image.
// volumeName selects the volume (e.g., "Data"). The path is relative to the
// volume root (e.g., "Library/LaunchDaemons/foo.plist").
// The disk image must not be mounted.
func Chown(diskPath string, volumeName string, filePath string, uid, gid uint32) error

In pkg/guestpatch/macos/macos_darwin.go, after writing the LaunchDaemon plist
via noowners mount and unmounting the image, call:

apfs.Chown(disk, "Data", "Library/LaunchDaemons/io.lima-vm.lima-macos-init.plist", 0, 0)

This replaces the entire privileged phase (sudo chown root:wheel).
- Unit tests for Fletcher-64 (use known test vectors from Apple spec or from a real APFS block).
- Integration test: Create a small APFS disk image using diskutil on macOS, write a test file with known UID, run Chown, mount the image, verify stat shows the new UID/GID.
- End-to-end: limactl start creates a macOS VM without sudo prompt.
- Value offset convention: The exact meaning of value offsets in B-tree nodes needs verification during implementation. If we get this wrong, we'll read garbage. Mitigated by: validating parsed inode fields (timestamps should be reasonable, mode should be a valid file mode, etc.).
- Hashed vs non-hashed directory records: macOS volumes may use either format depending on case sensitivity settings. We need to handle both j_drec_key_t and j_drec_hashed_key_t.
- Multiple volumes: A macOS disk has System and Data volumes. We need to target the Data volume specifically (by name or role).
- Checkpoint descriptor area scanning: Block 0 may not be the latest superblock. Must scan the checkpoint area.