@pudquick
Last active August 7, 2022 19:49

pbzx streams explained (for real, Yoyo-style):

There are some interesting write-ups online regarding how to work with pbzx-encoded streams and their history:

Unfortunately, the structure described in these documents doesn't match what is actually found in the Payload of the xar archive 'Essentials.pkg' inside the Yosemite installer.

You will run into an error using any of the above tools / format descriptions after about 1GB of data.

An 'xz' chunk will complete, and the next chunk's header will then announce a new 16MB (exactly 16777216 bytes) chunk, yet that new chunk will carry no magic or other indicator of being xz compressed.

The true structure of this pbzx stream turns out to be a little more complicated:

  • Overall, the entire stream (when properly decoded) is a .cpio file
  • The .cpio file is split into two different kinds of chunks
    • Those starting with the 'xz' magic, indicating an 'xz' compressed section
    • Those not starting with the 'xz' magic, apparently always 16MB in size, indicating a raw/uncompressed chunk of the overall .cpio file

For the Payload of 'Essentials.pkg', there are 11 chunks in total, alternating between 'xz' and uncompressed.

Reassembly works by decompressing each of the 'xz' encoded chunks individually and leaving the raw chunks untouched, then concatenating all chunks (11, in this example) in their original order to recreate a single .cpio file.
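
As a rough illustration of that reassembly step, here is a minimal Python sketch. It assumes the chunk bodies have already been split out of the pbzx stream, in order, with their headers stripped; the header layout isn't covered here, so that part is left out. The name `reassemble_chunks` is made up for this example and is not taken from the linked decoder.

```python
import lzma

# The 6-byte magic that opens every .xz stream.
XZ_MAGIC = b"\xfd7zXZ\x00"


def reassemble_chunks(chunks):
    """Rebuild the payload from an ordered list of chunk bodies.

    Hypothetical helper: `chunks` is assumed to be a list of bytes
    objects, one per chunk, with any pbzx chunk headers already removed.
    """
    parts = []
    for chunk in chunks:
        if chunk.startswith(XZ_MAGIC):
            # Chunk is its own complete xz stream: decompress it alone.
            parts.append(lzma.decompress(chunk, format=lzma.FORMAT_XZ))
        else:
            # No xz magic: treat the chunk as raw .cpio data.
            parts.append(chunk)
    return b"".join(parts)


# Tiny stand-in for the alternating xz / raw pattern described above.
demo = [lzma.compress(b"first "), b"raw middle ", lzma.compress(b"last")]
print(reassemble_chunks(demo))
```

This keeps everything in memory, which is fine for a demonstration; for a multi-gigabyte Payload you would more likely write each decoded chunk straight to the output file as you go.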

For a python chunk decoder implementation, look here: https://gist.github.com/pudquick/ff412bcb29c9c1fa4b8d
