THUG2 LZSS compression scheme (as used by the *.prx files)
Documented by GreaseMonkey in 2017
Document version V1
I release this document into the public domain.
AWWW YEAAAAH! Datz RIGHT b0!Z! We got a ... yeah whatever I'm not doing the | |
ASCII art required for that kind of introduction. | |
Well, they could've packed it a bit better, but hey, it took 50 minutes to | |
crack so I'm not complaining, and it is at least a decent compression scheme. | |
On the other hand, zlib is a lot better, and has a licence which makes the MIT | |
licence look restrictive. | |
As for the actual PRX structure, files and filenames are padded up to the next
4-byte boundary, and the rest is pretty easy to work out. If not... OK, the
XeNTaX wiki pretty much lies, but it IS an IHIH main header / IIII file header
structure.
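A rough illustration of the padding rule and the two header sizes follows. It
assumes that "IHIH" and "IIII" are meant as Python `struct` format codes and
that the fields are little-endian; neither is stated above. `pad4` is a made-up
helper name, and the header fields are left unnamed because their meanings are
not given here.

```python
import struct

# Assumption: "IHIH" / "IIII" read as Python struct codes, little-endian ("<").
MAIN_HEADER = struct.Struct("<IHIH")  # 4 + 2 + 4 + 2 = 12 bytes
FILE_HEADER = struct.Struct("<IIII")  # 4 * 4 = 16 bytes


def pad4(n):
    """Round n up to the next multiple of 4 (hypothetical helper)."""
    return (n + 3) & ~3
```

So under those assumptions a 5-byte filename would occupy `pad4(5)` = 8 bytes.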
Data is written to a 4KB ring buffer as it gets decompressed. | |
LZSS runs are stored as an (offset, length-3) pair.
This means that all LZSS runs are at least 3 bytes long (and, since the length
field is 4 bits, at most 18).
Offsets are absolute indices into the ring buffer. | |
The write position in the ring buffer starts at index 0xFEE. Don't ask me why.
It just does.
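A minimal sketch of such a ring buffer, to make the wrap-around concrete. The
`Ring` class and its method names are invented for illustration, and the
zero-filled initial contents are an assumption, since the document does not say
what the buffer holds before anything is written.

```python
RING_SIZE = 0x1000   # 4KB ring buffer
RING_START = 0xFEE   # initial write index, as described above


class Ring:
    def __init__(self):
        # Assumption: the buffer starts zero-filled (not stated in the text).
        self.buf = bytearray(RING_SIZE)
        self.pos = RING_START

    def put(self, byte):
        """Store one byte at the write index, then advance it modulo 4096."""
        self.buf[self.pos] = byte
        self.pos = (self.pos + 1) & 0xFFF

    def get(self, offset):
        """Read the byte at an absolute index into the ring."""
        return self.buf[offset & 0xFFF]
```

Starting at 0xFEE, the write index wraps back to 0 after 18 bytes, because
0xFEE + 18 = 0x1000.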
Main decode loop is as follows:
1. Read a byte. These are your 8 type bits.
2. If the bottom bit of the type bits is 1:
   A. If the file pointer is >= the compressed file length, END RIGHT HERE.
   B. Read a byte.
   C. Output that byte and store it into the ring buffer at the current write
      index, then advance that index modulo 4096.
3. Otherwise, if it's 0:
   A. Read a byte. Call this b0. These are the lower 8 bits of the offset.
   B. Read another byte. Call this b1.
   C. Take the top 4 bits of b1. These are the upper 4 bits of the offset,
      i.e. offset = b0 | ((b1 & 0xF0) << 4).
   D. Take the bottom 4 bits of b1. Add 3. This defines the length,
      i.e. length = (b1 & 0x0F) + 3.
   E. For `length` bytes:
      a. Get the byte at `offset` in the ring buffer.
      b. Output that byte and store it into the ring buffer (same as 2C).
      c. Add 1 to `offset` modulo 4096 (0x1000, or just AND with 0xFFF).
4. Shift the type bits right by one.
5. If you have any of the 8 type bits left, go to 2. Otherwise, go to 1.
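The decode loop above can be sketched in Python roughly like this. It is only
a sketch, not code taken from any real tool: `decompress` is a made-up name,
the ring buffer is assumed to start zero-filled (not stated above), and the
extra end-of-input checks before reading a type byte or an offset/length pair
are an added safety assumption rather than part of the described scheme.

```python
def decompress(data):
    """Decode a THUG2-style LZSS stream following the steps listed above."""
    ring = bytearray(0x1000)  # assumption: ring buffer starts zero-filled
    pos = 0xFEE               # initial write index into the ring buffer
    out = bytearray()
    i = 0                     # "file pointer" into the compressed data
    n = len(data)

    while i < n:              # assumption: stop if no type byte is left
        type_bits = data[i]   # step 1: read the 8 type bits
        i += 1
        for _ in range(8):
            if type_bits & 1:
                # Step 2: literal byte.
                if i >= n:
                    return bytes(out)  # step 2A: end of stream
                b = data[i]
                i += 1
                out.append(b)
                ring[pos] = b
                pos = (pos + 1) & 0xFFF
            else:
                # Step 3: (offset, length-3) pair packed into two bytes.
                if i + 1 >= n:
                    return bytes(out)  # assumption: truncated pair ends it
                b0 = data[i]
                b1 = data[i + 1]
                i += 2
                offset = b0 | ((b1 & 0xF0) << 4)
                length = (b1 & 0x0F) + 3
                for _ in range(length):
                    b = ring[offset]
                    out.append(b)
                    ring[pos] = b
                    pos = (pos + 1) & 0xFFF
                    offset = (offset + 1) & 0xFFF
            type_bits >>= 1   # step 4: move on to the next type bit
    return bytes(out)
```

For example, `decompress(bytes([0xF7]) + b"abc" + bytes([0xEE, 0xF0]))` holds
three literals followed by a run of length 3 at offset 0xFEE, so it gives
b"abcabc". Because each copied byte is written to the ring before the next one
is read, a run may overlap the data it is producing, which is how a single
literal can be repeated several times.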
None of this was ripped from any actual Tony Hawk engine code, compiled or | |
source, so you are free to use this for whatever. | |