@deckarep
Created June 28, 2024 17:34
Extracts all .wav audio from Leisure Suit Larry Casino - originally written by @Doomlazer
# Original credit: @doomlazer
# https://github.com/Doomlazer/TrivialQuest/blob/main/audio/jokes/ExtractWavs.py
# Extract wav files from Larry's Casino audio.vol
# Sound effects can also be dumped from resource.vol
# jokes are 71.wav - 183.wav
#
# command to convert wav to mp3:
# for f in *.wav; do ffmpeg -i "$f" "${f%.wav}.mp3"; done
import os
import struct

AUDIO_DEST_FOLDER = "extracted_audio/"


def dump_audio(from_file):
    """Carve every embedded RIFF/WAV chunk out of a .vol archive."""
    os.makedirs(AUDIO_DEST_FOLDER, exist_ok=True)
    with open(from_file, "rb") as f:
        data = f.read()

    fnum = 0
    pos = data.find(b"RIFF")
    while pos != -1 and pos + 8 <= len(data):
        print("Found RIFF starting at: ", pos)
        # Bytes 4-7 of a RIFF header hold the chunk size (unsigned,
        # little-endian), which excludes the 8-byte "RIFF" + size header.
        size = struct.unpack_from("<I", data, pos + 4)[0]
        print("wav size: ", size)
        wav = data[pos:pos + size + 8]
        s = AUDIO_DEST_FOLDER + str(fnum) + "_" + os.path.basename(from_file) + ".wav"
        fnum += 1
        with open(s, "wb") as nf:
            nf.write(wav)
        # Resume scanning right after the end of this wav.
        pos = data.find(b"RIFF", pos + size + 8)


if __name__ == "__main__":
    # Extract music + sfx.
    dump_audio("resource.vol")
    # Extract cast voice audio.
    dump_audio("audio.vol")
@Doomlazer

[Image: eyebrows]

Take a look at this image. The red rectangles mark what should be a single line containing two eyebrows. The red arrows point to the hair on either side. Notice that the first eyebrow should be black, but it looks like some compression is happening. I'm guessing my theory that only transparency is compressed is incorrect. Also notice that the second eyebrow contains five consecutive '255' pixels. RLE should have compressed that.

I need to take a look at how SCI1 and SCI1.1 views are packed and see if there are any hints there. Unfortunately I'm starting a new job tomorrow, so it's a bit difficult to focus at the moment. I'll probably take a few days to get situated then have another crack at this.

@deckarep
Author

All good @Doomlazer - I appreciate you looking at this...you've certainly made more progress than I have, but I also know that getting this working is a really time-consuming prospect. I've been trying to get the game running on a modern Windows machine, even going as far as binary patching the original executable, because at startup it refuses to run unless it's in 16-color mode, even though I have Windows 95/98 compatibility enabled plus 16-color reduced mode.

Good news is I got the game to run, but it locks up after the intro animation. If I can get 100% pixel-accurate screenshots of some of the characters, then we can count the pixel colors and work out the run lengths the compression must be encoding.

Anyways, I know how starting a new job goes...it can be stressful to ramp up. Thanks for all your efforts thus far and feel free to work on this as much or as little as you need.

But again, no pressure: if you find anything, please ping here!

@deckarep
Author

Also, your theory about only the transparency color being compressed is interesting. If that's the case, then we potentially have all of the pixel information except the transparent runs, and those missing runs would explain the misalignment. I'm going to see if I can determine whether that is true.

@Doomlazer

That's cool you made some progress with the opening intro at least. It would be crazy to get it fully working!

I now think more than just the transparency is being compressed, but it's hard to tell because image 463 uses a lot of very similar colors for skin tone and black, so there aren't long runs where you'd expect them to be. I just added a function to count repeats and, having only tested 463, the longest run of consecutive identical colors it found was 6. At some point I'll run it against the full resource.vol and see if that holds true.
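The repeat counter itself isn't posted in the thread. A minimal sketch of the idea, assuming the decoded pixels are a flat sequence of palette indices (the name `max_run` is hypothetical):

```python
def max_run(pixels):
    """Return (value, length) of the longest run of identical consecutive values."""
    best_value, best_len = None, 0
    cur_value, cur_len = None, 0
    for p in pixels:
        if p == cur_value:
            cur_len += 1          # same color as the previous pixel: extend the run
        else:
            cur_value, cur_len = p, 1  # new color: start a new run
        if cur_len > best_len:
            best_value, best_len = cur_value, cur_len
    return best_value, best_len
```

For example, `max_run([3, 3, 255, 255, 255, 255, 255, 7])` returns `(255, 5)`, the kind of run the eyebrow in image 463 would be expected to compress.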

I think you're right, this is something that could take a lot of time to unwind. Best not to pressure ourselves to get it solved right away. Slow and steady wins the race!

@Doomlazer

I decided to look up the SCI1 RLE and it looks promising:

// SCI1 Run Length Encoding
// taken from https://github.com/scummvm/scummvm/blob/master/engines/sci/graphics/view.cpp
// Each byte is like XXYYYYYY (YYYYYY: 0 - 63)
// - Case A: XX == 00 (binary)
//   Copy next YYYYYY bytes as-is
// - Case B: XX == 01 (binary)
//   Same as above, copy YYYYYY + 64 bytes as-is
// - Case C: XX == 10 (binary)
//   Set the next YYYYYY pixels to the next byte value
// - Case D: XX == 11 (binary)
//   Skip the next YYYYYY pixels (i.e. transparency)

I had forgotten that the first two bits control the "mode" and the last 6 bits give either the repeat count for the following byte or the number of uncompressed palette colors to read consecutively.
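To test the SCI1 scheme against the LC data, the four cases in the comment above can be sketched as a small decoder. This is written from the comment only, not from ScummVM's actual code; `decode_sci1_rle` is a hypothetical name, and filling skipped pixels with a caller-chosen `transparent` index (0xFF by default) is an assumption:

```python
def decode_sci1_rle(data, pixel_count, transparent=0xFF):
    """Decode the XXYYYYYY control-byte scheme quoted from ScummVM's view.cpp comment."""
    out = bytearray()
    i = 0
    while len(out) < pixel_count and i < len(data):
        control = data[i]
        i += 1
        mode = control >> 6         # top two bits (XX)
        count = control & 0x3F      # low six bits (YYYYYY), 0-63
        if mode == 0b00:            # Case A: copy `count` bytes as-is
            out += data[i:i + count]
            i += count
        elif mode == 0b01:          # Case B: copy `count + 64` bytes as-is
            out += data[i:i + count + 64]
            i += count + 64
        elif mode == 0b10:          # Case C: repeat the next byte `count` times
            if i >= len(data):
                break
            out += bytes([data[i]]) * count
            i += 1
        else:                       # Case D: skip `count` (transparent) pixels
            out += bytes([transparent]) * count
    return bytes(out[:pixel_count])
```

If LC used this scheme, feeding it a texture's pixel data with `pixel_count = width * height` should produce a recognizable image with no row drift.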

I'm going to take a break, but hopefully LC uses this same scheme.

@deckarep
Author

deckarep commented Jul 1, 2024

[Screenshot 2024-06-30 at 6:04:14 PM]

So, this is probably my last update for today: I made a program that loads the Peter texture and lets me rotate the pixel data left or right per row. It's still tedious as hell to get the image to look halfway decent via the rotation technique, but I'm now pretty sure that we are also missing compression for other colors.
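The rotation tool isn't shared in the thread. A minimal sketch of the per-row rotation it describes, assuming one byte per pixel in a flat, row-major buffer of known width (`rotate_row` is a hypothetical name):

```python
def rotate_row(pixels, width, row, shift):
    """Rotate one row of a flat, row-major pixel buffer in place.

    A positive `shift` rotates right and a negative one rotates left.
    Works on lists and bytearrays.
    """
    start = row * width
    line = pixels[start:start + width]
    shift %= width                   # normalize to 0..width-1 (left becomes right)
    pixels[start:start + width] = line[-shift:] + line[:-shift]
    return pixels
```

With `shift == 0` the slices reassemble the row unchanged, so no special case is needed.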

For example, I'm seeing his lip color repeated on the same row, and the same with his glasses and eyeballs. I believe that means we're missing the compressed color runs that would push the pixel data onto the lines where it really belongs.

Regarding your comment about SCI1 RLE: I'm inclined to say the compression/encoding can't be too far off from something like this, or perhaps the StacPac encoding of Larry 7. Since we're already seeing recognizable image data, we can't be too far off. However, finding the precise encoding is the challenge.

One more thing: In the garbled image above...I think wherever we're seeing the pattern of white/green/light green pixels is where the compression runs need to go. Perhaps I'm just stating the obvious. :)

Anyways, solid work and I hope your new job starts off great!

@Doomlazer

I searched this morning for some info on StacPac, but got no relevant results. I had forgotten you mentioned finding it in the ScummVM code. Based on the release dates, StacPac is probably more likely than the SCI1 RLE.

I'm also going to call it a day or I'll likely end up digging through the SVM LSL7 view code until 3am - which would not be a good start for tomorrow.

I agree we're getting close on this. Take care and have a good night!

@deckarep
Author

deckarep commented Jul 1, 2024

I'm just going to drop some end of day notes here:

This expands on your original observations: after the palette there seem to be two unknown values, followed by WIDTH and then HEIGHT. I think those two could be "loop" (first one) and "cell count" (second one).

The reason I suspect this is that even though, for example, the Peter texture has dimensions of 103x196, there actually seems to be more data, with his face continuing on through the bytes. That would make sense, because somewhere in the game there have to be all the variations of his different animated states...it makes sense that multiple animated "cells" would be packed together within one single texture.

Lastly, when I look at the byte delta between one texture and the next, it is always either close to (W*H) * CELLCOUNT or less. The smiley faces, for example, have a lot of gradients, so they don't compress well.

The background images, which all seem to be the first set of textures (at the beginning of the file), have about a 20% compression ratio. Also, these unknown values (suspected loop, cell_count) always seem to be 1 and 1 for the background images.

I think the loop is a single byte value while cell_count is an unsigned short.
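The suspected layout can be written down as a `struct` format to check it against several textures. This is only a sketch of the hypothesis above: a 1-byte loop, an unsigned-short cell count, then width and height as unsigned shorts. Little-endian byte order, no padding, and the names `TexHeader`/`parse_tex_header` are all assumptions, and the caller must supply the offset where the palette ends:

```python
import struct
from typing import NamedTuple


class TexHeader(NamedTuple):
    loop: int        # suspected loop number (1 byte)
    cell_count: int  # suspected number of cells (unsigned short)
    width: int
    height: int

    @property
    def raw_size(self):
        # Upper bound on decoded size: (W*H) * CELLCOUNT, one byte per pixel.
        return self.width * self.height * self.cell_count


# Hypothetical layout: u8, u16, u16, u16; "<" = little-endian, no alignment padding.
TEX_HEADER = struct.Struct("<BHHH")


def parse_tex_header(data, offset):
    """Parse the suspected post-palette header starting at `offset`."""
    return TexHeader(*TEX_HEADER.unpack_from(data, offset))
```

For Peter (103x196 with 20 cells) the raw upper bound works out to 103 * 196 * 20 = 403,760 bytes, which is the number the texture-to-texture byte delta should come in at or below.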

@deckarep
Author

deckarep commented Jul 5, 2024

Happy 4th @Doomlazer, I hope the new job is going well. What a great way to start: with a little holiday break right off the bat.

I have not given up on this effort...I told you I was able to do some binary patching, and now I have the app running on my MacBook, but not without problems. It runs with the wrong colors and a weird mirroring issue, so it's largely unplayable.

I was, however, able to do a heap dump of the process and actually see graphics. This is good in the sense that I can get at some of the raw pixel artwork, but bad in the sense that I still can't find the sprites, and the heap dump is about 4 GB of raw data.

Anyways, I'm going to resort to running it in a true VM, and see if the heap dump is more straightforward to comb through. I feel like this might be easier rather than reversing the binary file format...

I wish I was better at reverse engineering binary file formats but I suck at it.

@Doomlazer

That seems like a smart strategy... and a ton of data to comb through!

I haven't given up on cracking the compression format. Not that I'll be able to do it, but I think it's still worth looking at.

The new job is fine, but I'm pretty much exhausted, even with the holiday yesterday. I'll likely "decompress" tomorrow, but hopefully I have enough brain power Sunday to follow up on some things I left unfinished last week. If you can dump any of the images, please let me know. It would likely be helpful to know the exact sequence the decompressed data should match.

@deckarep
Author

deckarep commented Jul 6, 2024

[Image: LSLCasinoPeterStudy]
[Screenshot 2024-07-05 at 10:39:34 PM]

Starting any job can be stressful so I completely get where you're coming from. Remember no rush on any of this...

I have discovered a pattern in the compression. I was able to get the game running in a Windows 98 virtual machine on my Mac, which was a pain in itself but important for getting pixel-accurate screenshots.

The above screenshot is basically the Peter character's bounding box of 103x196, which is specified in the data as two ushorts (width, then height).

Notice the red-tinted area of his bounding box: in his corners is where we should see runs of that GREEN key color representing his transparency.

Looking at my hex editor, a pattern of 3 bytes emerges: 00, 01, <count of run>. So, starting in the green area where his image data starts:

We have:

00, 01, 35 - (53 in decimal)
00, 01, 31 - (49 in decimal)
00, 01, 2F - (47 in decimal)
00, 01, 2D - (you get the idea)
00, 01, 2B
... and so on.

So, at least for the initial compression runs, this follows his hairline down the left side with pixel accuracy. Again, the pattern is: 00, 01, :count of run:.
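Counting these triples by hand in a hex editor is slow. A naive scanner for the `00, 01, <count>` pattern, assuming the count is a single byte exactly as observed above (`find_transparent_runs` is a hypothetical helper), could look like this:

```python
def find_transparent_runs(data, start=0, limit=10):
    """List (offset, run_length) for each 00 01 <count> triple, in order.

    Naive: it can also match 00 01 sequences that are really literal
    pixel data, so results need eyeballing against a screenshot.
    """
    runs = []
    i = data.find(b"\x00\x01", start)
    while i != -1 and i + 2 < len(data) and len(runs) < limit:
        runs.append((i, data[i + 2]))          # count byte follows the 00 01 pair
        i = data.find(b"\x00\x01", i + 3)      # continue after this triple
    return runs
```

Run from the start of Peter's pixel data, the first counts should read 53, 49, 47, 45, 43, matching the hairline.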

I also remember us discussing that his pixel data may have compression on his skin tones and I have not yet accounted for that...I also have not yet accounted for the pattern to cover the key (transparency) color on the right side but this feels like a good start to understanding the compression holistically.

Another observation: even though his portrait is 103x196, I am quite sure that the image data extends well beyond a single frame. This is why I think two of those numbers represent a row/column pair. Peter's has rows = 1 and columns = 20. So I believe that to decode Peter fully, we need to consume about 20 columns of Peter to get the entire texture (basically a texture atlas: think of variations of Peter stacked 20 times). In theory this should cover most of his animations and possibly his face states (unless they're in another texture).
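Once a texture decodes cleanly, the atlas theory can be checked by slicing the output into frames. This sketch assumes the cells are stored back-to-back, each a full row-major `width * height` frame. That is only one guess at how the 20 Peter cells are packed, and `split_frames` is a hypothetical name:

```python
def split_frames(pixels, width, height, count):
    """Split a decoded pixel buffer into `count` frames of width x height.

    Assumes frames are stored back-to-back in row-major order.
    """
    frame_size = width * height
    if len(pixels) < frame_size * count:
        raise ValueError(f"need {frame_size * count} pixels, got {len(pixels)}")
    return [pixels[i * frame_size:(i + 1) * frame_size] for i in range(count)]
```

If the frames come out as recognizable Peters, the back-to-back layout is right. If each one looks sheared, the cells are more likely laid out side by side in one wide image.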

@deckarep
Author

deckarep commented Jul 6, 2024

Here's a zoomed-in view of how I have been counting his transparency runs:

[Screenshot 2024-07-05 at 10:55:03 PM]

@Doomlazer

I made a little progress, but not much. Looking at STACpack in ScummVM, I'm not sure it's what is being used. https://github.com/scummvm/scummvm/blob/5924b3810b5c859d1df1c469cb95a6690d7be14d/engines/sci/resource/decompressor.cpp#L640 STACpack seems to check the first bit of a sequence to see whether it's a compressed sequence or a literal byte. If it's compressed, the next 7 or 11 bits give an offset back into the data that has already been decompressed, but the data doesn't seem to match that at all IMO.

Your observation that 0x0001 followed by a repeat run length byte seems to hold.

0x0002 followed by a run-length byte and then that many 'as-is' bytes might be the case.

Sequences starting with 0x01 and 0x02 might also indicate special modes, but I'm not sure. I just can't wrap my head around it all yet. Not sure what I'm missing.
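The two opcodes guessed so far can be turned into a decoder that stops at the first unrecognized sequence, which makes the next unknown mode easy to locate in the hex. This is a hypothesis only. `decode_guess` is a hypothetical name, and the transparent key index is left to the caller because the thread hasn't pinned it down:

```python
def decode_guess(data, transparent):
    """Decode using the two opcodes guessed in this thread.

    Hypothesis only:
      00 01 <n>          -> n transparent pixels
      00 02 <n> <n bytes> -> n literal palette indices
    Returns (pixels, offset_where_decoding_stopped).
    """
    out = bytearray()
    i = 0
    while i + 2 < len(data) and data[i] == 0x00:
        op, n = data[i + 1], data[i + 2]
        i += 3
        if op == 0x01:                  # transparent run
            out += bytes([transparent]) * n
        elif op == 0x02:                # literal ('as-is') bytes
            out += data[i:i + n]
            i += n
        else:                           # unknown opcode: rewind so the caller sees it
            i -= 3
            break
    return bytes(out), i
```

The returned offset points straight at the first byte sequence the current theory can't explain, which is where the 0x01/0x02-prefixed modes would need to go.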

I also think you are correct about the '20' columns in tex 463. It would explain why the face seems to repeat if I output too many pixels.

@deckarep
Author

deckarep commented Jul 9, 2024

I agree, I don't believe this data looks like StacPac data. StacPac apparently consumes the data as a bit-stream, so I would not expect byte-aligned color index values and run counts to show up in the hex the way they do. Even though we have not cracked the pattern yet, it points much more strongly to some kind of RLE, albeit a custom variant of it.

What's frustrating to me...is that some of the pattern of the hex makes sense while other parts completely throw things off of my analysis. Very frustrating.

@Doomlazer

Were you able to get the uncompressed stream from the dumps? If I had the complete color sequence for the first two rows of that Pursuer image I think it would answer at least some of the questions I have about the format.

This is going to be another exhausting week of work, so it might not be until the weekend that I get to dig into this again.

@deckarep
Author

deckarep commented Jul 9, 2024

I should be able to get you both a screenshot and raw data. I’ll work on it.

@Doomlazer

Some more stuff to add to our notes:

During my lunch break I was digging for more info, and the design doc on larrylaffer.net implies that the game reuses code from the original version of Hoyle's Casino. http://www.larrylaffer.net/images/Larry's%20Casino%20Design.PDF

I was hoping this might lead to more info about the engine or resource formats used, but there seems to be even less about HC - perhaps because there were like eight different HC games.

I also looked into DirectX texture compression (specifically DXT3 and DXT5), but those compress 4x4 blocks of pixels (I think all DXT versions use block compression). That doesn't fit the run of 53 transparent pixels you identified at the start of image 463, which begins with 0x0001.
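A quick size check backs this up. In DXT formats, every 4x4 tile becomes one fixed-size block (8 bytes for DXT1, 16 for DXT3/DXT5), so the compressed size depends only on the dimensions and never on the content. A 53-pixel transparent run costs exactly as much as noise. The helper name `dxt_size` is just for illustration:

```python
import math


def dxt_size(width, height, bytes_per_block=16):
    """Compressed size of a DXT texture: one fixed-size block per 4x4 tile.

    DXT1 uses 8 bytes per block; DXT3 and DXT5 use 16.
    """
    return math.ceil(width / 4) * math.ceil(height / 4) * bytes_per_block
```

For one 103x196 Peter frame, `dxt_size(103, 196)` is 26 * 49 * 16 = 20,384 bytes. A byte-aligned, content-dependent pattern like `00 01 35` would not appear in such a stream.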

I also can't find anything online about a texture format that begins with the "TEX 0001" sequence. Guess that reinforces the idea that this is a custom format - now seemingly first used in Hoyle's Casino?

@deckarep
Author

Hmm, if this code was based off of one of the Hoyle games…then I would expect the RLE compression to exist within ScummVM. This could be the case as I have not yet thoroughly read through all the different variations of the decompressors.

I was also reading about the possibility of block-based compression like the DirectX stuff. The Godot game engine has a pretty comprehensive decompressor that seems to handle many flavors of the Windows DirectDraw Surface (DDS) formats, but all of them look for a magic header that we don’t have here.
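Checking for a magic header is cheap, so it's worth doing for every texture blob before assuming a custom format. The signatures below are standard file magics: DDS files start with the four bytes `DDS ` (with a trailing space). The table is a small illustrative subset, not what Godot actually checks, and `sniff_format` is a hypothetical name:

```python
KNOWN_MAGICS = {
    b"DDS ": "DirectDraw Surface (DDS)",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"RIFF": "RIFF container (e.g. WAV)",
    b"BM": "Windows BMP",
}


def sniff_format(blob):
    """Return a format name if `blob` starts with a known magic, else None."""
    for magic, name in KNOWN_MAGICS.items():
        if blob.startswith(magic):
            return name
    return None
```

A `None` for every LC texture is consistent with the files being handed to the loader without any standard container header.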

There’s a possibility that when a “tex 0001” file is loaded in Casino, it’s partially rewritten and then handed off directly to DirectDraw/DirectX code for the actual decompression. So I don’t think we should rule this out just yet.

There is a DTX format that is a 16-bit color space format which could be close.

@deckarep
Author

deckarep commented Jul 18, 2024

I figured out how to do some tracing with Wine (I'm not sure if you're familiar with the Wine project, but basically it allows you to run older Windows applications on Mac or Linux as if they were native applications). Not everything works well with Wine; I got the game to run, but it doesn't render correctly.

Anyways, I feel like I'm a step closer, here is some interesting tracing from the Wine logs while the game is running:

0024:trace:bitmap:NtGdiCreateBitmap 16x16, bpp 32 planes 1: returning 0x4090069
0024:trace:bitmap:NtGdiCreateBitmap 16x16, bpp 1 planes 1: returning 0x4090068
0024:trace:bitmap:NtGdiCreateDIBSection format (16,-16), planes 1, bpp 32, BI_RGB, size 1024 RGB
0024:trace:bitmap:NtGdiCreateBitmap 32x32, bpp 1 planes 1: returning 0x409006a
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x101003c,32,32)
0024:trace:bitmap:NtGdiCreateBitmap 32x32, bpp 32 planes 1: returning 0x209006c
0024:trace:bitmap:nulldrv_StretchDIBits 0 0 32 32 <- 0 0 32 32 rop 00cc0020
0024:trace:bitmap:NtGdiCreateDIBSection format (32,-32), planes 1, bpp 32, BI_RGB, size 4096 RGB
0024:trace:bitmap:nulldrv_StretchDIBits 0 0 32 32 <- 0 0 32 32 rop 00cc0020
0024:trace:bitmap:nulldrv_StretchDIBits 0 0 32 32 <- 0 0 32 32 rop 00cc0020
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x27010066,1,1)
0024:trace:bitmap:NtGdiCreateBitmap 1x1, bpp 32 planes 1: returning 0xc09006b
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x3010071,1,1)
0024:trace:bitmap:NtGdiCreateBitmap 1x1, bpp 32 planes 1: returning 0x1090073
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x1601005d,1,1)
0024:trace:bitmap:NtGdiCreateBitmap 1x1, bpp 32 planes 1: returning 0x1090075
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x5010078,1,1)
0024:trace:bitmap:NtGdiCreateBitmap 1x1, bpp 32 planes 1: returning 0x2090079
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x6010072,1,1)
0024:trace:bitmap:NtGdiCreateBitmap 1x1, bpp 32 planes 1: returning 0x109007b
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x501007e,1,1)
0024:trace:bitmap:NtGdiCreateBitmap 1x1, bpp 32 planes 1: returning 0x209007f
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x8010077,1,1)
0024:trace:bitmap:NtGdiCreateBitmap 1x1, bpp 32 planes 1: returning 0x1090081
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0xe41007d,32,32)
0024:trace:bitmap:NtGdiCreateBitmap 32x32, bpp 32 planes 1: returning 0x5090084
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0xf41007d,32,32)
0024:trace:bitmap:NtGdiCreateBitmap 32x32, bpp 1 planes 1: returning 0x1090085
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x18010044,1,1)
0024:trace:bitmap:NtGdiCreateBitmap 1x1, bpp 32 planes 1: returning 0x909004c
0024:trace:bitmap:NtGdiCreateDIBSection format (640,480), planes 1, bpp 32, BI_BITFIELDS, size 1228800 RGB
0024:trace:bitmap:NtGdiCreateCompatibleBitmap (0x14410070,32,64)
0024:trace:bitmap:NtGdiCreateBitmap 32x64, bpp 1 planes 1: returning 0x6090083
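Traces like this get long quickly. A small summarizer that tallies the bitmap sizes being created could help spot which allocations match sprite dimensions such as 103x196. The regular expressions are written against the exact line shapes shown above, and `bitmap_sizes` is a hypothetical helper. A negative DIB height marks a top-down bitmap, so `abs()` folds it together with bottom-up ones:

```python
import re
from collections import Counter

BITMAP_RE = re.compile(r"NtGdiCreateBitmap (\d+)x(\d+), bpp (\d+)")
DIB_RE = re.compile(r"NtGdiCreateDIBSection format \((\d+),(-?\d+)\),.*?bpp (\d+)")


def bitmap_sizes(log_lines):
    """Count (width, height, bpp) of bitmaps created in a Wine trace:bitmap log."""
    sizes = Counter()
    for line in log_lines:
        m = BITMAP_RE.search(line) or DIB_RE.search(line)
        if m:
            w, h, bpp = (int(g) for g in m.groups())
            sizes[(w, abs(h), bpp)] += 1
    return sizes
```

Passing an open log file works directly, since iterating a file yields lines.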

Relevant code that loads these images is here:

https://github.com/wine-mirror/wine/blob/88a28aa5757ae74d9997b470d70216f10974247f/dlls/win32u/dib.c

and here:

https://github.com/wine-mirror/wine/blob/88a28aa5757ae74d9997b470d70216f10974247f/dlls/win32u/bitmap.c

More specifically, check out this RLE routine:

https://github.com/wine-mirror/wine/blob/88a28aa5757ae74d9997b470d70216f10974247f/dlls/win32u/dib.c#L333

@Doomlazer

Doomlazer commented Jul 19, 2024

Well, that certainly gives me a lot to investigate. I never really learned C++, so it's difficult for me to parse. I can usually do it, though, if I study the code slowly. The code mentions RLE4, so I might start by tracking down some info on BMP structures and compression.
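For comparison with the LC byte patterns, here is a sketch of the standard Windows BMP `BI_RLE8` format that Wine's RLE code handles. This is not a claim that LC uses it. In this format, a nonzero first byte means "repeat the next byte n times". A zero first byte is an escape: 0 is end of line, 1 is end of bitmap, 2 is a (dx, dy) delta, and 3 to 255 start an absolute run of literal bytes padded to a 16-bit boundary. RLE4 works the same way but packs two 4-bit pixels per byte and isn't covered here. `decode_bmp_rle8` is a hypothetical name:

```python
def decode_bmp_rle8(data, width, height, background=0):
    """Decode a BI_RLE8 stream into rows of palette indices.

    Rows are returned in stream order; BMP stores rows bottom-up, so
    row 0 is the bottom of the image unless the caller flips them.
    """
    rows = [bytearray([background]) * width for _ in range(height)]
    x = y = 0
    i = 0
    while i + 1 < len(data) and y < height:
        count, value = data[i], data[i + 1]
        i += 2
        if count > 0:                   # encoded mode: `count` copies of `value`
            for _ in range(count):
                if x < width:
                    rows[y][x] = value
                x += 1
        elif value == 0:                # escape 0: end of line
            x, y = 0, y + 1
        elif value == 1:                # escape 1: end of bitmap
            break
        elif value == 2:                # escape 2: delta, move right dx and down dy
            x, y = x + data[i], y + data[i + 1]
            i += 2
        else:                           # absolute mode: `value` literal bytes
            for b in data[i:i + value]:
                if x < width:
                    rows[y][x] = b
                x += 1
            i += value + (value & 1)    # runs are padded to a 16-bit boundary
    return [bytes(r) for r in rows]
```

Note the structural difference from what we see in LC: RLE8 puts the count before the value and uses `00` as an escape prefix, which is at least loosely similar to the `00, 01, <count>` triples.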

Unfortunately, work has consumed my weekdays. I'm hoping I'm not completely exhausted again this weekend so I can follow up on these leads. Saturday will likely be a recovery day, but I'm hopeful to find some time Sunday. I'd really like to solve this if possible considering all the time we've put into it.

@deckarep
Author

All good @Doomlazer - I'm updating these notes so I don't lose track of my findings and for you to also know when you feel up to the task but please tackle this at your discretion.

@deckarep
Author

This has admittedly turned into a full-blown obsession for me...I still have stuff to work out that I don't fully understand, but I've managed to get Peter looking a lot more like Peter. 😁


@Doomlazer

Damn, that is nearly there! Keep going!

@deckarep
Author

Thank you for the encouragement, I think I'm almost there...this is insane, as I've never cracked something like this before. There's no magic to it other than me tediously counting pixels and offsets and trying to figure out these damn patterns. I've also been reading up on all the variations of RLE encoding in the wild, but so far nothing really matches how this game does it.

I wish there was a better way.


@Doomlazer

AFAIK, that is the way all reverse engineering is done. It's a combination of research and mostly trial and error based on hunches. If it were easy, it would have already been done.

It's extremely satisfying once you crack it though. It's like finding the WOPR backdoor, only 100x better because you didn't just guess a password. Based on that last pic you are very close, though there might be more to learn about the other images in the resource.

I have a feeling you'll get it this weekend, which is very exciting!

@deckarep
Author

Time for a celebration: 🎊🔥⚡️🍾🚀

Still left to do:

  • Try other sprites and hopefully eliminate any further edge cases
  • Work on the sub-sprites (rows/columns theory) to get them to render out
  • Upload this code to GitHub so it can be improved and shared
  • Primary goal is to extract the character sprites during LC gameplay
  • Secondary goals would be to get all room backgrounds and GUI sprites, but I don't really need these for my poker game

[Image: LarryCasinoSpritesReversedBanner]

I credited you because you helped get the scripts started and you were instrumental in bouncing ideas off of, so a big thank you @Doomlazer!

@Doomlazer

Doomlazer commented Jul 20, 2024

Grats, that's awesome! I'm excited to see the RLE algorithm.

You have to list yourself first in the credits for finally breaking it.

@deckarep
Author

Thanks, here's the not-so-secret-sauce: https://github.com/deckarep/laffer-casino-extractor

@deckarep
Author

@Doomlazer - would you be okay with also putting your .wav/jokes extractor in this repo for completeness?
