Skip to content

Instantly share code, notes, and snippets.

@subzey
Last active August 29, 2015 14:06
Show Gist options
  • Save subzey/b18c482922cd17693d65 to your computer and use it in GitHub Desktop.
Save subzey/b18c482922cd17693d65 to your computer and use it in GitHub Desktop.
Few Tricks on Compressing Js13k Entry

Few Tricks on Compressing Js13k Entry

Most important detail: in js13k entry 13k stands for zipped entry size. And this detail makes js13k differ from other code golfing compos.

Last year winner entry uncompressed size was about 70k. 70 kilobytes! In js1k you have a month to minify 1k, so how much time would it take to minify 70k? Nope, you have only 1 month. So it's very unlikely one will manually golf the entry. And this is the other thing that makes js13k special: no hardcore manual "hell yeah I've stripped one byte!" golfing.

So, huh, rules are set. And in order to have advantage in this compo you must know how ZIP works.

How ZIP Works

First of all, zip is not a compression algorithm itself, it's rather a container. Zip can contain entries compressed in 13 different ways including Deflate, Bzip2, LZMA and no compression at all ("raw"). What can we use in practice? Only Deflate and Raw.

There was a temptation to use Bzip2, as it offers better compression rate than Deflate. But guess what - it won't work. I did a test: Only WinZip could unpack file compressed with bzip2. WinRar and Windows Explorer told me that this file cannot be extracted. Linux zip version (that most probably will be set on js13k servers) honestly reported that bzip2 compression is not supported (despite the fact bzip2 package was installed!). MacOS did better than others: if archive contains a bizpped file, it silently fails and don't unpack anything at all.

So, again, we stick to Deflate and Raw. Actually, Raw may be the option. Some PNG images after compression can even increase in size! (Why? That's because PNG is already Deflated)

Tip #0: Check if you need to compress an asset. Maybe you don't.

But let's focus on Deflate.

How Deflate Works

In a nutshell: backreferences and Huffman trees.

Huffman Trees

You probably worked with Huffman trees before or heard of it in IT science lectures. It's quite simple: bytes that appears more frequently are encoded in fewer number of bits. Bits are written sequentially, without byte boundaries.

As an example: Imagine we have the following string: AAAABCAAAAB. Let "A" be encoded by one 0 bit, "B" is 10 and "C" is 11. So, the string above transforms into 00001011000010. That's 14 bits or slightly less than 2 bytes.

Deflate has its predefined Huffman tree, but it's not too optimized. Almost always custom trees are calculated. Tree should not contain each and every byte value in 0..255 value. In example above, tree contains only 3 entries and max bit length is 2. If Huffman tree must contain more values, the compression would be worse.

Tip #1: Less distinct character you have in your code, better it will be compressed. Lowercase your HTML, use only double or only single quotes - do everything to reduce set of values!

Backreferences

Deflate is not a dictionary algo. It uses bytes that it already decoded as dictionary instead. It has special commands that means something like "go back N bytes and copy M bytes".

Same example, another approach: AAAABCAAAAB might be encoded as A<3 bytes from offset -1>BC<5 bytes from offset -6>

Tip #2: Repeat yourself! Long runs of duplicated code may consume just a few bits!

There's an important detail: Deflate in ZIP cannot address backreference more than 32Kib back (a size of "sliding window"). So if you have two identical pieces of code with 32768 bytes between them, backreference won't work. And also shorter the "jump distance" back, shorter its command code length in bits.

Tip #3: Be careful while repeating yourself! Place duplicated chunks near each other.

Overheads

Besides raw compressed data ZIP archive also contains a file header, the place where its filename, size and other stuff is stored. And guess what. For each file in he archive ZIP has two (or more) such headers! One before the compressed data and other at the end of file in Central Directory that lists all the files.

Moreover, archive may contain several central directories, but only the last one is read. This may happen if file is appended to the archive (without clean recreation of the zip file from the scratch), be warned!

Overhead for each file is 74 + (2 × filename length) bytes. And these bytes cannot be compressed.

Tip #4: Keep file names short and avoid storing many small files: it will eat up all the space.

(Btw, don't try to mess with these headers. One wrong move, one deleted field, and everything is screwed up. Trust me, I've tried it.)

What You Should Not Do

Packing your JavaScript with packers which eliminates duplicated strings like p,a,c,k,e,d or RegPack is a bad idea. It works for js1k, but it won't work for js13k, you'll just get an overhead. The way Deflate deals with duplicates works as charm and just let it do its job.

Another thing you should not do is a JavaScript-inside-PNG stuff like JsEXE. JsEXE benefits from PNG's Deflate packing and it's a neat thing for DemoJS. But in js13k you anyway use Deflate, so why apply it twice.

Moreover, JsEXE is forbidden by rules.

After all, if deflated file can be deflated once again, it means that it was poorly deflated. Use better compression settings in your archiver (these often has non-optimal defaults!).

Tip #5: Know how your tools work! What was a Silver Bullet for js1k now can be just an extra overhead.

Note that "regular" js minifiers like YUI Compressor, Closure Compiler or UglifyJS, that optimizes names and structure (rather than treating js code as a long string for eval) are in and gives good results.

Conclusion

That's it.

I won't give you links to the ready-to-go recipes. You now know the principles. You are the Golfer, you are the Hacker, you know how to use it. :D

P.S: Good luck to all js13k 2014 participants! I'm looking forward your awesome games!

@dgoemans
Copy link

dgoemans commented Sep 4, 2014

The one you didn't cover was LZMA. Why not use LZMA? I'm getting slightly better compression from that, and at least windows seems to handle it ok...

@subzey
Copy link
Author

subzey commented Sep 16, 2014

@dgoemans, that's a good point, thanks! I was so frustrated by awful zip/bzip2 support, that I didn't even test it. For test file (jquery-2.1.1.min.js) lzma compresses worse than bzip2, but it still outdoes deflate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment