Ascii85, also known as base85, is an encoding similar in concept to base64. Where base64 uses four ASCII characters to represent three source bytes (thereby inflating the data size by 33%), ascii85 uses five ASCII characters to represent four source bytes (thereby inflating the data size by 25%).
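
To make the four-bytes-to-five-characters relationship concrete, here is a minimal sketch that encodes a single 32-bit group. It assumes the conventional charset running from `!` (code 33) through `u` (code 117); the helper name `encodeGroup` is hypothetical and is not part of this script.

```javascript
// Hypothetical helper, not part of this script: encodes one 4-byte group
// as five characters, assuming the conventional charset "!" (33) to "u" (117).
function encodeGroup(view, offset = 0) {
  let value = view.getUint32(offset); // DataView reads big-endian by default
  const chars = new Array(5);
  // Peel off base-85 digits from least significant to most significant.
  for (let i = 4; i >= 0; --i) {
    chars[i] = String.fromCharCode(33 + (value % 85));
    value = Math.floor(value / 85);
  }
  return chars.join("");
}

const view = new DataView(new Uint8Array([0x4d, 0x61, 0x6e, 0x20]).buffer);
const encoded = encodeGroup(view); // the four bytes of "Man " become "9jqo^"
```

Five base-85 digits can represent 85^5 = 4,437,053,125 values, which is just enough to cover the 2^32 = 4,294,967,296 values of a four-byte group.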
This script can be used to encode and decode a `DataView` as ascii85. Create an `Ascii85Codec` with the desired configuration, and then call its member functions as appropriate. In particular, this class should be well suited to storing a `DataView` inside of Local Storage with the smallest possible size (if you use the `STORAGE_CHARSET` offered).
Note that we do not perform any error-checking during the decode step. `Ascii85Codec` offers a `validate` member function that can be run on an encoded string to verify that it contains no illegal characters; this can be run prior to decoding in any situation where the input data is untrusted.
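
As a rough illustration of checking untrusted input before decoding, here is a minimal sketch of a regex-based validator. It does not reproduce the actual `validate` implementation: the `makeValidator` helper is hypothetical, it assumes the conventional charset from `!` through `u`, it ignores the `z` and `y` abbreviations, and treating an empty string as valid is an assumption.

```javascript
// Hypothetical sketch, not the actual Ascii85Codec implementation.
// Builds a validator that accepts only characters found in `charset`.
function makeValidator(charset) {
  // Escape the characters that are special inside a regex character class.
  const escaped = charset.replace(/[\\\]\[^-]/g, "\\$&");
  // Compile the pattern once so that each validation is a single native test.
  const pattern = new RegExp("^[" + escaped + "]+$");
  return function validate(encoded) {
    if (encoded.length === 0) {
      return true; // assumption: empty input is valid and bypasses the regex
    }
    return pattern.test(encoded);
  };
}

// Conventional charset: the 85 characters from "!" (33) through "u" (117).
let charset = "";
for (let code = 33; code <= 117; ++code) {
  charset += String.fromCharCode(code);
}
const validate = makeValidator(charset);
```

A caller handling untrusted data would run such a check first and only pass the string to the decoder when it returns `true`.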
- The `STORAGE_CHARSET` is offered as a convenience, to aid with JavaScript that needs to store binary data via the Local Storage API with minimal overhead. Browsers store all such data as a JSON string, so the string representation of the data is what counts toward storage size limits. Chromium uses a null-terminated JSON string (and counts the terminating null), and within stored string values, Chromium escapes double-quotes and left angle brackets. `STORAGE_CHARSET` substitutes out `"`, `<`, and `\` in order to avoid the additional overhead of string escape sequences in the serialized JSON (as these would be represented in storage as `\"`, `\u003C`, and `\\`).
- In the encoding step, we pre-create `chars` as an array of five string values and then replace individual elements. This ensures that the array is packed and initially allocated with the desired size.
- Validation of an encoded string works by pre-compiling a regular expression to test the input. We assume here that native code will do the job faster than we would. To keep as much complexity out of the regex as possible, empty strings are treated as a special case and not handled by the regex.
- Decoding always allocates an `ArrayBuffer` with a length that is a multiple of 4, and we just return a truncated `DataView` into that buffer. This allows us to write the final chunk with a single `setUint32` call, instead of having to rebuild a padded DWORD and then decompose it into bytes by hand. Here I am (micro)optimizing for speed over space, wasting no more than three bytes of memory per operation.
- Decoding uses `String.prototype.indexOf` to count the number of occurrences of the two abbreviated token types (`z` for `0x00000000` and `y` for `0x20202020`). This requires us to crawl the input string twice, but should still be faster than manually looping over the characters just once, as we can take advantage of optimized native code (which I presume will use things like SIMD, possibly within standard library functions like `memchr`, to very rapidly scan the string for a single character).
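
The token-counting approach from the last note can be sketched as follows. This is an illustration under stated assumptions rather than the script's actual code: the helper names `countChar` and `decodedByteLength` are hypothetical, `z` and `y` are assumed to be used only as abbreviations (not as ordinary digits of the charset), and the input is assumed to contain no whitespace or delimiters.

```javascript
// Hypothetical helper: counts occurrences of a single character by
// repeatedly calling the native String.prototype.indexOf.
function countChar(text, ch) {
  let count = 0;
  let index = text.indexOf(ch);
  while (index !== -1) {
    ++count;
    index = text.indexOf(ch, index + 1);
  }
  return count;
}

// Hypothetical helper: computes how many bytes an encoded string decodes to.
// Assumes "z" and "y" each stand for one whole 4-byte group.
function decodedByteLength(encoded) {
  const abbreviated = countChar(encoded, "z") + countChar(encoded, "y");
  const regular = encoded.length - abbreviated;
  const fullGroups = Math.floor(regular / 5);
  const remainder = regular % 5;
  // A trailing partial group of n characters carries n - 1 bytes.
  const tailBytes = remainder > 0 ? remainder - 1 : 0;
  return abbreviated * 4 + fullGroups * 4 + tailBytes;
}
```

Knowing the decoded size up front is what makes it possible to allocate the output buffer once, before any characters are decoded.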