Hi,
A tweet can contain 140 UTF-16 characters.
A UTF-16 character can be composed of two 16-bit surrogates.
A UTF-16 surrogate can be used to store 10 bits.
An ASCII character is 7 bits long.
So, a tweet can encode 140 x 2 x 10 = 2800 bits = 400 plain ASCII characters.
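The surrogate arithmetic above can be sketched as follows; `packPair`/`unpackPair` are hypothetical names, and the constants 0xD800 / 0xDC00 are the first high and low surrogates (each surrogate carries 10 payload bits, values 0..1023):

```javascript
// Pack 20 arbitrary bits into one surrogate pair (= one astral "character").
function packPair(bits20) {
  var hi = 0xD800 + (bits20 >> 10);   // top 10 bits in the high surrogate
  var lo = 0xDC00 + (bits20 & 0x3FF); // bottom 10 bits in the low surrogate
  return String.fromCharCode(hi, lo); // two UTF-16 code units, one code point
}

// Recover the 20 bits from a surrogate pair.
function unpackPair(s) {
  return ((s.charCodeAt(0) - 0xD800) << 10) | (s.charCodeAt(1) - 0xDC00);
}
```

A pair built this way maps to the valid code point 0x10000 + bits20, so the resulting string is well-formed UTF-16.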
The challenge is to make an encoder (converting 400 or more ASCII chars into 140 UTF-16 chars) and a decoder (doing the opposite) that can both fit in a tweet.
NB: the encoder and decoder can be packed with this: https://gist.github.com/xem/7086007
NB2: the non-printable characters 0x00 to 0x1F and 0x7F can be ignored (the input can be assumed to be printable ASCII).
Have fun!
Encodes 400 ASCII chars.
Encoder: 190 chars minified, 140 chars packed
e=function(e,c,b,d){d=b="";for(c in e)b+=(0+e.charCodeAt(c).toString(2)).slice(-7);for(c=0;b;b=b.slice(10))d+=String.fromCharCode((c++%2?56320:55296)+parseInt(b.substring(0,10),2));return d}
// or
eval(unescape(escape("𩐽𩡵𫡣𭁩𫱮𛁣𛁢𛁤𩀽𨠽𘠢𫱲𘁩𫠠𩐩𨠫🐨𩐮𨱨𨑲𠱯𩁥𠑴𭁯𤱴𬡩𫡧𫁩𨱥𫱲🐰👢𫁩𨱥𤱴𬡩𫡧𬡯𫑃𪁡𬡃𫱤𩐨𝠳𞠵𝐲𬁡𬡳𩑉𫡴𭑢𬱴𬡩𫡧𛀱𛀲𞱲𩑴𭑲𫠠𩁽").replace(/uD./g,'')))
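For readability, here is an unminified sketch of the same encoding idea (variable names are mine, the logic mirrors the golfed version above):

```javascript
function encode(ascii) {
  var bits = "", out = "", i = 0;
  // 1. Concatenate each char code as a 7-bit binary string.
  //    One "0" of left padding is enough for printable ASCII (codes 32..127).
  for (var c = 0; c < ascii.length; c++) {
    bits += ("0" + ascii.charCodeAt(c).toString(2)).slice(-7);
  }
  // 2. Consume 10 bits at a time, alternating high (0xD800) and low (0xDC00)
  //    surrogates so the output stays valid UTF-16 surrogate pairs.
  for (; bits; bits = bits.slice(10)) {
    out += String.fromCharCode((i++ % 2 ? 0xDC00 : 0xD800) +
                               parseInt(bits.substring(0, 10), 2));
  }
  return out;
}
```

Note that for a 400-char input the result is 280 UTF-16 code units (JS `.length`), i.e. 140 surrogate pairs, which Twitter counted as 140 characters.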
Decoder: 159 chars minified, 124 chars packed
e=function(e,d,b,c){c=b="";for(d=0;400>d;)b=b.slice(7)+e.charCodeAt(d++).toString(2).slice(-10),c+=String.fromCharCode(parseInt(b.substring(0,7),2));return c}
// or
eval(unescape(escape("𩀽𩡵𫡣𭁩𫱮𛁤𛁢𛁣𨰽𨠽𘠢𫱲🐰🡤𨠽𨠮𬱬𪑣𩐨𪁡𬡃𫱤𩑁𭀨𩀫𫱓𭁲𪑮𩰨𫁩𨱥𨰫👓𭁲𪑮𩰮𩡲𫱭𠱨𨑲𠱯𩁥𨑲𬱥𢑮𭀨𨠮𬱵𨡳𭁲𪑮𩰨𛀲𞱲𩑴𭑲𫠠𨱽").replace(/uD./g,'')))
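And a readable, generalized equivalent of the decoder (names are mine; unlike the golfed version it does not hard-code the 400-character output length, it simply consumes all the bits it is given):

```javascript
function decode(packed) {
  var bits = "", out = "";
  for (var i = 0; i < packed.length; i++) {
    // Strip the surrogate offset (0xD800 and 0xDC00 are both multiples of
    // 0x400) and keep the low 10 payload bits, zero-padded to 10 digits.
    bits += ("000000000" + (packed.charCodeAt(i) % 0x400).toString(2)).slice(-10);
    // Emit one ASCII character for every full 7 bits accumulated.
    while (bits.length >= 7) {
      out += String.fromCharCode(parseInt(bits.substring(0, 7), 2));
      bits = bits.slice(7);
    }
  }
  return out;
}
```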
Demo and source code:
Hi :)
Indeed, the code can be smaller, I'm also working on it, and I'll post an update soon...
About the compression ratio: I consider that there is currently no compression at all, because each ASCII character still uses 7 whole bits in the final tweet. I'm sure it could take fewer bits than that, using some Huffman-like or gzip-like compression algorithm. (I'm working on that too ^^)
And about the global scope pollution: well, all 140byt.es entries leak a function into the global scope; that's not a big deal, it's... mandatory. I just packed e() and d() so that they can both fit in 140 characters. That's not a problem.
If other global vars than d and e had leaked, THAT would have been a problem.