// Ever needed to escape '\n' as '\\n'? This function does that for any character,
// using hex and/or Unicode escape sequences (whichever are shortest).
// Demo: http://mothereff.in/js-escapes
function unicodeEscape(str) {
  return str.replace(/[\s\S]/g, function(character) {
    var escape = character.charCodeAt().toString(16),
        longhand = escape.length > 2;
    return '\\' + (longhand ? 'u' : 'x') + ('0000' + escape).slice(longhand ? -4 : -2);
  });
}
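A quick usage sketch may help show what the function produces. The block repeats the gist's function so it runs on its own; the names `sample`, `escapedSample`, and `restoredSample` are illustrative and not part of the gist, and the round trip through `new Function` is only one possible way to check the output.

```javascript
// Copy of the gist's function so this snippet is self-contained.
function unicodeEscape(str) {
  return str.replace(/[\s\S]/g, function (character) {
    var escape = character.charCodeAt().toString(16),
        longhand = escape.length > 2;
    return '\\' + (longhand ? 'u' : 'x') + ('0000' + escape).slice(longhand ? -4 : -2);
  });
}

// Illustrative input: a letter, a newline, and U+2102.
var sample = 'a\n\u2102';
var escapedSample = unicodeEscape(sample);
console.log(escapedSample); // \x61\x0a\u2102

// Round trip: evaluating the escaped text as a string literal restores the input.
var restoredSample = new Function('return "' + escapedSample + '";')();
console.log(restoredSample === sample); // true
```

Char codes up to 0xFF get the two-digit `\x` form and everything above gets the four-digit `\u` form, which is where the "whichever are shortest" in the header comment comes from.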
@mathiasbynens Yes, it's best to avoid octal escape sequences... the OctalEscapeSequence production is deprecated in ES 5, and produces a syntax error in strict mode:
"A conforming implementation, when processing strict mode code (see 10.1.1), may not extend the syntax of EscapeSequence to include OctalEscapeSequence as described in B.1.2." (Annex C)
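The strict-mode behaviour described above can be checked without putting an octal escape in your own source file. This is a minimal sketch, assuming an engine that follows the quoted rule (Node.js does): functions built with the `Function` constructor are non-strict unless their body opts in, so the same literal can be parsed both ways. The variable names are illustrative.

```javascript
// Sloppy-mode body: the legacy octal escape "\101" is accepted.
var sloppyResult = new Function('return "\\101";')();
console.log(sloppyResult); // A  (octal 101 is 65)

// Same literal under "use strict": the parser rejects it when the function is created.
var rejectedInStrict = false;
try {
  new Function('"use strict"; return "\\101";');
} catch (err) {
  rejectedInStrict = err instanceof SyntaxError;
}
console.log(rejectedInStrict); // true
```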
I'm throwing this up here hoping to help somebody else down the road.
I had to restore partial keys from a Redis dump, and this function almost helped. Here is what I came up with.
Make sure to create the Redis client like this:
var client = redis.createClient(global['redis_port'], global['redis_host'], { return_buffers: true });
var fs = require('fs');
var redis = require('../lib/redis.js');

// Escape every byte of a buffer as \xNN.
function e(buf) {
  var res = '';
  for (var i = 0; i < Buffer.byteLength(buf); ++i) {
    var c = buf[i].toString(16);
    if (c.length == 1) {
      c = '0' + c;
    }
    res += '\\x' + c;
  }
  return res;
}

// Dump each key listed in keys.txt, one at a time, as a RESTORE command in dump.txt.
function generate_dump() {
  var keys = fs.readFileSync('keys.txt').toString().split('\n');
  return keys.reduce(function (prev, key) {
    return prev.then(function () {
      return redis.dump(key)
        .then(function (res) {
          if (!res) {
            console.log('missing key', key);
            return;
          }
          fs.appendFileSync('dump.txt', 'RESTORE ' + key + ' 0 "' + e(res) + '"\n');
        });
    });
  }, Promise.resolve());
}

redis.init()
  .then(function () {
    return generate_dump();
  })
  .then(function () {
    console.log('done');
  })
  .catch(function (err) {
    console.log(err['stack']);
  });
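The byte-escaping helper can be tried in isolation, without a Redis connection. The sketch below is a hypothetical stand-alone variant (`escapeBytes` and `payload` are names invented here, not part of the comment above) and assumes Node.js, where `Buffer` is a global. It also hints at why `return_buffers: true` presumably matters: the helper reads raw byte values, which only works if the dump arrives as a buffer and not as a decoded string.

```javascript
// Hypothetical stand-alone variant of the e() helper: one \xNN escape per byte.
function escapeBytes(buf) {
  var out = '';
  for (var i = 0; i < buf.length; i++) {
    // padStart keeps single-digit values such as 0x0a two characters wide.
    out += '\\x' + buf[i].toString(16).padStart(2, '0');
  }
  return out;
}

// Made-up payload covering the low, middle, and high end of the byte range.
var payload = Buffer.from([0x00, 0x0a, 0x7f, 0xff]);
console.log(escapeBytes(payload)); // \x00\x0a\x7f\xff
```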
If the goal is to do this with minimal code size, the following works well and minifies to ~100 bytes:
function escapeUnicode(str) {
  return str.replace(/[^\0-~]/g, function(ch) {
    return "\\u" + ("000" + ch.charCodeAt().toString(16)).slice(-4);
  });
}
Fantastic! Thanks for this @mathiasbynens!
Replace only non-ASCII characters (U+00A0 and above):
function escapeUnicode(str) {
  return str.replace(/[\u00A0-\uffff]/gu, function (c) {
    return "\\u" + ("000" + c.charCodeAt().toString(16)).slice(-4)
  });
}
I use this to convert the UTF-8 content of JS files to Latin-1.
Very interesting work guys, thanks for sharing.
@mervick Yours was especially useful for my use case. Are there any restrictions on using it? Thanks!
@rafaelvanat I have used that in my project for more than a year, and so far there have been no problems.
@mervick @rafaelvanat If I use that function like this:
escapeUnicode("abc𝔸𝔹ℂ")
Then I get:
abc𝔸𝔹\u2102
The following function fixes this by matching all non-ASCII characters after splitting the string in a "unicode-safe" way (using [...str]). It then splits each Unicode character up into its UTF-16 code units, and gets the escape code for each (rather than just grabbing the first char code of each Unicode character):
function escapeUnicode(str) {
  return [...str].map(c =>
    /^[\x00-\x7F]$/.test(c)
      ? c
      : c.split("").map(a => "\\u" + a.charCodeAt().toString(16).padStart(4, "0")).join("")
  ).join("");
}
This gives the correct result:
abc\ud835\udd38\ud835\udd39\u2102
This seems to work fine in all my tests so far, but if I find any bugs I'll add fixes in this gist. Performance doesn't matter for my use-case, so I haven't benchmarked or optimised it at all.
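As a side observation rather than a replacement for the function above: the difference between the two results seems to come down to the regex `u` flag. With the flag, a character class matches whole code points, so characters above U+FFFF fall outside `\u00A0-\uffff`; without it, the class matches UTF-16 code units, so each surrogate half is escaped. The sketch below illustrates this under that reading; `toEscape`, `input`, `withFlag`, and `withoutFlag` are names made up for the example.

```javascript
// U+1D538 and U+1D539 (outside the BMP) plus U+2102, built without literal non-ASCII.
var input = 'abc' + String.fromCodePoint(0x1D538, 0x1D539, 0x2102);

// Escape the first UTF-16 code unit of the match as \uXXXX.
function toEscape(unit) {
  return '\\u' + unit.charCodeAt(0).toString(16).padStart(4, '0');
}

// With the u flag: astral code points are not in the range, so they pass through unescaped.
var withFlag = input.replace(/[\u00A0-\uffff]/gu, toEscape);

// Without the flag: every surrogate half is matched and escaped separately.
var withoutFlag = input.replace(/[\u00A0-\uffff]/g, toEscape);

console.log(withoutFlag); // abc\ud835\udd38\ud835\udd39\u2102

// The escaped text is valid inside a JSON string, so parsing it restores the input.
console.log(JSON.parse('"' + withoutFlag + '"') === input); // true
```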
Check out jsesc, which solves this problem in a more robust manner.
@mathiasbynens It looks great! I did try to use it but unfortunately I'm not up to date with all the browserify/bundling stuff and just need a vanilla JS script (e.g. no use of Buffer) to include in a module import and wasn't able to work out how to do that with jsesc (though I admit I only poked around for a few minutes before deciding to write the function above). Also, out of pure curiosity I'd be interested in cases where the above function fails - I couldn't find any failing cases in my tests.
@josephrocca See https://github.com/mathiasbynens/jsesc#support. TL;DR: use v1.3.0.
Okay, so we use Unicode escapes (e.g. \u1234) and hexadecimal escapes (e.g. \x12)… What about octal escapes (e.g. \123)?
I quickly tested this in Node.js. Octal escapes can only be used for charCodes smaller than 256, and the test results show that they’re only shorter than Unicode/hex escapes for charCodes < 64.
Of course, it’s problematic if you have e.g. '\0' immediately followed by another digit, e.g. 1, as it will alter the escape rather than append a new character.
Update: We probably shouldn’t use them.
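The length comparison and the digit pitfall described above can be reproduced with a short script. This is a sketch, not the original test: `octalEscape` and `hexEscape` are hypothetical helpers that build the escape text, and the pitfall is shown through the `Function` constructor so the snippet itself stays valid in strict mode.

```javascript
// Hypothetical helpers: the source text of an escape for a given char code.
function octalEscape(code) { return '\\' + code.toString(8); }
function hexEscape(code) { return '\\x' + ('0' + code.toString(16)).slice(-2); }

// Collect the char codes (0..255) whose octal escape is strictly shorter than the hex one.
var shorter = [];
for (var code = 0; code < 256; code++) {
  if (octalEscape(code).length < hexEscape(code).length) shorter.push(code);
}
console.log(shorter.length, shorter[shorter.length - 1]); // 64 63

// Pitfall: "\0" followed by the digit 1 is parsed as the single escape \01 in sloppy mode.
var merged = new Function('return "\\01";')();
// A hex escape has a fixed width, so the following digit stays a separate character.
var separate = new Function('return "\\x001";')();
console.log(merged.length, separate.length); // 1 2
```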