@slevithan
Created May 7, 2012 20:54
Full 21-bit Unicode code point matching in XRegExp with \u{10FFFF}
// Allow syntax extensions
XRegExp.install("extensibility");
/* Adds Unicode code point syntax to XRegExp: \u{n..}
 * `n..` is any 1-6 digit hexadecimal number from 0-10FFFF. The syntax comes from ES6 proposals.
 * Code points above U+FFFF are converted to surrogate pairs, so e.g. `\u{20B20}` is simply an
 * alternate syntax for `\uD842\uDF20`. This can lead to broken behavior if a `\u{n..}` token that
 * references a code point above U+FFFF is followed by a quantifier or used in a character class,
 * because the quantifier or class then applies to individual surrogate code units rather than to
 * the whole code point. Using `\u{n..}` with code points above U+FFFF is therefore not
 * recommended, unless you know exactly what you're doing. XRegExp's handling follows ES6
 * proposals for `\u{n..}`, since compatibility concerns prevent JavaScript regexes from changing
 * to be based on code points rather than code units by default.
 */
XRegExp.addToken(
    /\\u\{([0-9A-Fa-f]{1,6})\}/,
    (function () {
        // Left-pad a hex string with zeros to at least four digits
        function pad4(s) {while (s.length < 4) s = "0" + s; return s;}
        function dec(hexStr) {return parseInt(hexStr, 16);}
        function hex(num) {return num.toString(16);}
        return function (match) {
            var code = dec(match[1]), offset;
            if (code > 0x10FFFF) {
                throw new SyntaxError("invalid Unicode code point " + match[0]);
            }
            if (code <= 0xFFFF) {
                // Converting to \uNNNN avoids needing to escape the character and keep it separate
                // from preceding tokens
                return "\\u" + pad4(hex(code));
            }
            // Split the code point into a high surrogate and a low surrogate
            offset = code - 0x10000;
            return "\\u" + pad4(hex(0xD800 + (offset >> 10))) + "\\u" + pad4(hex(0xDC00 + (offset & 0x3FF)));
        };
    }()),
    {scope: "all"}
);
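
The surrogate-pair arithmetic and the quantifier and character class caveats from the comment above can be checked without XRegExp. The following is a minimal sketch using only native `RegExp`; `toEscapedCodeUnits` is a hypothetical helper written for this illustration (it is not part of XRegExp) that mirrors the conversion performed by the token handler.

```javascript
// Hypothetical helper (not an XRegExp API): converts a code point into the
// \uNNNN escape sequence(s) that the token handler above would emit.
function toEscapedCodeUnits(codePoint) {
    if (codePoint < 0 || codePoint > 0x10FFFF) {
        throw new RangeError("invalid Unicode code point " + codePoint);
    }
    function esc(unit) {
        // Zero-pad the hex value of one code unit to four digits
        return "\\u" + ("0000" + unit.toString(16)).slice(-4);
    }
    if (codePoint <= 0xFFFF) {
        return esc(codePoint);
    }
    var offset = codePoint - 0x10000;
    return esc(0xD800 + (offset >> 10)) + esc(0xDC00 + (offset & 0x3FF));
}

var source = toEscapedCodeUnits(0x20B20); // the pattern text \ud842\udf20
var ch = "\uD842\uDF20";                  // U+20B20 as a surrogate pair

// A plain match behaves as expected.
var plainMatch = new RegExp("^" + source + "$").test(ch);

// A quantifier binds only to the low surrogate, not to the whole pair.
var quantified = new RegExp("^" + source + "+$");
var repeatsWholeChar = quantified.test(ch + ch);           // false
var repeatsLowSurrogate = quantified.test(ch + "\uDF20");  // true

// In a character class, the two surrogates become two separate members.
var inClass = new RegExp("^[" + source + "]$");
var classMatchesChar = inClass.test(ch);                   // false: ch is two code units
var classMatchesLoneSurrogate = inClass.test("\uD842");    // true
```

The results show why the comment advises against quantifying such a token or placing it in a character class: the regex engine operates on UTF-16 code units, so only the second half of the pair is repeated, and a class matches either half on its own.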
@slevithan
Author

This extension was included in the XRegExp Unicode Base v1.0.0 prerelease until v1.0.0-rc-2. It is not currently in any official XRegExp addon.
