@fundon
Created May 21, 2012 14:39
Get the Unicode code point from an array of UTF-8 bytes in JavaScript
function getCodePoint(array) {
  /*-----------------------------------------------------------------------------------------
  [UCS-2 (UCS-4)]      [bit pattern]                         [1st byte] [2nd byte] [3rd byte] [4th byte]
  U+0000..U+007F       00000000-0xxxxxxx                     0xxxxxxx
  U+0080..U+07FF       00000xxx-xxyyyyyy                     110xxxxx   10yyyyyy
  U+0800..U+FFFF       xxxxyyyy-yyzzzzzz                     1110xxxx   10yyyyyy   10zzzzzz
  U+10000..U+1FFFFF    00000000-000wwwxx-xxxxyyyy-yyzzzzzz   11110www   10xxxxxx   10yyyyyy   10zzzzzz
  ------------------------------------------------------------------------------------------*/
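  var bytes = array.length;
  // A lone byte keeps 7 payload bits; the lead byte of a 2-, 3-, or 4-byte
  // sequence keeps 5, 4, or 3 bits, so shift the 0xFF mask by (bytes + 1).
  var firstShift = (bytes === 1) ? 0 : (bytes + 1);
  var codePoint = array[0] & (0xFF >> firstShift);
  for (var n = 1; n < bytes; n++) {
    codePoint <<= 6;
    codePoint += array[n] & 0x3F; // Mask 0b00111111: low six bits of a continuation byte
  }
  return codePoint;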
}
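
A minimal usage sketch, assuming the input is one complete UTF-8 sequence of 1 to 4 bytes. The helper name `codePointFromUtf8` and its validation checks are hypothetical additions for illustration, not part of the gist; the decoding arithmetic follows the table above.

```javascript
// Hypothetical helper (not from the gist): same decoding steps as
// getCodePoint, plus basic checks on the sequence length and on the
// 10xxxxxx shape of each continuation byte.
function codePointFromUtf8(bytes) {
  var length = bytes.length;
  if (length < 1 || length > 4) {
    throw new RangeError("expected 1 to 4 bytes, got " + length);
  }
  // Lead byte payload: 7 bits alone, else 5, 4, or 3 bits.
  var firstShift = (length === 1) ? 0 : (length + 1);
  var codePoint = bytes[0] & (0xFF >> firstShift);
  for (var n = 1; n < length; n++) {
    if ((bytes[n] & 0xC0) !== 0x80) {
      throw new RangeError("byte " + n + " is not a continuation byte");
    }
    // Append the six payload bits of this continuation byte.
    codePoint = (codePoint << 6) | (bytes[n] & 0x3F);
  }
  return codePoint;
}

// The euro sign is encoded in UTF-8 as E2 82 AC.
var euro = codePointFromUtf8([0xE2, 0x82, 0xAC]);
console.log(euro.toString(16));          // "20ac"
console.log(String.fromCodePoint(euro)); // "€"
```

The sketch does not reject overlong encodings, surrogate code points, or values above U+10FFFF; a strict decoder would need those checks as well.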