@vmi
Created January 18, 2012 14:29
Calculate the byte length of a string when encoded as UTF-8.
/*
* Fair License (Fair)
*
* Copyright (c) 2012, IWAMURO Motonori
*
* Usage of the works is permitted provided that this instrument is
* retained with the works, so that any entity that uses the works is
* notified of this instrument.
*
* DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.
*/
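// Calculate the byte length of a string when encoded as UTF-8.
// Throws an Error if the string contains an unpaired UTF-16 surrogate.
function utf8length(s) {
  var len = s.length;
  var u8len = 0;
  for (var i = 0; i < len; i++) {
    var c = s.charCodeAt(i);
    if (c <= 0x007f) { // ASCII (U+0000..U+007F): 1 byte
      u8len++;
    } else if (c <= 0x07ff) { // U+0080..U+07FF: 2 bytes
      u8len += 2;
    } else if (c <= 0xd7ff || 0xe000 <= c) { // rest of the BMP, surrogates excluded: 3 bytes
      u8len += 3;
    } else if (c <= 0xdbff) { // high-surrogate code
      // The next unit must be a low surrogate; the pair encodes a code
      // point in U+10000..U+10FFFF, which takes 4 bytes in UTF-8.
      // charCodeAt() returns NaN past the end of the string and every
      // comparison with NaN is false, so test for the valid range (not the
      // invalid one) to also catch a lone high surrogate at the end.
      c = s.charCodeAt(++i);
      if (!(0xdc00 <= c && c <= 0xdfff)) // Is the trailing char a low-surrogate code?
        throw new Error("Invalid UTF-16 sequence. Missing low-surrogate code.");
      u8len += 4;
    } else /* if (c <= 0xdfff) */ { // low-surrogate code with no preceding high surrogate
      throw new Error("Invalid UTF-16 sequence. Missing high-surrogate code.");
    }
  }
  return u8len;
}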
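The same count can also be computed per code point rather than per UTF-16 unit, which makes the byte-length rule (1 to 4 bytes by code point range) explicit and lets the string iterator do the surrogate pairing. The sketch below is not part of the original gist: `utf8lengthByCodePoint` is a hypothetical name, and it assumes an ES2015+ runtime with `TextEncoder` available as a global (modern browsers, Node.js 11+), which is used only as a cross-check.

```javascript
// Hypothetical alternative to utf8length (not from the original gist):
// count UTF-8 bytes per code point instead of per UTF-16 code unit.
function utf8lengthByCodePoint(s) {
  let bytes = 0;
  // for...of iterates a string by code point: a valid surrogate pair
  // arrives as one character whose codePointAt(0) is >= 0x10000, while
  // an unpaired surrogate arrives on its own.
  for (const ch of s) {
    const cp = ch.codePointAt(0);
    if (cp <= 0x7f) {
      bytes += 1; // U+0000..U+007F
    } else if (cp <= 0x7ff) {
      bytes += 2; // U+0080..U+07FF
    } else if (cp >= 0xd800 && cp <= 0xdfff) {
      // Unpaired surrogate: reject it, as utf8length does.
      throw new Error("Invalid UTF-16 sequence. Unpaired surrogate.");
    } else if (cp <= 0xffff) {
      bytes += 3; // rest of the BMP
    } else {
      bytes += 4; // U+10000..U+10FFFF
    }
  }
  return bytes;
}

// Cross-check against the platform's UTF-8 encoder.
const encoder = new TextEncoder();
const samples = ["abc", "\u00e9", "\u3042", "\u{1F600}", "a\u00e9\u3042\u{1F600}"];
for (const s of samples) {
  console.log(JSON.stringify(s), utf8lengthByCodePoint(s), encoder.encode(s).length);
}
```

For well-formed strings both counts agree. They differ only on unpaired surrogates: the sketch throws, whereas `TextEncoder` substitutes U+FFFD and counts 3 bytes.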