Created
April 19, 2010 00:08
~ ⚡ node /tmp/readself.js
length=214
// start of file readself.js
sys=require("sys");
fs=require("fs");
contents=fs.readFileSync(__filename, encoding="utf8"); // ö日本語
sys.puts("length="+contents.length);
sys.puts(contents);
// end of file readself.js
~ ⚡ wc -c /tmp/readself.js
221 /tmp/readself.js
Sure, "length" is the number of characters, not bytes, but help me figure out what's up.
"ö日本語" is four characters, and in UTF-8 "ö" is two bytes and the Han characters are each 3 bytes, so you have 7 bytes more than characters.
Oh, and:
Note how Node.js reads the file as UTF-8 and reports the length to be 214 bytes.
The file on disk is 221 bytes, a difference of 7.
The following shows that "// ö日本語" as a String has a length of 7.
Remove the comment "// ö日本語" and it works as expected. What am I missing?
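The 7-byte gap can be checked directly. Here is a minimal sketch, assuming a Node.js version that provides `Buffer.byteLength`; the variable names are just for illustration and are not from the original script. String `length` counts characters (strictly, UTF-16 code units), while `wc -c` counts bytes on disk, so the two only agree for pure ASCII.

```javascript
// The trailing comment from readself.js, written with escapes so the
// code points are unambiguous: "// ö日本語"
const comment = "// \u00f6\u65e5\u672c\u8a9e";

// String length counts UTF-16 code units: 3 ASCII characters + 4 others.
const charCount = comment.length;

// UTF-8 size: 3 ASCII bytes + 2 bytes for "ö" + 3 bytes for each Han character.
const byteCount = Buffer.byteLength(comment, "utf8");

// This is the gap between what wc -c reports and what contents.length reports.
const difference = byteCount - charCount;

console.log("chars=" + charCount + " bytes=" + byteCount + " diff=" + difference);
// prints: chars=7 bytes=14 diff=7
```

The difference of 7 matches 221 - 214 from the transcript, since every other character in the file is ASCII and takes exactly one byte.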