Last active
December 22, 2015 05:18
Converting a Unicode code point to a character with awk
awk '/U/ && /kMandarin/{split($1,uc,"+"); printf "%s;%s;%s\n",$1,system("/usr/bin/printf \"%b\n\" \"\\u" uc[2] "\""),$3 }' ./unihan/Unihan/Unihan_Readings.txt | head
㐀
U+3400;0;qiū
㐁
U+3401;0;tiàn
㐄
U+3404;0;kuà
㐅
U+3405;0;wǔ
㐆
U+3406;0;yǐn
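The `0` in the second field is the exit status returned by awk's `system()`. The character itself is written straight to stdout by the child `printf`, which is why it lands on its own line. A minimal sketch of one way around this is to read the command's output with `cmd | getline`. It assumes `/usr/bin/printf` understands `\UXXXXXXXX` (GNU coreutils does) and a UTF-8 locale. The function name `codepoints_to_chars` and the stricter match on `$1` and `$2` are illustrative choices, not part of the original command.

```shell
# Sketch: capture the character with getline instead of system().
# Assumes GNU coreutils printf (for \U escapes) and a UTF-8 locale.
codepoints_to_chars() {
    awk '$1 ~ /^U\+[0-9A-F]+$/ && $2 == "kMandarin" {
        split($1, uc, "+")                                 # uc[2] holds the hex digits
        hex = substr("00000000" uc[2], length(uc[2]) + 1)  # left-pad to 8 digits for \U
        cmd = "/usr/bin/printf \"\\U" hex "\""             # e.g. /usr/bin/printf "\U00003400"
        ch = ""
        cmd | getline ch                                   # read the command output into ch
        close(cmd)                                         # release the pipe before the next record
        printf "%s;%s;%s\n", $1, ch, $3
    }' "$@"
}

# Sample input in the same tab-separated layout as Unihan_Readings.txt
printf 'U+3400\tkMandarin\tqiū\nU+3401\tkMandarin\ttiàn\n' | codepoints_to_chars
```

The 8-digit `\U` form is used instead of `\u` because GNU `printf` reads exactly four hex digits after `\u`, which would truncate code points above U+FFFF. To run it on the real data, pass the file as an argument, e.g. `codepoints_to_chars ./unihan/Unihan/Unihan_Readings.txt | head`.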
awk '/U/ && /kMandarin/ {print $0}' ./resources/unihan/Unihan/Unihan_Readings.txt | head
U+3400 kMandarin qiū
U+3401 kMandarin tiàn
U+3404 kMandarin kuà
U+3405 kMandarin wǔ
U+3406 kMandarin yǐn
U+340C kMandarin yí
U+3416 kMandarin xié
U+341C kMandarin chóu
U+3421 kMandarin nuò
U+3424 kMandarin dān
I ended up using bash, as it's simpler, as recommended by geirha on #awk:
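The snippet from that conversation is not reproduced here. As a rough sketch of what a bash-only version could look like (not necessarily what geirha suggested), bash's builtin `printf` can expand `\U` escapes through `%b`, so no external process is needed per line. The function name `convert_readings` is hypothetical, and bash 4.2 or later with a UTF-8 locale is assumed.

```shell
#!/usr/bin/env bash
# Sketch only: one possible bash approach, not the snippet from #awk.
# Assumes bash >= 4.2 (printf %b understands \U) and a UTF-8 locale.
convert_readings() {
    # Each data line looks like: U+3400<TAB>kMandarin<TAB>qiū
    while read -r cp field reading; do
        [ "$field" = kMandarin ] || continue   # skip comments and other fields
        # ${cp#U+} strips the "U+" prefix; %b expands the resulting \U escape
        printf '%s;%b;%s\n' "$cp" "\\U${cp#U+}" "$reading"
    done
}

# Sample input in the same tab-separated layout as Unihan_Readings.txt
printf 'U+3400\tkMandarin\tqiū\nU+3401\tkMandarin\ttiàn\n' | convert_readings
```

On the real file this would be used as `convert_readings < ./unihan/Unihan/Unihan_Readings.txt | head`.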