Created
December 5, 2012 06:01
Split a string into Thai display-character units using Ruby. ref: http://qiita.com/items/55c4347df63472346ac8
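#!/usr/bin/env ruby
# -*- coding: utf-8 -*-

text = "พี่ชาย" # ["e1e", "e35", "e48", "e0a", "e32", "e22"]

# Match one character plus any Thai combining marks that follow it
# (U+0E31, U+0E33-U+0E3A, U+0E47-U+0E4E), so each match is one displayed character.
text.scan(/.(?:[\u0E31]|[\u0E33-\u0E3A]|[\u0E47-\u0E4E])*/).each do |ch|
  # Format every code point in the match as U+xxxx (zero-padded to 4 hex digits).
  clist = ch.each_codepoint.map { |cp| "U+" + sprintf("%04x", cp) }
  puts ch + " [#{clist.join(' ')}]"
end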
$ ruby disp_ucs_thai.rb
พี่ [U+0e1e U+0e35 U+0e48]
ช [U+0e0a]
า [U+0e32]
ย [U+0e22]
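The same regex can be wrapped in a small method when the display units are needed as an array rather than printed. This is only a sketch: `thai_display_units` is a hypothetical name that does not appear in the original script, and it assumes the same three mark ranges as above are the only characters that attach to the preceding one.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper (not in the original script): returns the display
# units of a Thai string as an array, using the same regex as above.
# Assumption: only U+0E31, U+0E33-U+0E3A and U+0E47-U+0E4E attach to the
# preceding character; other scripts and newlines are not handled.
def thai_display_units(text)
  text.scan(/.(?:[\u0E31]|[\u0E33-\u0E3A]|[\u0E47-\u0E4E])*/)
end

units = thai_display_units("พี่ชาย")
puts units.length   # 4 display units for 6 code points
puts units.inspect
```

Returning an array makes it straightforward to count, truncate, or reverse a string by displayed character instead of by code point.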