@tonytonyjan
Created June 10, 2022 17:16
A Ruby parser which parses EastAsianWidth.txt from UAX #11 at https://www.unicode.org/reports/tr11/
# frozen_string_literal: true
# Copyright (c) 2022 Weihang Jian <https://tonytonyjan.net>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

# A parser for EastAsianWidth-14.0.0.txt.
# For more information, see UAX #11: East Asian Width,
# at https://www.unicode.org/reports/tr11/
#
# == Example
#
#   eaw = EastAsianWidth.new(File.read('EastAsianWidth.txt'))
#   puts %w[test 測試! 測試！].map { eaw.string_width(_1) }
#
# Output:
#
#   4
#   5
#   6
#
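class EastAsianWidth
  # east_asian_width_txt_data: the full contents of EastAsianWidth.txt as a String.
  def initialize(east_asian_width_txt_data)
    @lookup_table = []
    east_asian_width_txt_data.each_line do |line|
      # Data lines start with a hexadecimal code point; skip comments and blanks.
      next unless line.start_with?(/[0-9A-F]/)

      # Drop the trailing "# ..." comment, then split "code point(s);property".
      # Stripping both fields tolerates padding around the semicolon.
      code_point, property = line.split('#', 2).first.split(';').map(&:strip)
      if code_point.include?('..')
        first, last = code_point.split('..')
        @lookup_table << [first.to_i(16)..last.to_i(16), property]
      else
        @lookup_table << [code_point.to_i(16), property]
      end
    end
  end

  # Sum of the display widths of every code point in +string+.
  def string_width(string)
    string.codepoints.sum { lookup_width(_1) }
  end

  # Narrow (Na) and halfwidth (H) count as 1 column; wide (W) and fullwidth (F)
  # count as 2. Neutral (N) and ambiguous (A) fall back to 1 with a warning.
  def lookup_width(code_point)
    property = lookup(code_point)
    case property
    when 'Na', 'H' then 1
    when 'W', 'F' then 2
    else
      warn "code point 0x#{code_point.to_s(16)} has property #{property}"
      1
    end
  end

  # Returns the East Asian Width property of +code_point+ via a linear scan,
  # raising if the code point is not listed in the data.
  def lookup(code_point)
    # Range#=== tests coverage and Integer#=== tests equality.
    _, property = @lookup_table.find { |key, _| key === code_point }
    return property if property

    raise "missing code point 0x#{code_point.to_s(16)}"
  end
end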
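
The lookup method above scans the table linearly for every code point. If the entries are kept in ascending order (as they appear to be in EastAsianWidth.txt), a binary search is one possible alternative. The following is only a sketch under stated assumptions: `SAMPLE_TABLE` and `bsearch_property` are hypothetical names that are not part of the class, the three rows are hand-written stand-ins for parsed data, and single code points would have to be stored as one-element ranges so every entry has the same shape.

```ruby
# Hypothetical sample rows in the shape [range, property], sorted by code point.
# Real rows would come from parsing EastAsianWidth.txt.
SAMPLE_TABLE = [
  [0x0020..0x007E, 'Na'],
  [0x4E00..0x9FFF, 'W'],
  [0xFF01..0xFF60, 'F']
].freeze

# Array#bsearch in find-any mode: the block returns 0 on a hit, a negative
# number when the target lies before the entry, and a positive number when
# the target lies after it. Returns nil when no range covers the code point.
def bsearch_property(table, code_point)
  entry = table.bsearch do |range, _property|
    if code_point < range.begin then -1
    elsif code_point > range.end then 1
    else 0
    end
  end
  entry&.last
end
```

Unlike the lookup method in the class, this sketch returns nil for an unlisted code point instead of raising, which leaves the choice of a default property to the caller.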