Created September 28, 2016 20:14
Handmade CJK coverage util
CJK Radical Supplement ,2E80-2EFF
Kangxi Radicals ,2F00-2FDF
CJK symbols and punctuation ,3000-303F
Hiragana ,3040-309F
Katakana ,30A0-30FF
CJK strokes ,31C0-31EF
Katakana Common ,31F0-31FF
,3200-33FF
CJK compatibility ,3300-33FF
CJK Unified Ideographs ,4E00-9FFF
CJK compatibility ,F900-FAFF
,FE30-FE4F
Katakana halfwidth ,FF00-FFEF
Kana Supplement ,1B000-1B0FF
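As a rough check on the list above, the ranges can be expanded and unioned to see how many distinct code points they cover. This is only a sketch: it hard-codes the ranges as they appear above, and `RANGES`, `expand` and `count_code_points` are names made up for the illustration. Note that 3300-33FF sits entirely inside 3200-33FF, so it adds nothing to the union.

```python
# Ranges copied from the list above (inclusive hex bounds).
RANGES = [
    "2E80-2EFF", "2F00-2FDF", "3000-303F", "3040-309F", "30A0-30FF",
    "31C0-31EF", "31F0-31FF", "3200-33FF", "3300-33FF", "4E00-9FFF",
    "F900-FAFF", "FE30-FE4F", "FF00-FFEF", "1B000-1B0FF",
]


def expand(hex_range):
    """Turn 'START-END' (inclusive, hex) into a range of code points."""
    lower, upper = (int(part, 16) for part in hex_range.split("-"))
    return range(lower, upper + 1)


def count_code_points(ranges):
    """Count distinct code points; overlapping ranges are counted once."""
    covered = set()
    for hex_range in ranges:
        covered.update(expand(hex_range))
    return len(covered)


print(count_code_points(RANGES))  # 23216
```

The result agrees with the "num desired" figure in the sample run.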
$ ls -1 font-repo/
HelveticaNeue.ttf
LucidaGrande.ttc
meiryo.ttc
ヒラギノ丸ゴ ProN W4.ttc
ヒラギノ角ゴシック W8.ttc
$ python handmade-cjk.py
num desired: 23216
num unsupported: 6640
extras (supported by not desired): 6563
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import os
from itertools import chain

from fontTools.ttLib import TTFont


def hex_range_to_dec(hex_range):
    """Expand an inclusive hex range such as '2E80-2EFF' into code points."""
    lower, upper = (int(part, 16) for part in hex_range.split('-'))
    # hex ranges are inclusive, and python's range end is exclusive, so +1
    return range(lower, upper + 1)


def desired_chars(range_lines):
    """Union of all code points named by the given hex ranges."""
    needed = set()
    for line in range_lines:
        needed |= set(hex_range_to_dec(line))
    return needed


def supported_by(font_list):
    """Union of all code points mapped by any cmap subtable of the fonts."""
    provided = set()
    for font_name in font_list:
        # fontNumber=0 reads only the first face of a .ttc collection
        ttf = TTFont(font_name, fontNumber=0)
        provided |= set(chain.from_iterable(
            table.cmap.keys() for table in ttf["cmap"].tables))
    return provided


if __name__ == '__main__':
    with open('handmade-cjk.txt', 'r') as f:
        # each line is "<block name>,<hex range>"; keep only the range
        desired_ranges = [line.strip().rsplit(',', 1)[1]
                          for line in f if line.strip()]
    desired = desired_chars(desired_ranges)
    repo_path = './font-repo/'
    files = [os.path.join(repo_path, fn) for fn in os.listdir(repo_path)]
    supported = supported_by(files)
    unsupported = desired - supported
    extra = supported - desired
    print("num desired: %s" % len(desired))
    print("num unsupported: %s" % len(unsupported))
    print("extras (supported by not desired): %s" % len(extra))
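The script only prints counts. To see which code points are actually missing, one option is to collapse the `unsupported` set back into contiguous hex ranges. A minimal sketch, not part of the original script: `to_hex_ranges` is a hypothetical helper, and the small sets below are toy stand-ins for real font data.

```python
def to_hex_ranges(code_points):
    """Collapse code points into inclusive 'START-END' hex range strings."""
    runs = []
    start = prev = None
    for cp in sorted(code_points):
        if start is None:
            start = prev = cp          # first code point opens a run
        elif cp == prev + 1:
            prev = cp                  # extends the current run
        else:
            runs.append((start, prev))  # gap found: close the run
            start = prev = cp
    if start is not None:
        runs.append((start, prev))
    return ["%04X-%04X" % (lo, hi) for lo, hi in runs]


# Toy data (assumption): 16 desired code points, a font covering the middle
# eight of them plus one unrelated character (U+0041).
desired = set(range(0x3040, 0x3050))
supported = set(range(0x3044, 0x304C)) | {0x41}

unsupported = desired - supported   # wanted but not covered
extra = supported - desired         # covered but not wanted

print(to_hex_ranges(unsupported))   # ['3040-3043', '304C-304F']
print(to_hex_ranges(extra))         # ['0041-0041']
```

Both sets hold integer code points, so their lengths count characters in the same unit.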
It looks like "num desired" is measured in bytes, and "num supported" is measured in characters
Strictly in "works for me" territory, but:
Requires fontTools, assumes your font repo dir is ./font-repo, and assumes handmade-cjk.txt holds one "name,START-END" line per range, with inclusive hex bounds.
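For anyone adapting the range file, here is a hedged sketch of a slightly stricter line parser than the `rsplit` one-liner in the script. `parse_range_line` is a hypothetical name, and the accepted format (an optional block name, a comma, then `START-END` in hex) is inferred from the sample file rather than from any spec.

```python
def parse_range_line(line):
    """Parse '<name>,<START>-<END>' into (name, lower, upper).

    The name may be empty, as in ',3200-33FF'. A malformed range raises
    ValueError from int() or from the tuple unpacking.
    """
    # Split on the LAST comma so commas inside a name would be tolerated.
    name, _, hex_range = line.strip().rpartition(",")
    lower, upper = (int(part, 16) for part in hex_range.split("-"))
    if lower > upper:
        raise ValueError("range is reversed: %r" % hex_range)
    return name.strip(), lower, upper


print(parse_range_line("CJK Radical Supplement ,2E80-2EFF\n"))
print(parse_range_line(",3200-33FF"))
```

Returning the name alongside the bounds makes it possible to report coverage per block instead of only in total.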