You need the ttx binary from FontTools to dump the font's cmap table into an XML font.ttx file:
> sudo apt-get install fonttools
> ttx -t cmap sorren.eot
Dumping "sorren.eot" to "sorren.ttx"...
Dumping 'cmap' table...
> ln -s sorren.ttx sorren.xml
Open sorren.xml in Chrome. Open devtools, paste the ttx_to_regexp.js code into the Console tab and hit return.
Next, type fontRange() to get the regexp covering all codepoints known to this font (platformID 0 is Unicode). If you're going to paste it elsewhere, you might as well type copy(fontRange()) and avoid any copy-paste errors:
> fontRange()
[\x00\x0d -~\xa0\xad\u2000-\u200a\u2010-\u2014\u202f\u205f\ue000]
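For anyone curious how a list of codepoints collapses into a character class like the one above, here is a minimal Python sketch. It is not the ttx_to_regexp.js code: the names escape, to_ranges and char_class are made up for illustration, and it hex-escapes every character, so printable ASCII comes out as \x20-\x7e rather than as literal characters.

```python
def escape(cp):
    """Render one codepoint as a JavaScript regexp escape (assumed policy)."""
    if cp < 0x100:
        return "\\x%02x" % cp
    if cp < 0x10000:
        return "\\u%04x" % cp
    return "\\u{%x}" % cp  # only valid in JavaScript with the regexp "u" flag


def to_ranges(codepoints):
    """Collapse codepoints into sorted, inclusive (start, end) runs."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))  # start a new run
    return ranges


def char_class(codepoints):
    """Build a regexp character class covering exactly these codepoints."""
    parts = []
    for start, end in to_ranges(codepoints):
        if start == end:
            parts.append(escape(start))
        elif end == start + 1:
            parts.append(escape(start) + escape(end))  # a range saves nothing here
        else:
            parts.append(escape(start) + "-" + escape(end))
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    print(char_class([0x00, 0x0D, 0xA0, 0xAD, 0x2000, 0x2001, 0x2002]))
    # prints [\x00\x0d\xa0\xad\u2000-\u2002]
```

Sorting and deduplicating first keeps the run detection to a single pass, and the result does not depend on the order in which the codepoints were collected.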
Repeat the steps above for every font you are interested in.
To become an international superhero, fork this gist and make a shell-runnable node.js application, font-to-regexp.js, that just takes your font file(s?) on the command line, invokes ttx on it (or them) for you, loads the result with jsdom, runs fontRange on it and prints the regexp to stdout, instead of doing the above steps manually. Oh, and brag about it in the comments here, of course, so other people find it too!
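As a rough starting point, here is what the ttx-and-parse half of such a tool could look like. This is a sketch in Python rather than the node.js and jsdom version asked for above, and the function names (dump_cmap, unicode_codepoints) are invented for the example. It assumes ttx is on your PATH and that the dump has the usual shape, with map elements inside cmap subtables that carry a platformID attribute. It prints one codepoint per line and leaves the regexp formatting out.

```python
import os
import subprocess
import sys
import tempfile
import xml.etree.ElementTree as ET


def unicode_codepoints(ttx_xml):
    """Collect codepoints from the platformID 0 (Unicode) cmap subtables."""
    root = ET.fromstring(ttx_xml)
    found = set()
    for subtable in root.iterfind("cmap/*"):
        if subtable.get("platformID") != "0":
            continue  # also skips <tableVersion>, which has no platformID
        for entry in subtable.iterfind("map"):
            found.add(int(entry.get("code"), 16))  # codes look like "0x20"
    return sorted(found)


def dump_cmap(font_path):
    """Run `ttx -t cmap` on one font and return the XML it writes, as bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "cmap.ttx")
        subprocess.run(
            ["ttx", "-q", "-t", "cmap", "-o", out, font_path], check=True
        )
        with open(out, "rb") as handle:
            return handle.read()


def main(argv):
    if len(argv) < 2:
        print("usage: font_codepoints.py FONT [FONT ...]", file=sys.stderr)
        return
    for path in argv[1:]:
        for cp in unicode_codepoints(dump_cmap(path)):
            print("%s\tU+%04X" % (path, cp))


if __name__ == "__main__":
    main(sys.argv)
```

Writing the dump to a temporary directory means no stray .ttx files are left next to the fonts, and there is no need for the .xml symlink trick.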
No need to parse ttx's output, just use its API (it's in Python). Here's a script that takes a font and a codepoint number and writes the name of the character and whether the font contains it.
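In case the script itself doesn't show up here, a minimal sketch of the same idea follows. It is a reconstruction under assumptions, not necessarily the original script: describe and main are illustrative names, the codepoint is accepted in decimal or 0x-prefixed hex, and the lookup goes through fontTools' TTFont.getBestCmap(), which picks the font's preferred Unicode cmap subtable.

```python
import sys
import unicodedata


def describe(codepoint, cmap):
    """Return (Unicode character name, glyph name or None) for one codepoint."""
    name = unicodedata.name(chr(codepoint), "<no name>")
    return name, cmap.get(codepoint)


def main(argv):
    if len(argv) != 3:
        print("usage: has_char.py FONT CODEPOINT", file=sys.stderr)
        return
    # Imported here so describe() stays usable without fontTools installed.
    from fontTools.ttLib import TTFont

    path = argv[1]
    codepoint = int(argv[2], 0)  # accepts 8364 as well as 0x20ac
    cmap = TTFont(path).getBestCmap() or {}  # None if no Unicode subtable
    name, glyph = describe(codepoint, cmap)
    if glyph is None:
        print("U+%04X %s: not in %s" % (codepoint, name, path))
    else:
        print("U+%04X %s: in %s as glyph %r" % (codepoint, name, path, glyph))


if __name__ == "__main__":
    main(sys.argv)
```

Run it as, say, `python has_char.py sorren.ttf 0xe000` (the file name is just an example). Private-use codepoints such as U+E000 have no Unicode name, so they are reported as `<no name>`.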