arrowtype/README.md

Last active October 8, 2021 13:15

The basics of working with unicode values in Python

Unicode values in Python

Unicode values (code points) can be written either as decimal integers (“A” is 65, “B” is 66, etc.) or as hex (“A” is 0x41, “B” is 0x42, etc.).

When scripting with RoboFont or FontTools, one early stumbling block is that the two notations come up in different contexts. For example, integers will often be used in scripts, but hex values are shown in UIs and in the TTX output of cmap (the table that maps Unicode values to glyphs). So, it's helpful to know how to convert between them for different types of work.

To go from a single-character string to a Unicode integer, you can use ord(), like:

>>> ord("A")
65

To go from an integer to a hex string, you can use hex(), like:

>>> hex(65)
'0x41'
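hex() returns lowercase digits with no padding, which may not match the four-digit values you see elsewhere. As a minimal sketch, format specifiers can produce the padded forms (the variable names here are only for illustration):

```python
code_point = ord("A")  # 65

plain = hex(code_point)          # lowercase, no padding
padded = f"{code_point:#06x}"    # zero-padded to four hex digits, keeps the 0x prefix
u_plus = f"U+{code_point:04X}"   # uppercase U+XXXX notation

print(plain, padded, u_plus)  # 0x41 0x0041 U+0041
```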

To go from an integer (written in either decimal or hex) to a string, you can use chr(), like:

>>> chr(0x41)
'A'

>>> chr(65)
'A'
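ord() and chr() are inverses, so a whole string can be turned into code points and back. A small sketch, with a sample string chosen only for illustration:

```python
text = "A€"

# One integer per character
code_points = [ord(ch) for ch in text]

# The same values as hex strings
hex_values = [hex(cp) for cp in code_points]

# chr() reverses ord(), so joining the results rebuilds the string
round_trip = "".join(chr(cp) for cp in code_points)

print(code_points)  # [65, 8364]
print(hex_values)   # ['0x41', '0x20ac']
print(round_trip)   # A€
```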

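To go from a hex value to an integer, use int(). Note that a hex literal such as 0x41 is already an integer in Python, so int() here simply returns the same value, which the prompt displays in decimal: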

>>> int(0x0083)
131

>>> int(0x41)
65
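To make the point about literals concrete, here is a short check showing that the hex and decimal spellings are the same int object value, not two different types:

```python
# Hex and decimal are just two ways to write the same integer literal
same_value = (0x41 == 65)
is_int = isinstance(0x41, int)

# Either spelling works anywhere an integer is expected
same_char = (chr(0x41) == chr(65))

print(same_value, is_int, same_char)  # True True True
```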

lianghai commented Apr 2, 2020 (edited)

I recommend unicodedata2 (https://github.com/mikekap/unicodedata2) instead of the standard library module unicodedata, because the latter's Unicode data is tied to the Python version and is often not the latest.
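One common way to follow this advice is to try the third-party module first and fall back to the standard library. This is only a sketch, and it assumes unicodedata2 may or may not be installed:

```python
try:
    import unicodedata2 as unicodedata  # third-party package, if installed
except ImportError:
    import unicodedata  # standard library fallback

# Which version of the Unicode data is in use
print(unicodedata.unidata_version)

char_name = unicodedata.name("A")
char_category = unicodedata.category("A")
print(char_name)      # LATIN CAPITAL LETTER A
print(char_category)  # Lu
```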

Also, fontTools.unicodedata (https://github.com/fonttools/fonttools/blob/master/Lib/fontTools/unicodedata/__init__.py) is my favorite wrapper around unicodedata. It prefers unicodedata2 under the hood and provides some useful additional tools, such as .script(char: str) -> str for the Unicode character property Script (https://www.unicode.org/reports/tr24/), and conversion between Unicode Script codes and OTL script tags: .ot_tags_from_script(script_code: str) -> List[str] ↔ .ot_tag_to_script(tag: str) -> str.
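A rough sketch of those helpers in use, assuming fontTools is installed. The describe_script function is a hypothetical wrapper written for this example, not part of fontTools:

```python
try:
    from fontTools import unicodedata as ft_unicodedata
except ImportError:
    ft_unicodedata = None  # fontTools is not installed


def describe_script(char):
    """Hypothetical helper: return (script code, OTL script tags) for a character."""
    if ft_unicodedata is None:
        return None
    script_code = ft_unicodedata.script(char)
    return script_code, ft_unicodedata.ot_tags_from_script(script_code)


result = describe_script("a")
print(result)  # with fontTools installed: ('Latn', ['latn'])
```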

Author

arrowtype commented Apr 2, 2020

Thanks for such helpful additions, everyone! These are great pieces of related advice.

Author

arrowtype commented Oct 8, 2021

Note to self: If you’re converting a hex string like '0x000D' to an int...

You can use int() on a string with the prefix 0x, but you need to pass 0 as the base so that int() works out the base from the prefix:

>>> int('0x51', 0)
81

source
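If the hex strings arrive in mixed notations, passing base 16 explicitly also works, with or without the 0x prefix. A minimal sketch follows; parse_codepoint is a hypothetical helper name, and the U+ handling is an assumption about the input you might have:

```python
def parse_codepoint(text):
    """Hypothetical helper: accept '0x000D', 'U+000D', or '000D'."""
    cleaned = text.strip()
    # int() does not understand the U+ prefix, so remove it first
    if cleaned.upper().startswith("U+"):
        cleaned = cleaned[2:]
    # With base 16, int() accepts hex digits with or without a 0x prefix
    return int(cleaned, 16)


print(parse_codepoint("0x000D"))  # 13
print(parse_codepoint("U+0041"))  # 65
print(parse_codepoint("51"))      # 81
```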
