Because you can't get the docs.
Create a transliterator:
import icu
greek2latin = icu.Transliterator.createInstance('Greek-Latin')
Transliterate:
greek2latin.transliterate('Ψάπφω') # => 'Psápphō'
Inverse transformation:
latin2greek = icu.Transliterator.createInstance('Greek-Latin', icu.UTransDirection.REVERSE)
latin2greek.transliterate('Psápphō') # => 'Ψάπφω'
or
latin2greek = greek2latin.createInverse()
latin2greek.transliterate('Psápphō') # => 'Ψάπφω'
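Not every transform survives a round trip, so a quick check can be handy before you rely on the inverse. This is a minimal sketch of my own (the name round_trip_failures is hypothetical, not part of PyICU). It only assumes the object has transliterate() and createInverse(), as the PyICU Transliterator calls above do.

```python
def round_trip_failures(forward, samples):
    """Return (original, converted, back) for every sample that does not come
    back unchanged after forward transliteration plus the inverse transform.

    `forward` is assumed to be a PyICU Transliterator; anything with
    transliterate() and createInverse() works the same way.
    """
    backward = forward.createInverse()  # same call as latin2greek above
    failures = []
    for original in samples:
        converted = forward.transliterate(original)
        back = backward.transliterate(converted)
        if back != original:
            failures.append((original, converted, back))
    return failures
```

Usage would be something like round_trip_failures(greek2latin, ['Ψάπφω']); an empty list means everything came back intact.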
See http://demo.icu-project.org/icu-bin/translit and http://userguide.icu-project.org/transforms/general for an idea of what kind of transliteration is built in.
Create a locale object:
britain = icu.Locale('en-GB')
french_ca = icu.Locale('fr_CA')
# etc.
… there are also a few shortcuts:
icu.Locale.getFrance()
icu.Locale.getDefault()
# etc.
Each locale object can also give you a few bits of information, such as its name:
britain.getDisplayName() # => 'English (United Kingdom)'
french_ca.getDisplayLanguage() # => 'French'
# etc.
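If you want a little table of locale names, say to build a language picker, a sketch like this might do. The helper name display_names is hypothetical. With no argument, getDisplayName() gives the name in the default locale's language; passing another Locale to get the name in that language is an assumption on my part, mirroring ICU's C++ Locale::getDisplayName(const Locale&).

```python
def display_names(locales, display_locale=None):
    """Map each locale's ID (e.g. 'en_GB') to its human-readable name.

    locales are assumed to be icu.Locale objects (anything with getName()
    and getDisplayName() works). With display_locale=None the default
    locale's language is used; otherwise the names are requested in
    display_locale's language (assumed PyICU overload, see above).
    """
    names = {}
    for loc in locales:
        if display_locale is None:
            names[loc.getName()] = loc.getDisplayName()
        else:
            names[loc.getName()] = loc.getDisplayName(display_locale)
    return names
```

For example, display_names([britain, french_ca], icu.Locale('de')) should give the German names of both locales, keyed by their IDs.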
Read the bit above on locales first; you'll need to understand locales to use the collator.
Create a collator for a particular Locale:
collator = icu.Collator.createInstance(icu.Locale('en_GB'))
Sort a list of strings, e.g.:
sorted(['sandwiches', 'angel delight', 'custard', 'éclairs', 'glühwein'], key=collator.getSortKey) #=> ['angel delight', 'custard', 'éclairs', 'glühwein', 'sandwiches']
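For comparison, a plain sorted() compares code points, so 'éclairs' ends up after 'sandwiches'. The collator also has a three-way compare(a, b), which you can plug into sorted() with functools.cmp_to_key. This is a sketch (collated is a hypothetical name) assuming compare() returns negative, zero or positive like ICU's Collator::compare.

```python
from functools import cmp_to_key


def collated(words, collator):
    """Sort words with the collator's three-way compare() instead of sort keys.

    collator is assumed to be a PyICU Collator; compare(a, b) is expected to
    return a negative number, zero or a positive number.
    """
    return sorted(words, key=cmp_to_key(collator.compare))
```

key=collator.getSortKey computes one key per string up front, which is usually the better deal when sorting a whole list; compare() is handier for one-off checks such as "do these two strings collate equal?".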
The following makes (or should make; tailoring is a bit of a black art) thorn (Þþ) sort in Old English order, between t and u (see Michael Everson's article, Sorting the letter ÞORN):
collator = icu.RuleBasedCollator('[normalization on]\n&t<þ<u\n&T<Þ<U\n&Þ=þ')
sorted(['þinking', 'tweet', 'uppity', 'Typography', 'Þeology', 'Urology'], key=collator.getSortKey) # => ['tweet', 'Typography', 'Þeology', 'þinking', 'uppity', 'Urology']
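The rule string is easier to read once you know the syntax: [...] sets an option, & resets to an anchor character, and then < means "sorts after, primary difference" (<< secondary, <<< tertiary, = no difference). A tiny hypothetical helper that assembles the same string from parts, purely to make that structure explicit:

```python
def tailoring_rules(options=(), resets=()):
    """Build an ICU collation rule string.

    options: setting names such as 'normalization on', each wrapped in [...].
    resets: sequences like ('t', '<', 'þ', '<', 'u'); the first item is the
    reset anchor (written after '&'), followed by alternating relation
    operators ('<', '<<', '<<<', '=') and characters.
    """
    lines = ['[%s]' % option for option in options]
    for anchor, *rest in resets:
        lines.append('&' + anchor + ''.join(rest))
    return '\n'.join(lines)
```

tailoring_rules(['normalization on'], [('t', '<', 'þ', '<', 'u'), ('T', '<', 'Þ', '<', 'U'), ('Þ', '=', 'þ')]) reproduces the string passed to icu.RuleBasedCollator above.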
Ignore spaces and punctuation (e.g. word breaks) when sorting Welsh:
rules = icu.Collator.createInstance(icu.Locale('cy')).getRules()
rules = '[alternate shifted]' + rules
collator = icu.RuleBasedCollator(rules)
Date-time:
from datetime import datetime
formatter = icu.DateFormat.createDateTimeInstance(icu.DateFormat.LONG, icu.DateFormat.kDefault, icu.Locale('de_DE'))
formatter.format(datetime.now()) #=> '26. Juli 2014 14:57:22'
For a date only or a time only, replace the formatter line with one of:
formatter = icu.DateFormat.createDateInstance(icu.DateFormat.LONG, icu.Locale('de_DE'))
formatter = icu.DateFormat.createTimeInstance(icu.DateFormat.LONG, icu.Locale('de_DE'))
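Under the hood ICU represents an instant as a UDate, a double counting milliseconds since 1970-01-01 UTC. A naive datetime.now() is local time, so if you want to be explicit about which instant you format, you can convert an aware datetime yourself. A sketch (to_udate is a hypothetical name); that PyICU's format() also accepts such a float is an assumption based on ICU's DateFormat::format(UDate).

```python
from datetime import datetime, timezone


def to_udate(dt):
    """Convert a datetime to an ICU-style UDate (milliseconds since the epoch).

    Naive datetimes are interpreted as local time, which is Python's own rule
    for datetime.timestamp(); pass an aware datetime to avoid ambiguity.
    """
    return dt.timestamp() * 1000.0
```

Usage would then look like formatter.format(to_udate(datetime.now(timezone.utc))).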
Word breaking: unfortunately this is even more of a pain than you’d hope.
de_words = icu.BreakIterator.createWordInstance(icu.Locale('de_DE'))
de_words.setText('Bist du in der U-Bahn geboren?')
de_words.nextBoundary() #=> 4
de_words.nextBoundary() #=> 5
# etc.
The following function might be useful:
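def iterate_breaks(text, break_iterator):
    """Yield the pieces of text between successive break boundaries."""
    break_iterator.setText(text)  # also resets the iterator to offset 0
    lastpos = 0
    while True:
        next_boundary = break_iterator.nextBoundary()
        if next_boundary == -1:  # -1 is ICU's BreakIterator DONE: no more boundaries
            return
        yield text[lastpos:next_boundary]
        lastpos = next_boundary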
Usage:
de_words = icu.BreakIterator.createWordInstance(icu.Locale('de_DE'))
list(iterate_breaks('Bist du in der U-Bahn geboren?', de_words))
#=> ['Bist', ' ', 'du', ' ', 'in', ' ', 'der', ' ', 'U', '-', 'Bahn', ' ', 'geboren', '?']
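One catch with iterate_breaks: as far as I can tell, ICU works on UTF-16 internally, so its boundary offsets count UTF-16 code units. For text with characters outside the Basic Multilingual Plane (emoji, for instance) those offsets drift away from Python str indices and the slices come out wrong. A sketch of a workaround (split_at_utf16_offsets is a hypothetical name) that slices the UTF-16 encoding instead:

```python
def split_at_utf16_offsets(text, offsets):
    """Split text at boundary offsets counted in UTF-16 code units.

    offsets is any iterable of increasing ints; a -1 (ICU's DONE value)
    ends the iteration early.
    """
    encoded = text.encode('utf-16-le')  # exactly 2 bytes per code unit, no BOM
    segments = []
    last = 0
    for offset in offsets:
        if offset == -1:
            break
        segments.append(encoded[2 * last:2 * offset].decode('utf-16-le'))
        last = offset
    return segments
```

With the word iterator from above, something like de_words.setText(text) followed by split_at_utf16_offsets(text, iter(de_words.nextBoundary, -1)) should give the same kind of list as iterate_breaks, without breaking on emoji.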
In 2024, with icu 74.2 and PyICU 2.12 I get: