chryss/509520 (GitHub Gist), created August 5, 2010 10:19
#!/usr/bin/env python
# encoding: utf-8
"""
localetest.py - tests collation / sort order for various Latin-script locales

Correct output:

The order for German is:
LATIN SMALL LETTER A
LATIN SMALL LETTER A WITH DIAERESIS
LATIN SMALL LETTER Z
The order for British English is:
LATIN SMALL LETTER A
LATIN SMALL LETTER E
LATIN SMALL LETTER Z
The order for Polish is:
LATIN SMALL LETTER A
LATIN SMALL LETTER A WITH OGONEK
LATIN SMALL LETTER Z

Tests show that on OS X (and possibly other systems) the German and Polish
letters with diacritics are sorted incorrectly, after "z".

See also http://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
"""
from __future__ import print_function

import locale
import unicodedata
from functools import cmp_to_key  # available in Python 2.7+ and 3.2+

# Per language: unsorted test characters, the locale to activate, a display name.
testdata = {
    'en': {'chars': [u'a', u'z', u'e'], 'localestring': 'en_GB.UTF-8', 'lang': 'British English'},
    'de': {'chars': [u'a', u'z', u'ä'], 'localestring': 'de_DE.UTF-8', 'lang': 'German'},
    'pl': {'chars': [u'a', u'z', u'ą'], 'localestring': 'pl_PL.UTF-8', 'lang': 'Polish'},
}

# Iterate in sorted key order (de, en, pl) so the output order matches the docstring.
for code in sorted(testdata):
    data = testdata[code]
    try:
        locale.setlocale(locale.LC_ALL, data['localestring'])
    except locale.Error as e:
        # The locale is not installed on this system; skip it.
        print("Error for %s and locale %s: %s\n" % (code, data['localestring'], e))
        continue
    print("The order for %s is:" % data['lang'])
    # Sort with the C library's locale-aware comparison (strcoll).
    for item in sorted(data['chars'], key=cmp_to_key(locale.strcoll)):
        print(unicodedata.name(item))
    # getlocale() may return None for either field, so convert before joining.
    print("The LC_COLLATE culture and encoding settings were %s."
          % ', '.join(str(part) for part in locale.getlocale(locale.LC_COLLATE)))
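Python 3's `sorted()` no longer accepts a `cmp=` argument such as `cmp=locale.strcoll`. A minimal sketch of the same sort, assuming Python 3: `locale.strxfrm` turns each string into a key whose ordering follows `locale.strcoll`, so it can be passed as `key=`. It relies on the same C library collation, so on OS X it presumably reproduces the behavior described above rather than fixing it. The helper name `collate_sorted` is hypothetical, and it changes the process-wide `LC_COLLATE` setting.

```python
import locale


def collate_sorted(strings, localestring=None):
    """Sort strings with the C library's collation for LC_COLLATE.

    If localestring is given, LC_COLLATE is switched to it first; this
    raises locale.Error if that locale is not installed.
    """
    if localestring is not None:
        locale.setlocale(locale.LC_COLLATE, localestring)
    # strxfrm maps each string to a key whose ordering follows strcoll,
    # replacing the Python 2 idiom sorted(..., cmp=locale.strcoll).
    return sorted(strings, key=locale.strxfrm)


if __name__ == "__main__":
    for name in ("de_DE.UTF-8", "pl_PL.UTF-8"):
        try:
            print(name, collate_sorted(["a", "z", "ä", "ą"], name))
        except locale.Error as e:
            print(name, "not available:", e)
```

A `key=` function is computed once per element, whereas a comparison function is called for every pair the sort compares.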
Same for me. It works fine on Ubuntu Linux 10.04 with Python 2.6.5 built with gcc 4.4.3. It also works fine on Windows XP, with the locale strings changed.
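Windows names its locales differently from POSIX systems, which is why the locale strings had to change there. Here is a small sketch that tries several candidate names in turn and keeps the first one `locale.setlocale` accepts, assuming Python 3. The helper `set_first_locale` and the Windows-style name `German_Germany.1252` are illustrative assumptions, not the names the commenter used.

```python
import locale


def set_first_locale(candidates, category=locale.LC_ALL):
    """Activate the first locale name in candidates that this system accepts.

    Returns the name that worked, or None if none did (a failed setlocale
    call leaves the current locale unchanged).
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
        except locale.Error:
            continue  # not installed or not recognised here; try the next one
        return name
    return None


if __name__ == "__main__":
    # POSIX-style name first, then a Windows-style name (assumed example).
    print(set_first_locale(["de_DE.UTF-8", "German_Germany.1252"]))
```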
OS X 10.6.4, gcc 4.2.1, Python 2.6.5: wrong for all but English.
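Since the platform's `strcoll` gives the wrong order here, one locale-independent workaround is to sort on a key that drops combining diacritics. This swaps in a different technique from locale collation. It is a minimal sketch, assuming Python 3, that treats ä and ą as a at the first level. That handles the single-letter cases tested above and roughly follows German dictionary order. It does not follow real Polish collation, where ą is a separate letter sorting after a. For full locale-aware collation, a library implementing the Unicode Collation Algorithm (for example ICU) is the usual route. The function name `fold_key` is hypothetical.

```python
import unicodedata


def fold_key(s):
    """Sort key that orders by base letters first, ignoring diacritics.

    NFD normalization splits e.g. 'ä' into 'a' + U+0308 COMBINING DIAERESIS;
    dropping the combining marks leaves the base letters. The original string
    is the tie-breaker, so 'a' still sorts before 'ä' (by code point).
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)


if __name__ == "__main__":
    for chars in (["a", "z", "ä"], ["a", "z", "ą"], ["a", "z", "e"]):
        print([unicodedata.name(c) for c in sorted(chars, key=fold_key)])
```

Because the key ignores the C library entirely, it gives the same result on OS X, Linux and Windows. The cost is that it approximates every language's rules the same way.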