@thinkjson
Created January 28, 2013 21:52
Unicode in Python
Last login: Mon Jan 28 12:42:42 on ttys002
Mark-Cahill:~ mark.cahill$ python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = u'B\u2056blia'
>>> a
u'B\u2056blia'
>>> print a
B⁖blia
>>> a = u'B\u0301blia'
>>> print a
B́blia
>>> a = u'B\u0205blia'
>>> print a
Bȅblia
>>> a = u'B\u0237blia'
>>> print a
Bȷblia
>>> a = u'B\u00edblia'
>>> priont a
  File "<stdin>", line 1
    priont a
           ^
SyntaxError: invalid syntax
>>> print a
Bíblia
>>> a = u'B\u00edblia'
>>> print a
Bíblia
>>> import csv
>>> import sys
>>> out = csv.writer(sys.stdout)
>>> out.writerow(['ascii'])
ascii
>>> out.writerow([a])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 1: ordinal not in range(128)
>>> a.encode('utf-8')
'B\xc3\xadblia'
>>> out.writerow([a.encode('utf-8')])
Bíblia
>>> out.writerow([a])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 1: ordinal not in range(128)
>>> print a
Bíblia
>>> print a.encode('utf-8')
Bíblia
>>> ^D
Mark-Cahill:~ mark.cahill$
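
The session above shows that Python 2's csv writer fails on a unicode string containing a non-ASCII character, and succeeds once the string is encoded to UTF-8 bytes. A minimal sketch of applying that fix to a whole row follows. The names `text_type` and `encode_row` are hypothetical and not part of the original session, and the choice of UTF-8 as the default is an assumption carried over from the transcript. The helper targets the Python 2 csv module; on Python 3 the csv module accepts text strings directly, so the encoding step is not needed there.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of the original session.

# `unicode` on Python 2, `str` on Python 3.
text_type = type(u'')


def encode_row(row, encoding='utf-8'):
    """Return a copy of `row` with every text cell encoded to bytes.

    Cells that are not text (numbers, already-encoded byte strings on
    Python 2) are passed through unchanged.
    """
    return [cell.encode(encoding) if isinstance(cell, text_type) else cell
            for cell in row]


a = u'B\u00edblia'
encoded = encode_row([a, 42])
# encoded[0] is the same UTF-8 byte string the session got from a.encode('utf-8')
```

On Python 2, the failing call from the session would then become `out.writerow(encode_row([a]))`.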