Last active
February 17, 2018 04:07
Puzzling behavior of python's repr and __repr__
If you open up a python 2 shell and execute the following code,

    class Foo:
        def __repr__(self):
            return u'\xe0'

    repr(Foo())

the last line will raise the following exception:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in
    position 0: ordinal not in range(128)

Foo is a contrived example. Below is an example derived from apache-libcloud:

    def __repr__(self):
        return '<Node: uuid=%s, name=%s ...>' % (self.uuid, self.name)

Seems harmless. However, this too will raise the exception if self.name is a
unicode string containing non-ascii characters, because the % operator then
returns a unicode string.
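
A minimal sketch of why that template is risky, assuming a hypothetical uuid
'abc123' and name u'caf\xe9' (neither comes from libcloud). Interpolating a
unicode value with % yields a unicode result, and a unicode result containing
\xe9 cannot be encoded as ascii. The explicit encode('ascii') call is only a
stand-in for whatever produces the 'ascii' codec error shown earlier.

```python
template = '<Node: uuid=%s, name=%s ...>'
# Hypothetical field values, chosen for illustration only.
result = template % ('abc123', u'caf\xe9')

# The formatted result is a unicode string. On Python 2 the unicode argument
# promotes the result to unicode; on Python 3 every str is unicode anyway.
is_unicode = isinstance(result, type(u''))

# Encoding that result to ascii is the step that fails.
try:
    result.encode('ascii')
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

print(is_unicode)
print(encode_error)
```

With an all-ascii name the same encode call succeeds, which is why the
problem tends to stay hidden until real user data shows up.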

I was puzzled when I reproduced the issue with repr(Foo()). I assumed that
repr just returned the result of calling __repr__() on the Foo instance.

For example, the following is okay:

    > repr(u'\xe0')
    "u'\\xe0'"

But...

    > repr(Foo())
    UnicodeEncodeError...

Then I tried googling around about the /actual/ behavior of repr. Someone on
the internet said that repr requires __repr__ to return an ascii string in
python 2. Sure enough, the python docs state for __repr__:

    The return value must be a string object.

This just made me all the more curious. Next goal: actually go look at the
python implementation of repr.

Much googling followed. How do I know which python implementation I'm using?
Where are the builtins stored in cpython? After some searching I discovered
that repr gets the output of __repr__ on an object and, if that output is a
unicode string, tries to encode it to ascii!
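
The behavior described above can be modeled roughly as follows. This is a
sketch, not the cpython source: repr_like_python2, Plain and Accented are
hypothetical names, and the model assumes the default encoding is ascii (the
python 2 default).

```python
def repr_like_python2(obj):
    """Rough model of python 2's repr(); not the actual cpython code."""
    result = type(obj).__repr__(obj)
    if isinstance(result, type(u'')):
        # A unicode result is encoded to a byte string. Assumption: the
        # default encoding is ascii, as it is in a stock python 2.
        result = result.encode('ascii')
    return result


class Plain(object):
    def __repr__(self):
        return u'<Plain>'


class Accented(object):
    def __repr__(self):
        return u'\xe0'


# An all-ascii unicode result encodes cleanly.
print(repr_like_python2(Plain()))

# A non-ascii unicode result fails at the encode step.
try:
    repr_like_python2(Accented())
except UnicodeEncodeError as exc:
    print('UnicodeEncodeError: %s' % exc)
```

This also fits the repr(u'\xe0') case: a unicode object's own __repr__
already returns an escaped, ascii-only string, so there is nothing left to
fail.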

TLDR:

In python 2 the repr builtin expects a __repr__ method which returns an ascii
string (or something coercible to an ascii string). It's required because
repr will encode a unicode result of X.__repr__() to ascii, either succeeding
or raising the above UnicodeEncodeError.

Reasons I think this quirk matters:

1) __repr__ is the fallback way that objects are translated into strings.
   str(Foo()) will use __repr__ if __str__ is not defined and fail in the same
   way as repr(Foo()).

2) It's really easy to create __repr__ methods that will fail unexpectedly.
   As in the example above (apache-libcloud), a user-provided field like
   self.name is likely to contain unicode characters, in turn causing the
   ascii encoding to fail.
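
One possible way to guard against this, sketched under assumptions: escape
user-provided fields before interpolating them, so the string returned by
__repr__ contains only ascii. ascii_safe and this Node class are hypothetical
and are not libcloud's code; the helper assumes its argument is a unicode
string.

```python
def ascii_safe(text):
    # Hypothetical helper: replace each non-ascii character with a backslash
    # escape so the result contains only ascii characters.
    # Assumes text is a unicode string.
    return text.encode('ascii', 'backslashreplace').decode('ascii')


class Node(object):
    # Hypothetical stand-in for the libcloud-derived example.
    def __init__(self, uuid, name):
        self.uuid = uuid
        self.name = name

    def __repr__(self):
        return '<Node: uuid=%s, name=%s ...>' % (self.uuid,
                                                 ascii_safe(self.name))


node = Node('abc123', u'caf\xe9')
print(repr(node))
# No __str__ is defined, so str() falls back to __repr__.
print(str(node))
```

The trade-off is that the accented character shows up as an escape sequence
instead of the character itself, which is usually acceptable for debugging
output.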