Created
December 21, 2012 16:38
Strings in Python. Credit to @lsemel
"""
STRINGS ARE BYTES IN SOME ENCODING.
UNICODE IS NUMBERS.

* You do not know the encoding of a string (a Python 2 str). It could be ASCII,
  UTF-8, or something else, and the bytes alone give no way to tell.
* By default, Python assumes ASCII when it implicitly converts a string to
  unicode, and raises UnicodeDecodeError if it meets a non-ASCII byte (anything
  above 0x7F), for example in a string you just encoded as UTF-8 (by using
  smart_str, for instance).
* You should generally not have to encode anything into strings yourself. Django
  takes care of providing Unicode objects, and of encoding appropriately
  whenever it outputs anything (to the response, or to the database).
* Think of a string as a byte array, and of Unicode as some sort of internal
  object (say, a linked list) that you can't input or output without encoding
  or decoding.

>>> from django.utils.encoding import smart_str
>>> 'a'+'b' # Two bytes, assumed to be ASCII
'ab'
>>> u'a'+u'b' # Two unicode characters
u'ab'
>>> 'a'+u'b' # The byte 'a' is converted to Unicode, under the assumption it represents ASCII
u'ab'
>>> smart_str('a') # Already a byte string, so it is returned unchanged
'a'
>>> smart_str(u'a') # The Unicode character 'a' encoded as bytes
'a'
>>> smart_str(u'\u00ff') # Another unicode character, encoded as two UTF-8 bytes
'\xc3\xbf'
>>> smart_str(u'\u00ff') + 'aa' # Concatenating two sets of two bytes each
'\xc3\xbfaa'
>>> smart_str(u'\u00ff') + u'aa' # Tries to convert those bytes to Unicode, assuming they are ASCII
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> smart_str(u'\u00ff').decode('utf-8') + u'aa' # Tell Python those bytes are not ASCII, but are UTF-8
u'\xffaa'
"""
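
The session above is Python 2. As a rough sketch of how the same ideas look under Python 3 (an assumption, since the original targets Python 2 and Django's smart_str), the block below keeps bytes and text apart explicitly. The helper `to_utf8_bytes` is a hypothetical stand-in written for this example, not Django's implementation.

```python
# Python 3 sketch: bytes and str are separate types, and nothing is
# converted between them implicitly.


def to_utf8_bytes(value):
    """Hypothetical stand-in for smart_str: UTF-8 encode text, pass bytes through."""
    if isinstance(value, bytes):
        return value
    return str(value).encode("utf-8")


text = "\u00ff"                # one Unicode character
raw = to_utf8_bytes(text)      # the same character as UTF-8 bytes

code_point = ord(text)         # the number behind the character
byte_values = list(raw)        # the byte values that encode it

# Bytes plus bytes works, like 'a' + 'b' on byte strings above.
joined_bytes = raw + b"aa"

# Mixing bytes and text is refused outright instead of guessing ASCII.
try:
    raw + "aa"
    mix_error = None
except TypeError as exc:
    mix_error = type(exc).__name__

# Decoding with the wrong codec fails for the same reason as in the session.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc.reason

# Naming the right codec recovers the text.
decoded = raw.decode("utf-8") + "aa"
```

The design point is the same as in the bullets: text only becomes bytes through an explicit encode, and bytes only become text through an explicit decode with a named codec.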