(From http://stackoverflow.com/questions/2452861/python-library-for-converting-plain-text-ascii-into-gsm-7-bit-character-set, ran out of space in the comment section.)
Running the original file does not work in either Python 2 or 3.
In Python 2, the program prints this:
64868d8d903a7390938d85
This is wrong. The code treats an offset into gsm as the GSM codepoint of the character found there, but in Python 2 gsm is a UTF-8 byte string in which some characters take up more than one byte, so the offsets do not line up with the codepoints. gsm is actually equal to
\r\xc3\x85\xc3\xa5\xce\x94_\xce\xa6\xce\x93\xce\x9b\xce\xa9\xce\xa0\xce\xa8\xce\xa3\xce
\x98\xce\x9e\x1b\xc3\x86\xc3\xa6\xc3\x9f\xc3\x89 !"#\xc2\xa4%&\'()*+,-./0123456789:;<=>?
\xc2\xa1ABCDEFGHIJKLMNOPQRSTUVWXYZ\xc3\x84\xc3\x96\xc3\x91\xc3\x9c`\xc2\xbfabcdefghijklm
nopqrstuvwxyz\xc3\xa4\xc3\xb6\xc3\xb1\xc3\xbc\xc3\xa0'
Notice that non-ASCII characters take at least 2 bytes to encode in UTF-8, and as a result gsm is longer than 128 bytes. gsm.find(c) returns a byte offset, which is no longer in sync with the GSM codepoints. For example:
>>> gsm.find('$') # we might expect this to return 2, the GSM codepoint for '$'
3
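
The mismatch can be reproduced side by side. This is a minimal sketch, not the original author's fix. It assumes the table is written as a text (unicode) string, copied as-is from the linked answer, so that each character's index equals its GSM codepoint. The names GSM, GSM_BYTES and gsm_encode_basic are made up for this illustration, and the helper covers only the basic table, not the extension table.

```python
# -*- coding: utf-8 -*-
import binascii

# Basic GSM 7-bit table as it appears in the linked answer. The character at
# index i is meant to have GSM codepoint i. The u prefix keeps this a text
# string on Python 2; on Python 3 it is harmless.
GSM = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
       u"¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ`¿abcdefghijklmnopqrstuvwxyzäöñüà")

# Roughly what an unprefixed literal is on Python 2 with a UTF-8 source file:
# the same table, but as raw UTF-8 bytes.
GSM_BYTES = GSM.encode("utf-8")


def gsm_encode_basic(plaintext):
    """Hypothetical helper: hex-encode text using only the basic table."""
    codes = bytearray()
    for ch in plaintext:
        idx = GSM.find(ch)  # character index, which equals the GSM codepoint
        if idx == -1:
            raise ValueError("not in the basic GSM table: %r" % ch)
        codes.append(idx)
    return binascii.hexlify(bytes(codes)).decode("ascii")


print(len(GSM))              # 128 characters, one per codepoint
print(len(GSM_BYTES))        # more than 128 bytes
print(GSM.find(u"$"))        # 2, the GSM codepoint for '$'
print(GSM_BYTES.find(b"$"))  # 3, a byte offset (the preceding '£' is 2 bytes)
print(gsm_encode_basic(u"Hello World"))  # 48656c6c6f20576f726c64
```

Searching the text string gives the codepoint directly, while searching the byte string gives an offset that drifts further from the codepoint with every multi-byte character that precedes the match.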