Created April 12, 2012 23:39
gsm encoding for python

(From, ran out of space in the comment section.)

Running the original file does not work in either Python 2 or 3.

In Python2, the program prints this:


(which is wrong) because it uses indexes into gsm, and those indexes do not match the GSM code points: gsm is a bytestring in which some characters take up multiple bytes. gsm is actually equal to

\x98\xce\x9e\x1b\xc3\x86\xc3\xa6\xc3\x9f\xc3\x89 !"#\xc2\xa4%&\'()*+,-./0123456789:;<=>?

Notice that non-ASCII characters take at least 2 bytes to encode in UTF-8; as a result, gsm is longer than 128 bytes. gsm.find(c) returns the index of a byte, which is no longer synchronized with the GSM code points. For example:

>>> gsm.find('$')  # we might expect this to return 2, the GSM codepoint for '$'
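The mismatch can be reproduced in Python 3 by encoding the text to UTF-8 explicitly, which roughly mimics what a Python 2 bytestring literal holds. This is only a sketch: the names `prefix`, `char_index`, and `byte_index` are made up for illustration, and only the first four characters of the table are used.

```python
# First four characters of the gsm table: '@', '£', '$', '¥'.
prefix = "@£$¥"

# Python 3 str: one index per character, so '$' sits at its GSM code point.
char_index = prefix.find("$")

# UTF-8 bytes (what a Python 2 bytestring literal contains):
# '£' occupies two bytes, so everything after it is shifted by one.
as_bytes = prefix.encode("utf-8")
byte_index = as_bytes.find(b"$")

print(char_index)     # 2
print(byte_index)     # 3
print(len(as_bytes))  # 6 bytes for 4 characters
```

Every non-ASCII character before a given position adds at least one extra byte, so the drift grows the further into the table you look.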
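# -*- coding: utf8 -*-
# (original file for reference)
# GSM 03.38 default alphabet in code-point order (0x00-0x7F).
gsm = ("@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
       "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
# Extension table; '`' marks unused positions ('|' is 0x40, '€' is 0x65).
ext = ("````````````````````^```````````````````{}`````\\````````````[~]`"
       "|" + "`" * 36 + "€" + "`" * 26)

def gsm_encode(plaintext):
    res = ""
    for c in plaintext:
        idx = gsm.find(c)
        if idx != -1:
            res += chr(idx)
        idx = ext.find(c)
        if idx != -1:
            res += chr(27)  # escape to the extension table
            res += chr(idx)
    return res.encode('hex')

print(gsm_encode("Hello World"))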
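The approach will mostly work in Python 3, where strings are real character strings and not bytestrings. However, the gsm_encode function needs to be fixed to work in Python 3, because str.encode('hex') no longer exists there. Collect the codes in a bytearray and hex-encode it with binascii.hexlify, like so:

# -*- coding: utf8 -*-
import binascii

# GSM 03.38 default alphabet in code-point order (0x00-0x7F).
gsm = ("@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
       "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
# Extension table; '`' marks unused positions ('|' is 0x40, '€' is 0x65).
ext = ("````````````````````^```````````````````{}`````\\````````````[~]`"
       "|" + "`" * 36 + "€" + "`" * 26)

def gsm_encode(plaintext):
    res = bytearray()
    for c in plaintext:
        idx = gsm.find(c)
        if idx != -1:
            res.append(idx)
            continue
        idx = ext.find(c)
        if idx != -1:
            res.append(27)  # escape to the extension table
            res.append(idx)
    return binascii.hexlify(res)

print(gsm_encode("Hello World"))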
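In Python 2, you can make it work if gsm and ext are unicode objects instead of bytestrings. Make this happen by decoding the bytes from UTF-8.

# -*- coding: utf8 -*-
# GSM 03.38 default alphabet in code-point order (0x00-0x7F), as unicode.
gsm = ("@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
       "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà").decode("utf8")
# Extension table; '`' marks unused positions ('|' is 0x40, '€' is 0x65).
ext = ("````````````````````^```````````````````{}`````\\````````````[~]`"
       "|" + "`" * 36 + "€" + "`" * 26).decode("utf8")

def gsm_encode(plaintext):
    # plaintext should be a unicode object (an ASCII-only str also works).
    res = ""
    for c in plaintext:
        idx = gsm.find(c)
        if idx != -1:
            res += chr(idx)
        idx = ext.find(c)
        if idx != -1:
            res += chr(27)  # escape to the extension table
            res += chr(idx)
    return res.encode('hex')

print(gsm_encode("Hello World"))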