@cstrouse
Created April 7, 2013 07:11
Work in progress tool for extracting highlights from Kindle ebooks.
import binascii
import re

# Highlights are stored in *.mbp files named after the book's file,
# in ~/Library/Application Support/Kindle/My Kindle Content/
# EA 44 41 54 41 - beginning of highlight
# 44 41 54 41 - end of highlight (except last highlight which *sometimes* ends with 42 4B 4D 4B)
filename = '0131177052.WrkngEffLegCode.mbp'

with open(filename, 'rb') as f:
    content = f.read()

# Work on the hex text of the file so the markers can be written as plain hex
data = binascii.hexlify(content)

# Regex doesn't work reliably yet
# (inside [...] the '.' is a literal dot, which never occurs in hex text,
# so [1.] only ever matches '1')
matches = re.findall(b'ea4441544100000[1.](.*?)44415441', data, re.S)

for match in matches:
    line = binascii.unhexlify(match)
    print(line[1:])  # drop the first byte of the captured span
    print()

print(str(len(matches)) + ' highlights found')

cstrouse commented Apr 7, 2013

After an hour or so of messing around with this and learning a lot about how the Kindle app writes annotations, I came across a guy's documentation from his own reverse-engineering project.

MBP file format info: http://www.angelfire.com/ego2/idleloop/archives/mbp_file_format.txt
Perl tool for extracting annotations: http://www.angelfire.com/ego2/idleloop/mbp_reader.html
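
One thing that can make the regex in the script flaky: searching the hex text lets a pattern match start or end halfway through a byte, and a match that consumes the closing `44 41 54 41` can swallow the start of the next highlight. Below is a minimal Python 3 sketch that searches the raw bytes instead. It assumes the markers are exactly as described in the script's comments (this has not been checked against the format documentation linked above). `extract_highlights` is a hypothetical helper name, and the sample bytes are made up for illustration.

```python
import re

# Markers taken from the script's comments (an assumption, not verified):
#   EA 44 41 54 41 -> b"\xeaDATA" starts a highlight
#   44 41 54 41    -> b"DATA" ends it
#   42 4B 4D 4B    -> b"BKMK" sometimes ends the last one
# The end marker is checked with a lookahead so it is not consumed;
# a following b"\xeaDATA" can then still be matched as the next start.
HIGHLIGHT_RE = re.compile(rb"\xeaDATA(.*?)(?=\xea?DATA|BKMK)", re.DOTALL)


def extract_highlights(content):
    """Return the raw byte spans between start and end markers.

    Any length or header bytes after the start marker are left in place;
    stripping them depends on details of the format not covered here.
    """
    return HIGHLIGHT_RE.findall(content)


if __name__ == "__main__":
    # Synthetic data, not a real .mbp file.
    sample = b"hdr\xeaDATAfirst\xeaDATAsecondBKMKtail"
    print(extract_highlights(sample))  # [b'first', b'second']
```

Matching on bytes keeps every match aligned to whole bytes, so there is no odd-length hex string left over to trip up `binascii.unhexlify`.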
