cstrouse/5329403, created April 7, 2013 07:11
Work-in-progress tool for extracting highlights from Kindle ebooks.
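import re

# Highlights are stored in *.mbp files named after the book's file
# in ~/Library/Application Support/Kindle/My Kindle Content/
# EA 44 41 54 41 (b'\xeaDATA') - beginning of highlight
# 44 41 54 41 (b'DATA') - end of highlight (except the last highlight, which *sometimes* ends with 42 4B 4D 4B, b'BKMK')
fh = '0131177052.WrkngEffLegCode.mbp'

with open(fh, 'rb') as f:
    content = f.read()

# Regex doesn't work reliably yet.
# The pattern runs on the raw bytes (a regex over the hexlified string can
# match halfway through a byte). It expects 00 00 01 after the start marker,
# as the old hex pattern 'ea4441544100000[1.]' did: '[1.]' is a character
# class matching '1' or a literal '.', not a wildcard. re.S lets '.' match
# newline bytes inside a highlight; BKMK covers the last-highlight case above.
matches = re.findall(rb'\xeaDATA\x00\x00\x01(.*?)(?:DATA|BKMK)', content, re.S)
for match in matches:
    # Drop the first byte after the marker, as before. Decoding as UTF-8 is an
    # assumption; undecodable bytes are replaced rather than raising an error.
    print(match[1:].decode('utf-8', errors='replace'))
print(f'{len(matches)} highlights found')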
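One possible reason a regex over `binascii.hexlify` output is unreliable: every byte becomes two hex characters, so a pattern can match starting halfway through a byte. The sketch below uses made-up bytes (not taken from a real .mbp file) to show such a false positive for the `EA 44 41 54 41` start marker, and that a `bytes` pattern over the raw data does not produce it.

```python
import binascii
import re

START_HEX = 'ea44415441'  # hex for EA 44 41 54 41, i.e. b'\xeaDATA'

# Made-up bytes that do NOT contain b'\xeaDATA', but whose hex form
# contains 'ea44415441' starting at an odd (mid-byte) offset.
data = bytes([0x1E, 0xA4, 0x44, 0x15, 0x44, 0x1F])
hexed = binascii.hexlify(data).decode('ascii')  # '1ea44415441f'

# The hex search "finds" the marker at offset 1, i.e. in the middle of a byte.
naive = re.findall(START_HEX, hexed)
print(hexed.find(START_HEX), len(naive))  # 1 1

# Searching the raw bytes with a bytes pattern finds nothing, as it should.
raw = re.findall(rb'\xeaDATA', data)
print(len(raw))  # 0
```

Note also that in Python 3 `hexlify` returns `bytes`, so applying a `str` pattern directly to its result raises `TypeError`; a bytes pattern on the raw file contents avoids both problems.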
Comment from the gist author:
After an hour or so of messing around with this and learning a lot about how the Kindle app writes annotations, I came across documentation from another developer's own reverse-engineering project:
MBP file format info: http://www.angelfire.com/ego2/idleloop/archives/mbp_file_format.txt
Perl tool for extracting annotations: http://www.angelfire.com/ego2/idleloop/mbp_reader.html
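Rather than searching with a regex, another option is to walk the file record by record. This sketch assumes a layout that is not confirmed by the script above (verify it against the MBP format notes linked here): each record is a 4-byte ASCII tag such as `DATA` or `BKMK`, followed by a 4-byte big-endian payload length and then that many payload bytes. The `00 00 01` bytes the regex expects right after `DATA` would fit such a length field, but that is a guess. `iter_records` is a hypothetical helper and the sample input is synthetic.

```python
import struct

# ASSUMED record layout (hypothetical; check against the MBP format notes):
#   4-byte ASCII tag | 4-byte big-endian payload length | payload
TAGS = (b'DATA', b'BKMK')


def iter_records(data):
    """Yield (tag, payload) for every assumed record found in `data`.

    Scans for known tags; when one is found and its declared length fits in
    the remaining bytes, the payload is yielded and scanning resumes after it.
    """
    pos = 0
    while pos + 8 <= len(data):
        tag = data[pos:pos + 4]
        if tag in TAGS:
            (length,) = struct.unpack_from('>I', data, pos + 4)
            end = pos + 8 + length
            if end <= len(data):
                yield tag, data[pos + 8:end]
                pos = end
                continue
        pos += 1  # no plausible record starts here; slide forward one byte


# Synthetic input: one stray byte, a DATA record, then a BKMK record.
sample = (b'\xea'
          + b'DATA' + struct.pack('>I', 5) + b'hello'
          + b'BKMK' + struct.pack('>I', 2) + b'\x00\x01')

for tag, payload in iter_records(sample):
    print(tag, payload)
```

Compared with the regex, this never hexlifies the file and does not have to guess which marker ends a highlight; the trade-off is that it depends entirely on the assumed layout being right.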