@edsu
Last active August 29, 2015 13:57
Print out a report of the Wikipedia articles you've visited, using your Google Chrome history database. You'll need to shut down Google Chrome first, since it locks the database.
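#!/usr/bin/env python3
"""
Print out a report of Wikipedia articles you've visited using your Google Chrome history database.
The output is in Markdown, and you'll need to shut down Chrome before you run this or else the
database will be locked.
"""
import os
import re
import sqlite3

# Default Chrome profile location on macOS; other platforms keep History elsewhere.
home = os.path.expanduser("~")
dbfile = os.path.join(home, "Library/Application Support/Google/Chrome/Default/History")
db = sqlite3.connect(dbfile)

# last_visit_time counts microseconds since 1601-01-01 UTC; dividing by 1,000,000 and
# subtracting 11644473600 seconds converts it to Unix time before formatting as local time.
q = """
SELECT url, title, visit_count, datetime(last_visit_time/1000000-11644473600,'unixepoch','localtime')
FROM urls
ORDER BY visit_count DESC
"""
for url, title, visit_count, last_visit in db.execute(q):
    # Accept both http:// and https:// article URLs on any language subdomain.
    if re.match(r'https?://(.+)\.wikipedia\.org/wiki', url):
        # Python 3 prints Unicode titles directly, so no manual .encode('utf-8') is needed.
        print("* [%s](%s) - %s - %s" % (title, url, last_visit, visit_count))
db.close()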
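The query's `datetime(last_visit_time/1000000-11644473600, ...)` expression converts Chrome's timestamps, which count microseconds since 1601-01-01 UTC, into Unix seconds. Here is a minimal sketch of the same arithmetic in Python, assuming a raw `last_visit_time` integer taken from the `urls` table. The helper names `chrome_time_to_datetime` and `chrome_time_to_unix` are hypothetical. Unlike the SQL, this sketch stays in UTC and does not apply `'localtime'`.

```python
from datetime import datetime, timedelta, timezone

# Chrome stores times as microseconds since 1601-01-01 00:00 UTC.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
# Seconds from 1601-01-01 to 1970-01-01: the constant used in the SQL query.
EPOCH_OFFSET_SECONDS = 11644473600


def chrome_time_to_datetime(chrome_us: int) -> datetime:
    """Convert a raw last_visit_time value to a timezone-aware UTC datetime."""
    # Adding integer microseconds avoids float rounding on these large values.
    return CHROME_EPOCH + timedelta(microseconds=chrome_us)


def chrome_time_to_unix(chrome_us: int) -> int:
    """Mirror the SQL: microseconds -> whole seconds, then shift to the Unix epoch."""
    return chrome_us // 1_000_000 - EPOCH_OFFSET_SECONDS


if __name__ == "__main__":
    raw = EPOCH_OFFSET_SECONDS * 1_000_000  # exactly the Unix epoch in Chrome units
    print(chrome_time_to_datetime(raw))  # 1970-01-01 00:00:00+00:00
    print(chrome_time_to_unix(raw))  # 0
```

Calling `.astimezone()` on the returned datetime gives the local-time equivalent of the query's `'localtime'` modifier.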
@hugovk

hugovk commented Mar 30, 2014

Nice tool. Just a note.

I use HTTPS Everywhere, which means my history has both an HTTP and then an HTTPS entry for clicks from Google searches, but only HTTPS for internal Wikipedia clicks.

So for me it makes sense to match only HTTPS (re.match('https...')), rather than both (http[s]) or just HTTP (http).
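The difference comes down to the scheme in the pattern. This sketch runs the three options on a small, hypothetical history: one article reached from a Google search, so it was recorded as HTTP and then HTTPS, and one internal Wikipedia click. The sample URLs and the helper `matching_urls` are illustrative assumptions, not taken from a real history file.

```python
import re

# Hypothetical history rows: the SQLite article appears under both schemes,
# the Google Chrome article only under HTTPS.
URLS = [
    "http://en.wikipedia.org/wiki/SQLite",
    "https://en.wikipedia.org/wiki/SQLite",
    "https://en.wikipedia.org/wiki/Google_Chrome",
]

PATTERNS = {
    "http only": r"http://(.+)\.wikipedia\.org/wiki",
    "https only": r"https://(.+)\.wikipedia\.org/wiki",
    "either scheme": r"https?://(.+)\.wikipedia\.org/wiki",
}


def matching_urls(pattern: str, urls: list[str]) -> list[str]:
    """Return the URLs accepted by re.match, which anchors at the start of the string."""
    return [u for u in urls if re.match(pattern, u)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, len(matching_urls(pattern, URLS)))
```

With this sample, "https only" lists each article once, but "either scheme" lists the search-click article twice. That duplication is the argument for matching only HTTPS, or for merging the two entries.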

@edsu

edsu (author) commented Apr 1, 2014

+1, thanks @hugovk; for the purposes of this script it might make sense to normalize them.
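One way to normalize them is to rewrite every URL to the `https` scheme, then merge rows that collapse to the same key. This is a minimal sketch, assuming rows shaped like the script's `(url, title, visit_count, last_visit)` tuples. It also assumes `last_visit` values are `YYYY-MM-DD HH:MM:SS` strings, as SQLite's `datetime()` returns them, so they compare correctly as text. `normalize_url` and `merge_visits` are hypothetical helpers, not part of the gist.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Force the https scheme, leaving host, path, query and fragment untouched."""
    parts = urlsplit(url)
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


def merge_visits(rows):
    """Collapse (url, title, visit_count, last_visit) rows that normalize to the same URL.

    Visit counts are summed, the first title seen is kept, and the latest
    last_visit string wins.
    """
    merged = {}
    for url, title, visit_count, last_visit in rows:
        key = normalize_url(url)
        if key in merged:
            old_title, old_count, old_last = merged[key]
            merged[key] = (old_title, old_count + visit_count, max(old_last, last_visit))
        else:
            merged[key] = (title, visit_count, last_visit)
    # Re-sort by combined count, mirroring the query's ORDER BY visit_count DESC.
    return sorted(
        ((url, t, c, lv) for url, (t, c, lv) in merged.items()),
        key=lambda row: row[2],
        reverse=True,
    )


if __name__ == "__main__":
    sample = [
        ("http://en.wikipedia.org/wiki/SQLite", "SQLite - Wikipedia", 2, "2014-03-30 10:00:00"),
        ("https://en.wikipedia.org/wiki/SQLite", "SQLite - Wikipedia", 3, "2014-03-31 09:00:00"),
        ("https://en.wikipedia.org/wiki/Google_Chrome", "Google Chrome - Wikipedia", 4, "2014-03-29 08:00:00"),
    ]
    for url, title, count, last in merge_visits(sample):
        print("* [%s](%s) - %s - %s" % (title, url, last, count))
```

In the script, you would feed it the filtered rows from `db.execute(q)` instead of printing inside the loop.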
