This guide will get you up and running with a Boomerang dataset, and subsequently run through a couple of small queries using RethinkDB.
Create a working directory:
mkdir boomerang-rethink
cd boomerang-rethink
Install Sesame:
sudo pip install --pre sesame
Request the Sesame key from Mafro. You will be supplied with a link to a Google doc, you can then copy the content of that file into a text file on your computer, or select File>Download As>Text.
Assuming the file you acquire from Google docs is called sesame-rethinkdb.key.txt
, run these instructions to grab & decrypt the DB dump:
curl -O https://s3-ap-southeast-2.amazonaws.com/boomerang-rethinkdb-backups/rethinkdb_goodlifehealthclubs_com_au_dump_2014-05-14T02_03_02.encrypted
sesame decrypt -k sesame-rethinkdb.key.txt rethinkdb_goodlifehealthclubs_com_au_2014-05-14T02:03:02.encrypted
Now import the GLHC Boomerang dataset into RethinkDB:
rethinkdb restore rethinkdb_goodlifehealthclubs_com_au_dump_2014-05-14T02:03:02.tar.gz
NB: this part will change in the near future
Create a new script initdb.py
, with the following content:
#! /usr/bin/env python
client_ref = "goodlifehealthclubs_com_au"
import rethinkdb as r
import datetime
try:
# create a backup DB
conn = r.connect('localhost', 28015)
r.db_create(client_ref).run(conn)
r.db(client_ref).table_create('backups').run(conn)
r.db(client_ref).table_create('products').run(conn)
conn.close()
print "Created DB {}".format(client_ref)
except:
pass
# connect to the new DB
conn = r.connect('localhost', 28015, db=client_ref)
# index products: used to easily lookup carts by the product keys they contain
if 'products' not in r.table('backups').index_list().run(conn):
start = datetime.datetime.now()
r.table('backups').index_create('products', multi=True).run(conn)
r.table('backups').index_wait('products').run(conn)
print "index 'products' created in {}".format(datetime.datetime.now()-start)
# index email address: like the remarkets bucket
if 'email_address' not in r.table('backups').index_list().run(conn):
start = datetime.datetime.now()
r.table('backups').index_create('email_address', lambda x: x['user']['EmailAddress']).run(conn)
r.table('backups').index_wait('email_address').run(conn)
print "index 'email_address' created in {}".format(datetime.datetime.now()-start)
# index user agent: helpful for debugging bots etc
if 'user_agent' not in r.table('backups').index_list().run(conn):
start = datetime.datetime.now()
r.table('backups').index_create('user_agent', lambda x: x['cart']['user_agent'], multi=True).run(conn)
r.table('backups').index_wait('user_agent').run(conn)
print "index 'user_agent' created in {}".format(datetime.datetime.now()-start)
print "All indexes present"
conn.close()
Mark the script as executable and run it:
chmod u+x initdb.py
./initdb.py
Create a file containing the following and the run it with python query.py
. Work through each step in the script and understand what's happening:
#! /usr/bin/env python
import json
import rethinkdb as r
r.connect('localhost', 28015, db='goodlifehealthclubs_com_au').repl()
# print out a easy-to-read representation of a single object
def print_obj(obj):
if 'last_updated' in obj and isinstance(obj['last_updated'], basestring) is False:
obj['last_updated'] = obj['last_updated'].isoformat()
print json.dumps(obj, indent=2)
print 'Total number of records'
print r.table('backups').count().run()
print '--'
# load first object which was remarketed, and print it
obj = list(r.table('backups').filter(r.row['cart']['remarketing']).limit(1).run())[0]
print_obj(obj)
print '--'
# how many carts have been remarketed
print 'Total carts remarketed'
print r.table('backups').filter(r.row['cart']['remarketing']).count().run()
print '--'
# how many remarketed carts had their survey responded to
print 'Total survey responses'
surveys = list(
r.table('backups').filter(r.row['cart']['remarketing'][0]['survey_response']).run()
)
print len(surveys)
print '--'
for s in surveys:
print s['cart']['last_updated']
- General DB and infrastructure questions about RethinkDB: http://rethinkdb.com/faq/
- Working examples of Python code to query RethinkDB: http://rethinkdb.com/docs/cookbook/python/
- Translate your SQL skillz into ReQL: http://rethinkdb.com/docs/sql-to-reql/