Last active
July 6, 2017 20:11
{
  "nbformat": 4,
  "cells": [
    {
      "source": "# ClickStream file aggregation\n\nOur challenge: referencing hundreds of individual files from our IBM Data Science Experience Jupyter notebook seemed wrong. We want to pull in all of the data by referencing a single file, without having to re-upload the whole series of files.\n\nWe assumed we could write a program that monitors our Object Storage container and, whenever a new file arrives, appends it to a \"master\" file. This turned out to be impossible: you can't append data to an Object Storage file, you can only create or delete files. The good news is that you can create a special \"manifest\" file that reads as if all the files had been appended together. The process is best described in this article: http://blog.ibmjstart.net/2016/04/14/e-pluribus-unum-creating-openstack-manifest-objects-in-ibm-bluemix-object-storage/\n\nThis notebook programmatically creates the \"magic\" aggregated file for each group of CSV files in Object Storage that share a specific file name prefix.",
      "cell_type": "markdown",
      "metadata": {}
    },
    {
      "execution_count": 1,
      "source": "import requests\nfrom requests import Request, Session",
      "cell_type": "code",
      "metadata": {
        "collapsed": true
      },
      "outputs": []
    },
    {
      "execution_count": 2,
      "source": "\nobjectstore_advobeta_creds = {\n \"auth\": {\n \"identity\": {\n \"methods\": [\n \"password\"\n ],\n \"password\": {\n \"user\": {\n \"id\": \"xxxx\",\n \"password\": \"xxxx\"\n }\n }\n },\n \"scope\": {\n \"project\": {\n \"id\": \"xxxx\"\n }\n }\n }\n}\n\nobjectstore_advobetaalt_creds = {\n \"auth\": {\n \"identity\": {\n \"methods\": [\n \"password\"\n ],\n \"password\": {\n \"user\": {\n \"id\": \"xxxx\",\n \"password\": \"xxxx\"\n }\n }\n },\n \"scope\": {\n \"project\": {\n \"id\": \"xxxx\"\n }\n }\n }\n}\n",
      "cell_type": "code",
      "metadata": {
        "collapsed": true
      },
      "outputs": []
    },
    {
      "execution_count": 3,
      "source": "auth_url = 'https://identity.open.softlayer.com/v3/auth/tokens'\nr = requests.post(auth_url, json=objectstore_advobeta_creds)",
      "cell_type": "code",
      "metadata": {
        "collapsed": true
      },
      "outputs": []
    },
    {
      "execution_count": 4,
      "source": "clickstreamtypes = ['addtocart','browsing','checkout','clicks','login','logoutwithpurchase','logoutwithoutpurchase']\nsess = Session()\n\nif r.status_code == 201:\n    # The Keystone token comes back in a response header, not in the JSON body\n    auth_token = r.headers.get('X-Subject-Token')\n    rj = r.json()\n    for endpoints in rj['token']['catalog']:\n        if endpoints['name'] == 'swift':\n            for endpoint in endpoints['endpoints']:\n                if endpoint['region'] == 'dallas' and endpoint['interface'] == 'public':\n                    for clickstreamtype in clickstreamtypes:\n                        # A zero-byte PUT with X-Object-Manifest set to <container>/<prefix> creates the manifest object\n                        req = Request('PUT', endpoint['url']+'/AdvoBeta/all-'+clickstreamtype+'.csv')\n                        prepped = sess.prepare_request(req)\n                        prepped.headers['X-Auth-Token'] = auth_token\n                        prepped.headers['X-Object-Manifest'] = 'AdvoBeta/'+clickstreamtype\n                        prepped.headers['Content-Length'] = '0'\n                        prepped.headers['Cache-Control'] = 'no-cache'\n                        prepped.headers['Host'] = 'dal.objectstorage.open.softlayer.com'\n                        resp = sess.send(prepped)\nelse:\n    # status_code is an int, so convert it before concatenating\n    print('Bad response: ' + str(r.status_code))",
      "cell_type": "code",
      "metadata": {
        "collapsed": true
      },
      "outputs": []
    },
    {
      "execution_count": null,
      "source": "",
      "cell_type": "code",
      "metadata": {
        "collapsed": true
      },
      "outputs": []
    }
  ],
  "metadata": {
    "language_info": {
      "pygments_lexer": "ipython2",
      "version": "2.7.11",
      "mimetype": "text/x-python",
      "name": "python",
      "file_extension": ".py",
      "codemirror_mode": {
        "version": 2,
        "name": "ipython"
      },
      "nbconvert_exporter": "python"
    },
    "kernelspec": {
      "language": "python",
      "display_name": "Python 2 with Spark 2.0",
      "name": "python2-spark20"
    }
  },
  "nbformat_minor": 1
}