@isc-rsingh
Last active July 6, 2017 20:11
{
"nbformat": 4,
"cells": [
{
"source": "# ClickStream file aggregation\n\nOur challenge: referencing hundreds of individual files within our IBM Data Science Experience Jupyter Notebook seemed wrong. We want to pull in the entirety of the data by referencing a single file, without having to re-upload the entire series of files.\n\nWe assumed we could write a program to monitor our Object Storage container and, when a new file came in, append it to a \"master\" file. This turned out to be impossible, however: you can't append data to an Object Storage file. You can only create or delete files. The good news is that you can create a special \"manifest\" file that reads like the result of appending all the files. The manifest is a zero-byte object whose `X-Object-Manifest` header names a container and an object-name prefix; reading it returns the matching objects concatenated. The process for doing this is best described in this article: http://blog.ibmjstart.net/2016/04/14/e-pluribus-unum-creating-openstack-manifest-objects-in-ibm-bluemix-object-storage/\n\nThis notebook programmatically creates the \"magic\" aggregated file for each group of CSV files in Object Storage that share a specific file name prefix.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 1,
"source": "import requests\nfrom requests import Request, Session",
"cell_type": "code",
"metadata": {
"collapsed": true
},
"outputs": []
},
{
"execution_count": 2,
"source": "\nobjectstore_advobeta_creds = {\n  \"auth\": {\n    \"identity\": {\n      \"methods\": [\n        \"password\"\n      ],\n      \"password\": {\n        \"user\": {\n          \"id\": \"xxxx\",\n          \"password\": \"xxxx\"\n        }\n      }\n    },\n    \"scope\": {\n      \"project\": {\n        \"id\": \"xxxx\"\n      }\n    }\n  }\n}\n\nobjectstore_advobetaalt_creds = {\n  \"auth\": {\n    \"identity\": {\n      \"methods\": [\n        \"password\"\n      ],\n      \"password\": {\n        \"user\": {\n          \"id\": \"xxxx\",\n          \"password\": \"xxxx\"\n        }\n      }\n    },\n    \"scope\": {\n      \"project\": {\n        \"id\": \"xxxx\"\n      }\n    }\n  }\n}\n",
"cell_type": "code",
"metadata": {
"collapsed": true
},
"outputs": []
},
{
"execution_count": 3,
"source": "auth_url = 'https://identity.open.softlayer.com/v3/auth/tokens'\nr = requests.post(auth_url, json=objectstore_advobeta_creds)",
"cell_type": "code",
"metadata": {
"collapsed": true
},
"outputs": []
},
{
"execution_count": 4,
"source": "clickstreamtypes = ['addtocart','browsing','checkout','clicks','login','logoutwithpurchase','logoutwithoutpurchase']\nsess = Session()\n\nif r.status_code == 201:\n    # The identity service returns the token in a response header\n    auth_token = r.headers.get('X-Subject-Token')\n    rj = r.json()\n    # Find the public Swift endpoint for the Dallas region in the service catalog\n    for endpoints in rj['token']['catalog']:\n        if endpoints['name'] == 'swift':\n            for endpoint in endpoints['endpoints']:\n                if endpoint['region'] == 'dallas' and endpoint['interface'] == 'public':\n                    for clickstreamtype in clickstreamtypes:\n                        # Zero-byte PUT: the X-Object-Manifest header turns all-<type>.csv into a\n                        # manifest over every object in AdvoBeta whose name starts with <type>\n                        req = Request('PUT', endpoint['url']+'/AdvoBeta/all-'+clickstreamtype+'.csv')\n                        prepped = sess.prepare_request(req)\n                        prepped.headers['X-Auth-Token'] = auth_token\n                        prepped.headers['X-Object-Manifest'] = 'AdvoBeta/'+clickstreamtype\n                        prepped.headers['Content-Length'] = '0'\n                        prepped.headers['Cache-Control'] = 'no-cache'\n                        prepped.headers['Host'] = 'dal.objectstorage.open.softlayer.com'\n                        resp = sess.send(prepped)\nelse:\n    # status_code is an int, so convert it before concatenating\n    print(\"Bad response: \" + str(r.status_code))",
"cell_type": "code",
"metadata": {
"collapsed": true
},
"outputs": []
},
{
"execution_count": null,
"source": "",
"cell_type": "code",
"metadata": {
"collapsed": true
},
"outputs": []
}
],
"metadata": {
"language_info": {
"pygments_lexer": "ipython2",
"version": "2.7.11",
"mimetype": "text/x-python",
"name": "python",
"file_extension": ".py",
"codemirror_mode": {
"version": 2,
"name": "ipython"
},
"nbconvert_exporter": "python"
},
"kernelspec": {
"language": "python",
"display_name": "Python 2 with Spark 2.0",
"name": "python2-spark20"
}
},
"nbformat_minor": 1
}