{"nbformat_minor": 0, "cells": [{"execution_count": 1, "cell_type": "code", "source": "import ujson as json\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport numpy as np\nimport plotly.plotly as py\nimport networkx as nx\nimport collections\n\n\nfrom moztelemetry import get_pings, get_pings_properties, get_one_ping_per_client\n\n%pylab inline", "outputs": [{"output_type": "stream", "name": "stdout", "text": "Populating the interactive namespace from numpy and matplotlib\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 3, "cell_type": "code", "source": "pings = get_pings(sc, app=\"Firefox\",\n channel=\"nightly\",\n submission_date=(\"20150507\",\"20150514\"),\n fraction=1,\n schema=\"v4\")", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 38, "cell_type": "code", "source": "def extract_sub(p):\n return p.get('payload', {}).get('info', {}).get('subsessionId', 'NO |
{"nbformat_minor": 0, "cells": [{"source": "# Session Signature matching", "cell_type": "markdown", "metadata": {}}, {"execution_count": 81, "cell_type": "code", "source": "import ujson as json\nfrom operator import add\n# %pylab inline", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 82, "cell_type": "code", "source": "outBucketName = \"net-mozaws-prod-us-west-2-pipeline-analysis\"\npathToOutput = \"/bcolloran/mergedDataPerClient/nightly/2015-06-15/10009clients/\"", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 83, "cell_type": "code", "source": "# for a tiny sample, you can load one part: \"part-00000\"\n# or you can do more--\n# ten parts: part-0000*\n# or 10% of parts: part-*0\n# or all parts: part-*\npath_to_all = \"s3n://\"+outBucketName+pathToOutput+\"part-*\"\nf = sc.sequenceFile(path_to_all)\nload_all = f.mapValues(json.loads)", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 84, "ce |
from collections import defaultdict

def get_overlap(pair, v2_extractor=None, v4_extractor=None):
    v2_blobs = pair['v2'].get('data', {}).get('days', {})  # {'YYYY-MM-DD': dict}
    v4_blobs = pair['v4']  # [{'creationDate': 'YYYY-MM-DD:...', 'k': val, ...}, ...]
    # One blob per date in v2, multiple per date in v4
    results = {'v2': {}, 'v4': defaultdict(list)}
    if not (v2_blobs and v4_blobs):
        return results
    v2_dates = v2_blobs.keys()
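
The preview of `get_overlap` is cut off at this point, so the rest of its body is not shown. As a rough illustration of the data shapes its comments describe (one blob per date in v2, several per date in v4), the sketch below groups v4 blobs by the date prefix of `creationDate` and intersects those dates with the v2 day keys. The helper names `group_v4_by_date` and `overlapping_dates` are hypothetical, and taking the first ten characters of `creationDate` as the date is an assumption based only on the `'YYYY-MM-DD:...'` comment. It is not the original function's implementation.

```python
from collections import defaultdict


def group_v4_by_date(v4_blobs):
    """Group v4 blobs by the 'YYYY-MM-DD' prefix of their creationDate.

    Assumption: creationDate starts with a 10-character ISO date.
    Blobs without a creationDate are skipped.
    """
    by_date = defaultdict(list)
    for blob in v4_blobs:
        creation = blob.get('creationDate')
        if not creation:
            continue
        by_date[creation[:10]].append(blob)
    return by_date


def overlapping_dates(v2_days, v4_blobs):
    """Return the sorted dates present in both the v2 days dict and the v4 blobs."""
    return sorted(set(v2_days) & set(group_v4_by_date(v4_blobs)))
```

Grouping first keeps every v4 blob for a given date available, which matches the `defaultdict(list)` used for the `'v4'` results in the snippet above.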