@saptarshiguha
saptarshiguha / google_login.py
Created September 15, 2017 10:58 — forked from tomchuk/google_login.py
Google OAuth2 Authentication with Flask & requests-oauthlib
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
A simple flask app to authenticate with Google's OAuth 2.0 API
Requirements:
Flask>=0.10.0
requests-oauthlib>=0.5.0
To install, run: "pip install Flask>=0.10.0 requests-oauthlib>=0.5.0"
"""
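As a rough sketch of the flow this gist's description refers to (the endpoints, client credentials, redirect URI, and scopes below are placeholders and assumptions, not taken from the original), the core requests-oauthlib steps look like:

# Hedged sketch only; client id/secret, redirect URI and scopes are assumptions.
from requests_oauthlib import OAuth2Session

AUTH_URI = "https://accounts.google.com/o/oauth2/auth"
TOKEN_URI = "https://accounts.google.com/o/oauth2/token"

google = OAuth2Session(
    "your-client-id.apps.googleusercontent.com",    # placeholder
    redirect_uri="http://localhost:5000/callback",  # placeholder
    scope=["openid", "email", "profile"],
)
authorization_url, state = google.authorization_url(AUTH_URI)
# Redirect the user to authorization_url; then, in the Flask callback view:
# token = google.fetch_token(TOKEN_URI, client_secret="...",
#                            authorization_response=flask.request.url)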

Sampling Crash Volumes, Rates and Rarity for Socorro Samples

Introduction

The Socorro crash-report pipeline does not process every crash report. Although every report is stored on disk, only 10% are processed and saved in HBase as JSON objects. Each crash report has a crash signature (Crash Report Signature, or CRS for short); the relationship between crash reports and CRSs is many-to-one.
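Because only a 10% sample is processed, counts observed in the sample must be scaled up, and rare signatures carry substantial sampling error. A minimal sketch of the scale-up (the 0.10 fraction is from the text above; the example count is made up):

import math

SAMPLING_FRACTION = 0.10  # only 10% of crash reports are processed

def estimate_volume(sample_count, fraction=SAMPLING_FRACTION):
    # Scale a CRS count seen in the sample up to an estimated total,
    # with a rough Poisson standard error scaled by the same factor.
    estimate = sample_count / fraction
    se = math.sqrt(sample_count) / fraction
    return estimate, se

est, se = estimate_volume(12)  # a signature seen 12 times in the sample
print("estimated total: %.0f +/- %.0f" % (est, se))  # 120 +/- 35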

with base as (
  select
    client_id,
    submission_date_s3,
    profile_creation_date,
    experiments,
    subsession_length,
    active_ticks,
    search_counts,
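A hedged sketch of how a CTE like the one above might be run from PySpark (the FROM clause and the date filter are assumptions about how the truncated query continues; ms and sqlContext are set up as in the snippets below):

ms.registerTempTable("main_summary")
base = sqlContext.sql("""
with base as (
  select
    client_id,
    submission_date_s3,
    profile_creation_date,
    experiments,
    subsession_length,
    active_ticks,
    search_counts
  from main_summary
  where submission_date_s3 >= '20170901'  -- assumed date filter
)
select * from base
""")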
################################################################################
## PySpark Invocation
## submit code using /usr/lib/spark/bin/spark-submit review.py
################################################################################
import json
import random
import subprocess
import time

import pandas as pd
import pyspark
import py4j
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = pyspark.SparkContext()
sqlContext = SQLContext(sc)

useALL = False
# Load the main_summary dataset, merging schemas across partitions
ms = sqlContext.read.load("s3://telemetry-parquet/main_summary/v4", "parquet", mergeSchema=True)
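A quick sanity check on what was loaded (the date used in the filter is a hypothetical example):

ms.printSchema()  # inspect the merged schema
one_day = ms.filter(ms.submission_date_s3 == "20170915")
print(one_day.count())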
import sys
import datetime
import random
import subprocess

def unix_time_sec(dt):
    # Seconds since the Unix epoch for a datetime.date
    epoch = datetime.datetime.strptime("1970-01-01", "%Y-%m-%d").date()
    return int((dt - epoch).total_seconds())
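# For example (arbitrary date; 2017-09-15 is 17424 days after the epoch):
# unix_time_sec(datetime.date(2017, 9, 15)) == 17424 * 86400 == 1505433600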
ms = sqlContext.read.load("s3://telemetry-parquet/main_summary/v4", "parquet")
## We need DAU for as far back as we can go
## need pyspark!!
import sys
import datetime
import json
import random
import subprocess
import time
import pandas as pd
################################################################################
## PySpark Invocation
## submit code using /usr/lib/spark/bin/spark-submit review.py
################################################################################
import os
import sys
import datetime
import random
import subprocess

# Confirm which Python the driver and workers will use
print([os.environ.get('PYSPARK_PYTHON', 'missing'),
       os.environ.get('PYSPARK_DRIVER_PYTHON', 'missing')])

import pyspark
import py4j
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = pyspark.SparkContext()
sqlContext = SQLContext(sc)

import mozillametricstools.common.functions as mozfun
# "active_addons"
mozfun.register_udf(sqlContext
, lambda arr: sum(arr) if arr else 0, "array_sum"
, pyspark.sql.types.IntegerType())
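Assuming register_udf exposes the lambda to Spark SQL under the given name (mozillametricstools is not shown here, so its exact behavior is an assumption), usage would look something like:

# Hypothetical usage: `ticks` stands in for any integer-array column on a
# registered temp table; the column name is made up for illustration.
df = sqlContext.sql("select client_id, array_sum(ticks) as total from main_summary")
df.show(5)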