| -- make session dataset smaller to be able to try things fast | |
| --create table session_tryouts as select * from classifier_data_sorted a where a.sessionId in (select distinct s.sessionId from classifier_data_sorted s limit 100); | |
| drop table if exists classifier_data_label; | |
| create table | |
| classifier_data_label | |
| as | |
| select | |
| sessionId, | |
| (unix_timestamp(max(ts)) - unix_timestamp(min(ts))) as length, |
| spark-submit --class org.wikimedia.analytics.refinery.job.AppSessionMetrics --master yarn --num-executors=6 --executor-cores=2 --executor-memory=2g /mnt/hdfs/tmp/nuria/jars/refinery-job-0.0.10-SNAPSHOT.jar hdfs://analytics-hadoop/tmp/mobile-apps-sessions 2015 03 10 |
| <html> | |
| <head> | |
| </head> | |
| <body> | |
| <script> | |
| function generateRandomWith5_16Bits() { | |
| var rnds, i, |
| import com.github.nscala_time.time.Imports.{LocalDate, Period} | |
| import com.twitter.algebird.{QTree, QTreeSemigroup} | |
| import org.apache.hadoop.fs.{FileSystem, Path} | |
| import org.apache.spark.rdd.RDD | |
| import org.apache.spark.sql.{DataFrame, SQLContext} | |
| import org.apache.spark.{SparkConf, SparkContext} | |
| import scopt.OptionParser | |
| import scala.collection.immutable.HashMap | |
| /** |
| use wmf; | |
| with hits as ( | |
| SELECT | |
| geocoded_data['country_code'] as country, | |
| geocoded_data['country'] country_name, | |
| SUM(CASE WHEN hostname NOT LIKE 'cp3%' AND hostname NOT LIKE 'amssq%' THEN 1 ELSE 0 END) AS hits_from_this_country_not_through_amsterdam, | |
| SUM(CASE WHEN hostname LIKE 'cp3%' OR hostname LIKE 'amssq%' THEN 1 ELSE 0 END) AS hits_from_this_country_from_amsterdam | |
| FROM wmf.webrequest | |
| WHERE TRUE |
| { | |
| "type" : "record", | |
| "name" : "AutoGeneratedSchema", | |
| "doc" : "Sqoop import of QueryResult", | |
| "fields" : [ { | |
| "name" : "id", | |
| "type" : [ "null", "int" ], | |
| "default" : null, | |
| "columnName" : "id", | |
| "sqlType" : "4" |
| package org.wikimedia.analytics.refinery.tag; | |
| import com.google.common.collect.ImmutableSet; | |
| import com.google.common.reflect.ClassPath; | |
| import java.util.ArrayList; | |
| import java.util.List; | |
| import java.util.Set; | |
| import org.reflections.*; |
| #!/usr/local/bin/python | |
| # unique devices variation study using daily data | |
| # we account for weekly variations | |
| # and try to see when the number of uniques | |
| # varies too much to be a quality measurement | |
| # see: https://wikitech.wikimedia.org/w/index.php?title=Analytics/Data_Lake/Traffic/Unique_Devices/Last_access_solution | |
| from operator import itemgetter | |
| from datetime import datetime | |
| from datetime import date | |
| import numpy |
| SELECT | |
| month, | |
| day, | |
| SUM(CASE WHEN (user_agent LIKE '%iPhone%') THEN 1 ELSE 0 END) AS iphone, | |
| SUM(CASE WHEN (user_agent LIKE '%iOS%') THEN 1 ELSE 0 END) AS iOS | |
| FROM wmf.webrequest | |
| WHERE webrequest_source = 'text' | |
| AND year = 2016 | |
| AND month IN (9, 10) | |
| AND (user_agent LIKE '%iOS%' OR user_agent LIKE '%iPhone%') |
| /** | |
| Parses a JSON object. | |
| The object will be traversed, and each leaf node of the object will | |
| be keyed by a concatenated key made up of all parent keys. | |
| **/ | |
| function MetricLogster(reporter) { | |
| } |
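
The doc comment on `MetricLogster` describes traversing an object and keying each leaf by the concatenation of its parent keys, but the preview shows no body. The following is a minimal sketch of that traversal only, not the original implementation. The name `flattenLeaves`, the `.` separator, and the sample object are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of the original snippet): walks an object
// and returns a flat map whose keys are the parent keys joined by `separator`.
function flattenLeaves(obj, separator = ".", prefix = "", out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    // Build the concatenated key from all parent keys seen so far.
    const path = prefix ? prefix + separator + key : key;
    if (value !== null && typeof value === "object") {
      // Non-leaf node (object or array): recurse, carrying the path along.
      flattenLeaves(value, separator, path, out);
    } else {
      // Leaf node: record it under its full concatenated key.
      out[path] = value;
    }
  }
  return out;
}

// Usage with an assumed sample object; the resulting keys are
// "a.b", "a.c.d" and "e".
const sample = { a: { b: 1, c: { d: 2 } }, e: 3 };
console.log(flattenLeaves(sample));
```

Array elements are keyed by their index in this sketch, and empty objects contribute no keys. Whether the original code treats those cases the same way is not visible in the preview.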