@nuria
nuria / spark-shell shorthand
Created January 21, 2020 20:31
spark-shell
spark2-shell --master yarn --executor-memory 8G --executor-cores 4 --driver-memory 16G --conf spark.dynamicAllocation.maxExecutors=64 --conf spark.executor.memoryOverhead=2048 --jars /srv/deployment/analytics/refinery/artifacts/refinery-job.jar,/srv/deployment/analytics/refinery/artifacts/refinery-hive.jar
Three levels (job -> attempt -> application) to reach Cassandra.
The newly built Cassandra jar - with exclusions - is in /tmp/oozie-nuria
hdfs dfs -rmr /tmp/oozie-nuria ; hdfs dfs -mkdir /tmp/oozie-nuria; hdfs dfs -put oozie/* /tmp/oozie-nuria;
Start job:
// spark2-shell --jars /srv/deployment/analytics/refinery/artifacts/refinery-job.jar
/**
* Use RefineTarget.find to find all Refine targets for an input (camus job) in the last N hours.
* Then filter for any for which the _REFINED_FAILED flag exists.
*/
import org.apache.hadoop.fs.Path
import org.joda.time.format.DateTimeFormatter
import com.github.nscala_time.time.Imports._
@nuria
nuria / event-streams.js
Last active November 26, 2019 19:27
Event Streams consumption
// This is the EventStreams RecentChange stream endpoint
var url = 'https://stream.wikimedia.org/v2/stream/recentchange';
// Use EventSource (available in most browsers, or as an
// npm module: https://www.npmjs.com/package/eventsource)
// to subscribe to the stream.
var recentChangeStream = new EventSource(url);
// Print each event to the console
recentChangeStream.onmessage = function(message) {
    // message.data is a JSON string describing one recent change
    var change = JSON.parse(message.data);
    console.log(change);
};
@nuria
nuria / test_segment_metadata.sh
Last active October 2, 2019 21:59
test_segment_metadata.sh
curl -X POST 'http://localhost:8082/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d '{
"queryType":"segmentMetadata",
"dataSource":"wmf_netflow",
"intervals":["2019-09-01/2019-10-01"]
}'
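The same segmentMetadata query can be built and posted from Python instead of curl; a minimal sketch (the broker URL and datasource are taken from the curl example above, and the actual POST is left commented out since it needs a reachable Druid broker):

```python
import json

# Same segmentMetadata query as the curl example, expressed as a dict.
query = {
    "queryType": "segmentMetadata",
    "dataSource": "wmf_netflow",
    "intervals": ["2019-09-01/2019-10-01"],
}

payload = json.dumps(query)
print(payload)

# To actually run it against a broker (commented out; needs network access):
# import requests
# resp = requests.post('http://localhost:8082/druid/v2/?pretty',
#                      headers={'Content-Type': 'application/json'},
#                      data=payload)
# print(resp.json())
```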
# From stat1004:
# pyspark2 --jars ~otto/spark-sql-kafka-0-10_2.11-2.3.1.jar,~otto/kafka-clients-1.1.0.jar
# Need spark-sql-kafka for DataStream source and kafka-clients for Kafka serdes.
from pyspark.sql.functions import *
from pyspark.sql.types import *
# Declare a Spark schema that matches the JSON data.
# In a future MEP world this would be automatically loaded
# from a JSONSchema.
@nuria
nuria / select-events-per-day.hql
Created July 22, 2019 20:17
select events per day
select CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS date,
count(1) as n_events
from event.externalguidance
where year=2019 and month=6
and not useragent.is_bot
and event.action = 'init'
group by year, month, day
order by date
limit 1000000
@nuria
nuria / calculate_entropy.py
Created February 12, 2019 04:58
Calculate entropy
#!/usr/bin/python
import sys
import math
f = sys.argv[1]
_file = open(f)
data = {}
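The script is truncated here. As a sketch of where it is headed, assuming `data` ends up mapping each value to its count, Shannon entropy over that frequency table could be computed like this (the `shannon_entropy` helper is hypothetical, not from the original script):

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as value -> count."""
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p, 2)
    return entropy

# A fair coin carries exactly 1 bit of entropy:
print(shannon_entropy({"heads": 50, "tails": 50}))  # → 1.0
```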
@nuria
nuria / .vimrc
Last active September 17, 2019 21:31
.vimrc
set number
syntax enable
set cursorline
set showcmd
" show invisible chars
set listchars=tab:→\ ,space:·,nbsp:␣,trail:•,eol:¶,precedes:«,extends:»
set list