// spark2-shell --jars /srv/deployment/analytics/refinery/artifacts/refinery-job.jar
/**
 * Use RefineTarget.find to find all Refine targets for an input (camus job) in the last N hours.
 * Then filter for any for which the _REFINED_FAILED flag exists.
 */
import org.apache.hadoop.fs.Path
import org.joda.time.format.DateTimeFormatter
import com.github.nscala_time.time.Imports._
#!/usr/bin/env bash
export SPARK_HOME="${SPARK_HOME:-/usr/lib/spark2}"
export SPARK_CONF_DIR="${SPARK_CONF_DIR:-"${SPARK_HOME}"/conf}"
source ${SPARK_HOME}/bin/load-spark-env.sh
export HIVE_CONF_DIR=${SPARK_CONF_DIR}
export HADOOP_CONF_DIR=/etc/hadoop/conf
AMMONITE=~/bin/amm # This is amm binary release 2.11-1.6.7
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
// We need this to convert the out of order new schema to the new hive table schema.
// This also is used to drop columns that aren't in the new hive table schema.
import org.wikimedia.analytics.refinery.spark.sql.HiveExtensions._
// Get the new desired field schemas
val mediawiki_revision_score_2 = spark.table("event.mediawiki_revision_score")
== What onboarding projects are options? What about mixing some research questions with the task of processing data? (for example, find patterns of those who open an account on Wikipedia) @Joseph
* wikidump text analysis?
** category analysis?
Take Tiziano's code and use hadoop instead of wikidump text.
(1st, 2nd) ** historical redirect analysis, add to mediawiki_history (very useful for Analytics)
Please see: https://phabricator.wikimedia.org/T232123
# From stat1004:
# pyspark2 --jars ~otto/spark-sql-kafka-0-10_2.11-2.3.1.jar,~otto/kafka-clients-1.1.0.jar
# Need spark-sql-kafka for DataStream source and kafka-clients for Kafka serdes.
from pyspark.sql.functions import *
from pyspark.sql.types import *
# Declare a Spark schema that matches the JSONData.
# In a future MEP world this would be automatically loaded
# from a JSONSchema.
function generateSchemaTests(title, majorVersion, schemaInfos) {
    it(`All ${title} schemas should have title ${title} `, function() {
        schemaInfos.forEach((info) => {
            assert.equal(info.schema.title, title);
        });
    });
    it(`All ${title} major version ${majorVersion} schemas should be ${majorVersion}.x.y`, function() {
        schemaInfos.forEach((info) => {
            assert.equal(semver.coerce(_.get(info.schema, '$id')).major, majorVersion);
        });
title: mediawiki/revision/score
description: Represents a MW Revision Score event (from ORES).
$id: /mediawiki/revision/score/1.0.0
$schema: https://json-schema.org/draft-07/schema#
type: object
allOf:
  ### revision-score does not include all revision/common fields, so we
  ### don't include revision/common schema, and instead specifically list
  ### the ones we need.
  - $ref: /mediawiki/common/1.0.0
# Stop your Jupyter Notebook server from the JupyterHub UI.
# Move your old venv out of the way (or just delete it)
mv $HOME/venv $HOME/venv-old-$().$(date +%s)
# create a new empty venv
python3 -m venv --system-site-packages $HOME/venv
# Reinstall the jupyter venv
cd /srv/jupyterhub/deploy
$HOME/venv/bin/pip install --upgrade --no-index --force-reinstall --find-links=/srv/jupyterhub/deploy/artifacts/stretch/wheels --requirement=/srv/jupyterhub/deploy/frozen-requirements.txt
You are given a very large list of unsorted integers. These
integers are supposed to be unique and, if sorted, contiguous. However, you
suspect that this is not the case, so you want to write code to check for
missing or duplicate integers. Write code to return these results:
- Are there any missing or duplicate integers?
- How many missing integers?
- How many duplicate integers?
- Which integers are missing?
- Which integers are duplicates, and how many duplicates of each?
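The checks listed above can be sketched in Python as follows. This is one possible approach, not a reference answer, and it rests on assumptions the problem statement does not make: the expected range runs from the smallest to the largest value actually seen, the per-value counts fit in memory, and a "duplicate" count means the number of extra copies beyond the first. The function name `check_integers` and the result keys are hypothetical.

```python
from collections import Counter


def check_integers(values):
    """Summarize missing and duplicate integers in an unsorted iterable.

    Assumption: the expected contiguous range is min(values)..max(values),
    since the problem statement gives no explicit bounds.
    """
    counts = Counter(values)  # one pass over the input, no sorting needed
    if not counts:
        return {
            "has_problems": False,
            "missing_count": 0,
            "duplicate_count": 0,
            "missing": [],
            "duplicates": {},
        }

    lo, hi = min(counts), max(counts)
    # Any value in the expected range that was never seen is missing.
    missing = [n for n in range(lo, hi + 1) if n not in counts]
    # Map each repeated integer to its number of extra copies.
    duplicates = {n: c - 1 for n, c in counts.items() if c > 1}

    return {
        "has_problems": bool(missing or duplicates),
        "missing_count": len(missing),
        "duplicate_count": len(duplicates),  # distinct integers that repeat
        "missing": missing,
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    print(check_integers([5, 3, 3, 7, 4, 3, 9, 9]))
```

For input that does not fit in memory, the same counting idea would have to move to an external sort or a distributed group-by; the sketch above does not attempt that.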
title: mediawiki/page/links-change
description: Represents a MW Page Links Change event.
$id: /mediawiki/page/links-change/1.1.0
$schema: 'https://json-schema.org/draft-07/schema#'
type: object
required:
  - $schema
  - meta
  - page_id
  - page_is_redirect