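// The LatencyStat object is a POJO with three fields: bucket, percentile and latency.
// The goal is to get a new List<LatencyStat> with one object per (bucket, percentile) combination,
// averaging the latency values within each group.
Map<String, Map<Double, List<LatencyStat>>> byBucketAndPercentile = nodes.stream()
    .flatMap(n -> n.statDetails.latencies.stream())
    .collect(Collectors.groupingBy(LatencyStat::getBucket,
        Collectors.groupingBy(LatencyStat::getPercentile)));
List<LatencyStat> allNodes = new ArrayList<>();
for (Map.Entry<String, Map<Double, List<LatencyStat>>> bucketEntry : byBucketAndPercentile.entrySet()) {
    for (Map.Entry<Double, List<LatencyStat>> percentileEntry : bucketEntry.getValue().entrySet()) {
        // Completion sketch: assumes a getLatency() getter and a
        // (bucket, percentile, latency) constructor on LatencyStat.
        double avgLatency = percentileEntry.getValue().stream()
            .mapToDouble(LatencyStat::getLatency)
            .average()
            .orElse(0.0);
        allNodes.add(new LatencyStat(bucketEntry.getKey(), percentileEntry.getKey(), avgLatency));
    }
}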
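The same per-(bucket, percentile) average can also be computed in a single stream pass by giving the inner `groupingBy` an `averagingDouble` downstream collector, which avoids materializing the intermediate lists. This is a minimal sketch: the `LatencyStat` record is a hypothetical stand-in for the gist's POJO, and the `TreeMap` factories are only there to make the output order deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LatencyAveraging {
    // Hypothetical stand-in for the gist's LatencyStat POJO.
    record LatencyStat(String bucket, double percentile, double latency) {}

    // One output LatencyStat per (bucket, percentile), holding the mean latency of that group.
    static List<LatencyStat> averageByBucketAndPercentile(List<LatencyStat> stats) {
        // TreeMap factories keep buckets and percentiles sorted so the output order is stable.
        TreeMap<String, TreeMap<Double, Double>> means = stats.stream()
            .collect(Collectors.groupingBy(LatencyStat::bucket, TreeMap::new,
                Collectors.groupingBy(LatencyStat::percentile, TreeMap::new,
                    Collectors.averagingDouble(LatencyStat::latency))));

        List<LatencyStat> result = new ArrayList<>();
        means.forEach((bucket, byPercentile) ->
            byPercentile.forEach((percentile, mean) ->
                result.add(new LatencyStat(bucket, percentile, mean))));
        return result;
    }

    public static void main(String[] args) {
        List<LatencyStat> input = List.of(
            new LatencyStat("search", 50.0, 10.0),
            new LatencyStat("search", 50.0, 20.0),
            new LatencyStat("search", 95.0, 40.0));
        // Two groups: (search, 50.0) with mean 15.0 and (search, 95.0) with mean 40.0.
        System.out.println(averageByBucketAndPercentile(input));
    }
}
```

Pushing the averaging into the collector means each group is reduced as it is built, so only one number per (bucket, percentile) is kept in memory.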
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ArrayBuffer
import scala.util.Random
def randomVec(r: Random, size: Int): Vector = {
  // `0 until size` yields exactly `size` features; `0 to size` would yield size + 1.
  val feats = for (_ <- 0 until size) yield r.nextDouble()
  Vectors.dense(feats.toArray)
}
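For comparison, a dependency-free Java version of `randomVec` shows the exclusive loop bound that yields exactly `size` features. This is only a sketch: `RandomFeatures` is a hypothetical name, and a plain `double[]` stands in for Spark's `Vector`.

```java
import java.util.Arrays;
import java.util.Random;

class RandomFeatures {
    // Plain-Java analog of the Scala randomVec helper (a double[] stands in for a Spark Vector).
    // The loop bound is exclusive, so exactly `size` uniform values in [0, 1) are produced.
    static double[] randomVec(Random r, int size) {
        double[] feats = new double[size];
        for (int i = 0; i < size; i++) {
            feats[i] = r.nextDouble();
        }
        return feats;
    }

    public static void main(String[] args) {
        // Seeding the Random makes the generated features reproducible.
        double[] a = randomVec(new Random(42L), 4);
        double[] b = randomVec(new Random(42L), 4);
        System.out.println(a.length + " " + Arrays.equals(a, b)); // prints "4 true"
    }
}
```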
import argparse
import logging
import os
import re
from tempfile import TemporaryFile
import boto3
import botocore
ebernhardson / Dockerfile
Last active August 1, 2018 06:41
LightGBM + HDFS Demo
FROM docker-registry.wikimedia.org/wikimedia-jessie
ENTRYPOINT ["/bin/bash"]
COPY cloudera.list /etc/apt/sources.list.d/cloudera.list
COPY cloudera.pref /etc/apt/preferences.d/cloudera.pref
COPY archive.key /root/archive.key
# Hadoop client tools read their configuration directory from HADOOP_CONF_DIR.
ENV HADOOP_CONF_DIR=/etc/hadoop/conf
ebernhardson / mlr.puml
Created January 15, 2019 19:53
MLR Pipeline Sequence Diagram
@startuml
== click log generation ==
oozie -> oozie: schedule label generation
note left
Arrows signify the initiator
of communication, not
data flow.
end note
ebernhardson / Tensorflow_on_SWAP.ipynb
Last active February 7, 2019 06:13
Tensorflow in SWAP
public Dataset<Row> buildPairsForM0Prep(Dataset<Row> df, Dataset<Row> dfOld, GlentParams params) {
    dfOld = dfOld
        .where(col("part").equalTo(params.glentDfM0PrepPartOld)) // limit to previous portion of M0Prep dataframe
        .drop(col("part"));
    Column oldTsCondition = null;
    if (dfOld.isEmpty()) {
        // No previous data: keep every row.
        oldTsCondition = lit(true);
    } else {
        // From Java, Dataset.collect() is typed as Object, so cast to Row[].
        Row[] oldTsRows = (Row[]) dfOld.agg(max("q1_ts").alias("tsmax")).collect();
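The timestamp logic in `buildPairsForM0Prep` appears to build a cutoff from the previous run: with no old rows every row passes (`lit(true)`), otherwise the maximum old `q1_ts` becomes a bound. The rest of the method is cut off, so the sketch below assumes the bound is applied as `ts > tsmax`. It is a Spark-free illustration over plain `long` timestamps, and `IncrementalCutoff` and `buildTsCondition` are hypothetical names.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.function.LongPredicate;

class IncrementalCutoff {
    // Hypothetical Spark-free analog of oldTsCondition: derive a filter for new rows
    // from the timestamps already processed in the previous run.
    static LongPredicate buildTsCondition(List<Long> oldTimestamps) {
        OptionalLong oldMax = oldTimestamps.stream().mapToLong(Long::longValue).max();
        if (!oldMax.isPresent()) {
            // Nothing processed before: accept every row, like lit(true).
            return ts -> true;
        }
        long tsMax = oldMax.getAsLong();
        // Assumed bound: keep only rows strictly newer than the previous maximum.
        return ts -> ts > tsMax;
    }

    public static void main(String[] args) {
        LongPredicate cond = buildTsCondition(List.of(100L, 250L));
        System.out.println(cond.test(250L) + " " + cond.test(251L)); // prints "false true"
    }
}
```

Using a strict comparison avoids reprocessing rows whose timestamp equals the old maximum, at the cost of dropping any genuinely new rows that share that exact timestamp.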
ebernhardson / Poorly_Performing_Queries.ipynb
Last active April 29, 2019 17:54
Poorly Performing Queries notebook