Marek Wiewiórka (mwiewior)

mwiewior / slack.sh
Created March 4, 2018 18:14 — forked from andkirby/slack.sh
Shell/Bash script for sending Slack messages.
#!/usr/bin/env bash
####################################################################################
# Slack Bash console script for sending messages.
####################################################################################
# Installation
# $ curl -s https://gist.githubusercontent.com/andkirby/67a774513215d7ba06384186dd441d9e/raw --output /usr/bin/slack
# $ chmod +x /usr/bin/slack
####################################################################################
# USAGE
# Send a message to a Slack channel/user
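
The preview above is cut off before the script body. Purely as a hedged illustration of the same idea — posting a JSON payload to a Slack incoming webhook — here is a minimal Scala sketch; the webhook URL is a placeholder and this is not andkirby's script:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object SlackNotify {
  // Placeholder only: a real incoming-webhook URL issued by Slack goes here.
  val webhookUrl = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

  // POSTs a {"text": "..."} payload and returns the HTTP status code.
  def send(text: String): Int = {
    val payload = s"""{"text": "$text"}"""   // naive JSON, no escaping — sketch only
    val request = HttpRequest.newBuilder(URI.create(webhookUrl))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(payload))
      .build()
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    response.statusCode()                    // Slack answers 200 with body "ok" on success
  }

  def main(args: Array[String]): Unit =
    println(send(args.headOption.getOrElse("hello from the sketch")))
}

Usage would be along the lines of SlackNotify.send("build finished"); the original bash script layers channel/user selection on top of the same kind of POST.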
mwiewior / extraStrategies.md
Created October 13, 2017 08:54 — forked from marmbrus/extraStrategies.md
Example of injecting custom planning strategies into Spark SQL.

First a disclaimer: This is an experimental API that exposes internals that are likely to change in between different Spark releases. As a result, most datasources should be written against the stable public API in org.apache.spark.sql.sources. We expose this mostly to get feedback on what optimizations we should add to the stable API in order to get the best performance out of data sources.

We'll start with a simple artificial data source that just returns ranges of consecutive integers.

/** A data source that returns ranges of consecutive integers in a column named `a`. */
case class SimpleRelation(
    start: Int, 
    end: Int)(
    @transient val sqlContext: SQLContext) 
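
The snippet is truncated here, but the registration hook the text refers to is sqlContext.experimental.extraStrategies. As a hedged sketch (not marmbrus's full example), a strategy is an object that maps a LogicalPlan to zero or more SparkPlans and is then appended to that sequence:

import org.apache.spark.sql.{SQLContext, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// A strategy that declines to plan anything: returning Nil hands the plan
// back to the built-in strategies. A real strategy would pattern-match on
// its own logical operators (e.g. a scan of SimpleRelation) and emit the
// corresponding physical operators.
object PassThroughStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

object InstallStrategy {
  // The experimental hook referred to in the text above.
  def install(sqlContext: SQLContext): Unit =
    sqlContext.experimental.extraStrategies = Seq(PassThroughStrategy)
}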
mwiewior / gist:709a0f711425ff68cb534f9b34919fbe
Created September 11, 2017 15:34 — forked from mbedward/gist:6e3dbb232bafec0792ba
Scala macro to convert between a case class instance and a Map of constructor parameters. Developed by Jonathan Chow (see http://blog.echo.sh/post/65955606729/exploring-scala-macros-map-to-case-class-conversion for description and usage). This version simply updates Jonathan's code to Scala 2.11.2.
import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context
trait Mappable[T] {
  def toMap(t: T): Map[String, Any]
  def fromMap(map: Map[String, Any]): T
}
object Mappable {
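
The body of object Mappable (the macro materializer) is cut off above. To make the contract concrete, here is a small self-contained usage sketch with a hand-written instance standing in for what the macro would generate; Person is a hypothetical case class used only for illustration:

// Hypothetical case class, used only to illustrate the Mappable contract.
case class Person(name: String, age: Int)

object MappableUsage {
  // A hand-written instance standing in for what the (truncated) macro in
  // `object Mappable` is meant to materialize automatically for any case class.
  implicit val personMappable: Mappable[Person] = new Mappable[Person] {
    def toMap(p: Person): Map[String, Any] = Map("name" -> p.name, "age" -> p.age)
    def fromMap(m: Map[String, Any]): Person =
      Person(m("name").asInstanceOf[String], m("age").asInstanceOf[Int])
  }

  def main(args: Array[String]): Unit = {
    val mapper = implicitly[Mappable[Person]]
    val asMap  = mapper.toMap(Person("Ada", 36))    // Map(name -> Ada, age -> 36)
    println(s"$asMap -> ${mapper.fromMap(asMap)}")  // round-trips back to Person(Ada,36)
  }
}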
mwiewior / Schema2CaseClass.scala
Created September 11, 2017 15:21 — forked from yoyama/Schema2CaseClass.scala
Generate a case class from a Spark DataFrame/Dataset schema.
/**
 * Generate a case class from DataFrame.schema
 *
 * val df: DataFrame = ...
 *
 * val s2cc = new Schema2CaseClass
 * import s2cc.implicit._
 *
 * println(s2cc.schemaToCaseClass(df.schema, "MyClass"))
 *
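
The preview shows only the usage comment. As a rough sketch of the underlying idea — not yoyama's implementation — schemaToCaseClass can be written as a walk over the StructType fields, mapping Spark types to Scala types and wrapping nullable fields in Option:

import org.apache.spark.sql.types._

// Sketch only: emit Scala source for a case class matching a StructType.
// A handful of primitive types are handled; anything else falls back to
// Spark's simpleString name.
object SchemaToCaseClassSketch {
  private def scalaType(dt: DataType): String = dt match {
    case StringType  => "String"
    case IntegerType => "Int"
    case LongType    => "Long"
    case DoubleType  => "Double"
    case BooleanType => "Boolean"
    case other       => other.simpleString
  }

  def schemaToCaseClass(schema: StructType, className: String): String = {
    val fields = schema.fields.map { f =>
      val t   = scalaType(f.dataType)
      val typ = if (f.nullable) s"Option[$t]" else t   // nullable -> Option
      s"  ${f.name}: $typ"
    }
    fields.mkString(s"case class $className (\n", ",\n", "\n)")
  }
}

Calling SchemaToCaseClassSketch.schemaToCaseClass(df.schema, "MyClass") would then print source text such as case class MyClass (name: Option[String], age: Int) for a two-column example schema.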
mwiewior / HdfsSeekRead.java
Created August 30, 2017 09:56 — forked from t3rmin4t0r/HdfsSeekRead.java
HDFS seek benchmark
// import org.apache.commons.lang3.RandomUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import com.google.common.base.Stopwatch;
import java.io.IOException;
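
Only the imports of HdfsSeekRead.java survive in the preview. They suggest a loop that seeks to offsets in an HDFS file and times the reads; a minimal Scala sketch of that pattern (using System.nanoTime in place of Guava's Stopwatch, with a hypothetical 64 KB buffer and 100 iterations) could look like:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import scala.util.Random

object HdfsSeekReadSketch {
  def main(args: Array[String]): Unit = {
    // args(0): a file on HDFS, assumed to be larger than the 64 KB read buffer.
    val path   = new Path(args(0))
    val fs     = path.getFileSystem(new Configuration())
    val len    = fs.getFileStatus(path).getLen
    val buf    = new Array[Byte](64 * 1024)
    val maxOff = math.max(1L, len - buf.length)

    val in    = fs.open(path)
    val start = System.nanoTime()
    for (_ <- 1 to 100) {
      in.seek((Random.nextDouble() * maxOff).toLong)  // jump to a random offset
      in.readFully(buf, 0, buf.length)                // timed read
    }
    val elapsedMs = (System.nanoTime() - start) / 1e6
    in.close()
    println(f"100 random seek+read(64KB) calls took $elapsedMs%.1f ms")
  }
}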
import sklearn
import numpy as np
import math
import pickle
import collections
class DGA:
    def __init__(self):
        self.model = { 'clf': pickle.loads(open('./dga_model_random_forest.model','rb').read())
                     , 'alexa_vc': pickle.loads(open('./dga_model_alexa_vectorizor.model','rb').read())
                     , 'alexa_counts': pickle.loads(open('./dga_model_alexa_counts.model','rb').read())

Phoenix/Spark demo

Option 1: prebuilt VM

There is a prebuilt CentOS 6.5 VM with the following components installed:

  • HDP 2.3.0.0-1754
  • Spark 1.3.1