
zero323

doing something weird
@zero323
zero323 / 01_current.py
Created October 1, 2019 16:06
Generating ML setters and getters
from pyspark.ml.param import Param, Params, TypeConverters
from pyspark import keyword_only


class FooBar(Params):
    foo = Param(
        Params._dummy(),
        "foo", "Just foo",
        typeConverter=TypeConverters.toInt)
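The preview cuts off before any setters or getters are generated. Purely as a hedged sketch of the pattern the title describes (none of these names come from the gist itself), accessors for the foo Param above could be attached using the public Params.getOrDefault and the internal Params._set helpers:

    # Hypothetical sketch, not the gist's actual code.
    def _make_accessors(name):
        def getter(self):
            return self.getOrDefault(getattr(self, name))

        def setter(self, value):
            return self._set(**{name: value})

        return getter, setter


    FooBar.getFoo, FooBar.setFoo = _make_accessors("foo")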
@zero323
zero323 / keybase.md
Created July 10, 2019 12:19
keybase.md

Keybase proof

I hereby claim:

  • I am zero323 on github.
  • I am zero323 (https://keybase.io/zero323) on keybase.
  • I have a public key whose fingerprint is 642B E92F A739 1BF6 B9E9 2A64 C095 AA7F 33E6 123A

To claim this, I am signing this object:

@zero323
zero323 / app.R
Last active December 20, 2018 19:25
Non-centrality behavior
library(shiny)
library(dplyr)
library(ggplot2)
library(data.table)
library(tidyr)
library(plotly)
ncp <- function(K, OR, N, R2) {
  b_MR <- K * (OR / (1 + K * (OR - 1)) - 1)

PySpark UDF improvements proposal

UDF creation

Current state

Right now there are a few ways we can create a UDF:

  • With a standalone function:
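The code sample that belongs under this bullet is not included in the preview. A minimal sketch of the pattern, assuming the standard pyspark.sql.functions.udf API (add_one is an invented placeholder name):

    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    def add_one(x):
        # Standalone Python function wrapped into a UDF.
        return x + 1 if x is not None else None

    add_one_udf = udf(add_one, IntegerType())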
import java.util.*;
import scala.Tuple2;
import scala.Tuple3;
import org.apache.spark.util.StatCounter;
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
public class App {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1",
  "org.apache.spark" %% "spark-sql" % "1.6.1"
)
from pyspark.sql.window import Window
from pyspark.sql import functions as f
df = sqlContext.createDataFrame(
    zip(["foo"] * 5 + ["bar"] * 5, range(1, 6) + range(6, 11)),
    ("k", "v")
).withColumn("dummy", f.lit(1))
df.registerTempTable("df")
(import org.apache.spark.sql.Row$)
(import scala.collection.JavaConversions)
(import org.apache.spark.sql.RowFactory)
(defn vec->row1 [v]
  (let [s (-> v JavaConversions/asScalaBuffer .toSeq)]
    (.fromSeq Row$/MODULE$ s)))

(defn vec->row2 [v]
  (RowFactory/create (into-array Object v)))
from numpy import floor


def quantile(rdd, p, sample=None, seed=0):
    """Compute a percentile of order p ∈ [0, 1]

    :rdd a numeric RDD
    :p percentile (between 0 and 1)
    :sample fraction of the RDD to use. If not provided, the whole dataset is used
    :seed random number generator seed to be used with sample
    """
    assert 0 <= p <= 1
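    # The gist preview is truncated here. What follows is only a hedged sketch
    # of how the computation might continue, based on the docstring above:
    # optionally sample the RDD, sort and index it, then look up the element
    # at position floor(p * (n - 1)).
    if sample is not None:
        rdd = rdd.sample(False, sample, seed)
    indexed = (rdd.sortBy(lambda x: x)
                  .zipWithIndex()
                  .map(lambda xi: (xi[1], xi[0])))
    n = indexed.count()
    k = int(floor(p * (n - 1)))
    return indexed.lookup(k)[0]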
@zero323
zero323 / width.R
Last active August 29, 2015 14:18
library(shiny)
predict <- function(foo_or_bar) { ifelse(foo_or_bar, 'Foo', 'Bar') }
style <- "
/*
Every textInput is wrapped in a div with the following classes.
The maximum width is by default 595px. We can change it to 100%
of the parent element.
*/