[eje@linux spark]$ ./bin/pyspark \
--packages "org.isarnproject:isarn-sketches-spark_2.11:0.1.0.py2" \
--repositories "https://dl.bintray.com/isarn/maven/"
# a bunch of maven loading log output ...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Python version 2.7.13 (default, May 10 2017 20:04:28)
SparkSession available as 'spark'.
>>> from isarnproject.sketches.udaf.tdigest import *
>>> from pyspark.sql import Row
>>> row = Row("x1","x2")
>>> df = spark.sparkContext.parallelize([row(1.0, 2.0),row(1.5,2.5),row(2.0,3.0)]).toDF()
>>> df.agg(tdigest("x1")).show()
+----------------------+
|tdigestdoubleudaf$(x1)|
+----------------------+
| TDigestSQL(TDiges...|
+----------------------+
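
A t-digest is a sketch that approximates a column's distribution (quantiles and CDF values). As a sanity reference for the tiny "x1" column above, the exact values can be computed in plain Python. This is only a sketch under stated assumptions: it targets Python 3 (the session above runs Python 2.7), it does not use Spark or the isarn-sketches API, and `empirical_cdf` is a hypothetical helper introduced here for illustration.

```python
# Plain-Python reference, independent of Spark and isarn-sketches.
# Assumes Python 3 (statistics module, true division).
from statistics import median

# Same values as the rows passed to parallelize() in the session above.
rows = [(1.0, 2.0), (1.5, 2.5), (2.0, 3.0)]
x1 = [r[0] for r in rows]


def empirical_cdf(values, x):
    """Hypothetical helper: exact fraction of values that are <= x."""
    return sum(1 for v in values if v <= x) / len(values)


# Exact median and CDF of "x1"; a t-digest built on this column
# would be expected to give approximations of these quantities.
x1_median = median(x1)
x1_cdf_at_median = empirical_cdf(x1, x1_median)
print(x1_median, x1_cdf_at_median)
```

With only three rows the exact answer is trivial; the sketch becomes useful when the column is too large to sort or collect on one machine.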