@mannharleen
Created September 2, 2017 09:26
Spark DataFrame cube basics
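// In spark-shell, sc and spark are predefined; spark.implicits._ provides toDF on RDDs
import spark.implicits._
val rdd1 = sc.parallelize(List((1,"one"),(2,"two")))
// use reflection to convert to a DataFrame: Spark infers the schema from the tuple type
val df1 = rdd1.toDF("col1","col2")
// create a cube with dimensions col1 and col2, and the fact as the average of col1
df1.cube("col1","col2").agg(Map("col1" -> "avg")).show()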
/*
Output (row order may vary between runs):
+----+----+---------+
|col1|col2|avg(col1)|
+----+----+---------+
|null| two|      2.0|
|   2|null|      2.0|
|   1|null|      1.0|
|   1| one|      1.0|
|null|null|      1.5|
|null| one|      1.0|
|   2| two|      2.0|
+----+----+---------+
*/
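The seven output rows come from the fact that a cube aggregates over every subset of the listed dimensions: with n dimension columns there are 2^n grouping sets, so (col1, col2), (col1), (col2) and () here. A dimension left out of a grouping set shows up as null. The following plain-Scala sketch (no Spark required) reproduces that logic on the same two rows. The names `CubeSketch`, `InputRow`, `CubeRow` and `cube` are hypothetical, `None` stands in for Spark's null, and this only illustrates the semantics; it is not how Spark implements `cube` internally.

```scala
// Plain-Scala sketch of what DataFrame.cube computes: one aggregate per
// (grouping set, group key), where a grouping set is any subset of the dimensions.
// All names here are hypothetical, chosen for illustration only.
object CubeSketch {
  // Input row with two dimensions, mirroring (col1, col2) in the Spark example.
  final case class InputRow(col1: Int, col2: String)

  // Output row: None plays the role of Spark's null (dimension rolled up).
  final case class CubeRow(col1: Option[Int], col2: Option[String], avgCol1: Double)

  def cube(rows: Seq[InputRow]): Seq[CubeRow] = {
    // The 2^2 = 4 grouping sets: true means "keep this dimension in the key".
    val groupingSets: Seq[(Boolean, Boolean)] =
      for { keep1 <- Seq(true, false); keep2 <- Seq(true, false) } yield (keep1, keep2)

    groupingSets.flatMap { case (keep1, keep2) =>
      rows
        // Dimensions dropped from this grouping set collapse to None,
        // so all rows share that part of the key.
        .groupBy(r => (if (keep1) Some(r.col1) else None, if (keep2) Some(r.col2) else None))
        .toSeq
        .map { case ((k1, k2), group) =>
          // The fact: average of col1 within the group.
          CubeRow(k1, k2, group.map(_.col1).sum.toDouble / group.size)
        }
    }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(InputRow(1, "one"), InputRow(2, "two"))
    // Sort for a stable printout: full-detail rows first, grand total last.
    cube(rows)
      .sortBy(r => (r.col1.isEmpty, r.col2.isEmpty, r.col1.getOrElse(0), r.col2.getOrElse("")))
      .foreach { r =>
        println(s"${r.col1.getOrElse("null")}\t${r.col2.getOrElse("null")}\t${r.avgCol1}")
      }
  }
}
```

Running `main` prints the same seven (col1, col2, avg) combinations as the Spark output above, just in a fixed order, ending with the grand total row (null, null, 1.5).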