Created
September 2, 2017 09:26
Spark dataframe cube basics
// Run in spark-shell, where sc and the implicits needed by toDF are already in scope
val rdd1 = sc.parallelize(List((1,"one"),(2,"two")))
// use reflection to convert the RDD of tuples to a DataFrame
val df1 = rdd1.toDF("col1","col2")
// create a cube with col1 and col2 as dimensions and the average of col1 as the fact
df1.cube("col1","col2").agg( Map( "col1" -> "avg" )).show
/*
Outputs (row order may vary between runs):
+----+----+---------+
|col1|col2|avg(col1)|
+----+----+---------+
|null| two|      2.0|
|   2|null|      2.0|
|   1|null|      1.0|
|   1| one|      1.0|
|null|null|      1.5|
|null| one|      1.0|
|   2| two|      2.0|
+----+----+---------+
*/
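The seven rows in the cube output above come from aggregating over every subset of the two dimensions: (col1, col2), (col1), (col2), and the empty set (the grand total). The following is a minimal plain-Scala sketch of that idea, with no Spark dependency. The names `CubeSketch`, `CubeRow`, and `cube` are hypothetical and exist only for illustration; this is not how Spark implements `cube` internally, only a model of the result it produces for this data.

```scala
// Plain-Scala sketch of cube semantics; no Spark required.
// All names here (CubeSketch, CubeRow, cube) are illustrative assumptions.
object CubeSketch {
  // Same two rows as df1 in the snippet above.
  val rows: List[(Int, String)] = List((1, "one"), (2, "two"))

  // One output row. A dimension is None where Spark would show null
  // because that column was aggregated away.
  case class CubeRow(col1: Option[Int], col2: Option[String], avgCol1: Double)

  def cube(data: List[(Int, String)]): Set[CubeRow] = {
    // The four grouping sets for two dimensions: keep or drop each column.
    val groupingSets = for {
      keep1 <- List(true, false)
      keep2 <- List(true, false)
    } yield (keep1, keep2)

    groupingSets.flatMap { case (keep1, keep2) =>
      data
        // Group by only the dimensions kept in this grouping set.
        .groupBy { case (c1, c2) =>
          (if (keep1) Some(c1) else None, if (keep2) Some(c2) else None)
        }
        // Compute the average of col1 within each group.
        .map { case ((k1, k2), group) =>
          CubeRow(k1, k2, group.map(_._1.toDouble).sum / group.size)
        }
    }.toSet
  }

  def main(args: Array[String]): Unit =
    cube(rows).foreach(println)
}
```

With two dimensions there are 2^2 = 4 grouping sets; for this data they contribute 2 + 2 + 2 + 1 = 7 rows, which matches the number of rows in the output above.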