Created
January 4, 2017 05:59
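// NOTE: add minimum and maximum values to the thresholds so that out-of-range values are counted too.
// Bucket boundaries must be sorted and contain no duplicates, so 0.0 appears only once.
val thresholds: Array[Double] =
  Array(Double.MinValue) ++ (0 until 50 by 10).map(_.toDouble) ++ Array(Double.MaxValue)
// Convert DataFrame to RDD and calculate histogram values
// (assumes spark-shell, where `sc` and `spark.implicits._` are already in scope;
// null values in the column would make `getDouble` fail).
val _tmpHist = df
  .select($"column" cast "double")
  .rdd.map(r => r.getDouble(0))
  .histogram(thresholds)
// Result DataFrame contains the `from`, `to` range and the `value` (count) of each bucket.
val histogram = sc.parallelize(
  thresholds.zip(thresholds.tail).zip(_tmpHist).map { case ((from, to), value) => (from, to, value) }
).toDF("from", "to", "value")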
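
// Optional sanity check: a minimal sketch, not part of the original snippet.
// It recomputes bucket counts on the driver for a small sample, so the result can be
// compared against the Spark histogram. It assumes the bucket semantics documented for
// RDD `histogram(buckets)`: every bucket is half-open [from, to), except the last one,
// which also includes its upper bound; values outside all buckets (and NaN) are skipped.
// `localHistogram` and `sampleBounds` are hypothetical names used only for illustration.
def localHistogram(values: Seq[Double], bounds: Array[Double]): Array[Long] = {
  val counts = Array.fill(bounds.length - 1)(0L)
  values.foreach { v =>
    // Find the bucket whose range contains v; -1 means v is outside all buckets.
    val i = bounds.indices.init.indexWhere { j =>
      v >= bounds(j) && (v < bounds(j + 1) || (j == bounds.length - 2 && v == bounds(j + 1)))
    }
    if (i >= 0) counts(i) += 1
  }
  counts
}

// Two buckets: [0, 10) and [10, 20]. The value 25.0 falls outside and is ignored.
val sampleBounds = Array(0.0, 10.0, 20.0)
val sampleCounts = localHistogram(Seq(0.0, 9.9, 10.0, 20.0, 25.0), sampleBounds)
// sampleCounts: Array(2, 2)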