Created
November 10, 2011 00:14
-
-
Save etorreborre/1353640 to your computer and use it in GitHub Desktop.
Using different Monoids to extract the max value / max cumulated value on some records, for a given key
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/** | |
* The methods below can be used to extract significant values in log records. For example, we might want to get: | |
* | |
* - the maximum execution time per method call | |
* - the maximum execution time per server | |
* - the maximum memory consumption per method call | |
* - the maximum memory consumption per server | |
* | |
* - the maximum cumulated execution time per method call | |
* - the maximum cumulated execution time per server | |
* - the maximum cumulated memory consumption per method call | |
* - the maximum cumulated memory consumption per server | |
* | |
* To do that, we just need to provide: | |
* | |
* - a key: T => K (per method call, per server) | |
* - a Monoid[T] (to get a measure on elements of type T: execution time, memory,...) | |
* - an Ordering[T] (to get the max, but it could be the min) | |
*/ | |
/** | |
* @return a seq of records where the measure, provided on the Monoid on T is cumulated for each key K | |
*/ | |
def reduceByKey[T : Monoid, K](records: Seq[T], key: T => K): Seq[T] = | |
records.foldMap(r => Map(key(r) -> r)).map(_._2) | |
/** | |
* @return a seq of records where we keep only the max for each key | |
* | |
* this function does a reduceByKey but using a "max Monoid" as a way to reduce the value | |
*/ | |
def maxByKey[T : Monoid : Ordered , K](records: Seq[T], key: T => K): Seq[T] = | |
reduceByKey(records, key)(maxIsMonoid) | |
/** | |
* @return a seq of records where we keep only the max cumulated value for each key | |
* | |
* this function does a first reduceByKey to accumulate values for a given key, according to the Monoid for T | |
* then it uses reduceByKey with a "max Monoid" to keep only the maximums | |
*/ | |
def reduceMaxByKey[T : Monoid : Ordering, K](records: Seq[T], key: T => K): Seq[T] = | |
reduceByKey(reduceByKey(records, key), key)(maxIsMonoid) | |
/** | |
* SUPPORT METHODS TO GET A MONOID FROM AN ORDERING | |
*/ | |
/** | |
* if there's an Ordering defined on T, we can derive a Semigroup for T elements so that | |
* "adding" 2 elements only keeps the maximum | |
*/ | |
def maxIsSemigroup[T : Ordering] = new Semigroup[T] { | |
def append(t1: T, t2 : =>T): T = implicitly[Ordering[T]].max(t1, t2) | |
} | |
/** | |
* @return a Monoid from a Zero and a Semigroup | |
*/ | |
def zeroAndSemigroupIsMonoid[T : Semigroup](z: T) = new Monoid[T] { | |
val zero = z | |
def append(t1: T, t2 : =>T): T = implicitly[Semigroup[T]].append(t1, t2) | |
} | |
/** | |
* provided that we have an Ordering on elements of type T, and if T has a Zero, then we can | |
* define a Monoid on elements of type T so that only the max element is kept after an "addition" | |
*/ | |
def maxIsMonoid[T : Zero : Ordering] = { | |
zeroAndSemigroupIsMonoid(implicitly[Zero[T]].zero)(maxIsSemigroup(implicitly[Ordering[T]])) | |
} | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment