Suppose you have a key like (page, geo, day) and you want to make rollups/datacube so you can query for all pages, or all geos or all days.
Here is how you do it:
def opts[T](t: T): Seq[Option[T]] = Seq(Some(t), None)
val p: TypedPipe[(String, String, Int)] = ...
p.sumByLocalKeys
.flatMap { case ((page, geo, day), v) =>
for {
op <- opts(page)
og <- opts(geo)
od <- opts(day)
} yield ((op, og, od), v)
}
.group
.sum
The v can be a tuple of integers, doubles, the value 1L (to count), HyperLogLogs, whatever your monoidal heart desires.
The sumByLocalKeys is optional, but it aggregates the values on the mappers before expanding the keys, which can be a big performance win (especially if done before).
TODO: it would be great to have a macro in scala 2.10 that does this for you. It has to be a macro (I think) since it is dynamic on the arity of the Tuple.
Would it be possible to get all rollups just as easily (similar to how pig does it with ROLLUP)? Ex: page & day for all geos?