@szilard
Created November 8, 2015 04:57
h2o group_by simple speed test
########### R
library(h2o)
h2oServer <- h2o.init(max_mem_size = "50g", nthreads = -1)
d <- h2o.importFile(h2oServer, path = "d.csv")
system.time({
print( h2o.group_by(d, "x", sum("y")) )
})
## user system elapsed
## 0.232 0.003 5.988
## TODO: top 5 groups by sum (order by, limit 5)
########### python
import h2o
h2o_server = h2o.init(max_mem_size_GB = 50)
d = h2o.import_file("d.csv")
%time d.group_by("x").sum("y").get_frame()
##CPU times: user 11 ms, sys: 4.21 ms, total: 15.2 ms
##Wall time: 5.27 s
## TODO: top 5 groups by sum (order by, limit 5)
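
The top-5 step is still a TODO here and no h2o call for it has been timed. As a reference for the intended result only, below is a minimal plain-Python sketch of "group by x, sum y, order by the sum descending, limit 5". It does not use the h2o API; the function name `top_groups_by_sum` and the sample rows are hypothetical and stand in for d.csv, whose contents are not shown.

```python
# Plain-Python reference for: group by x, sum(y), order by sum desc, limit 5.
# NOT the h2o API; function name and sample rows are made up for illustration.
from collections import defaultdict
import heapq


def top_groups_by_sum(rows, n=5):
    """Return the n (x, sum_y) pairs with the largest sum_y, largest first."""
    sums = defaultdict(float)
    for x, y in rows:
        sums[x] += y  # group by x, sum y
    # order by sum_y descending, limit n
    return heapq.nlargest(n, sums.items(), key=lambda kv: kv[1])


# Hypothetical (x, y) rows standing in for the columns of d.csv.
rows = [(1, 2.0), (2, 5.0), (1, 3.0), (3, 1.0),
        (2, 0.5), (4, 9.0), (5, 4.0), (6, 0.1)]
top5 = top_groups_by_sum(rows)
print(top5)
```

`heapq.nlargest` avoids a full sort of the group sums, which only matters when the number of distinct x values is large relative to 5.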
########### setup
h2o 3.2.0.9
16 cores