@thattommyhall
Created November 19, 2010 19:24
FULL OF WIN
> map <- expression({
+   lapply(seq_along(map.values), function(r) {
+     x <- runif(map.values[[r]])
+     rhcollect(map.keys[[r]], c(n = map.values[[r]], mean = mean(x), sd = sd(x)))
+   })
+ })
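Outside Hadoop, the body of this map expression is ordinary R. A minimal local sketch of what each record computes, where map.keys/map.values stand in for the chunk Rhipe hands a map task and the returned list stands in for what rhcollect would emit (the stand-in shapes are assumptions):

```r
# Local stand-ins for the per-task chunk Rhipe would provide (assumed shapes):
map.keys   <- as.list(1:3)
map.values <- as.list(1:3)

collected <- lapply(seq_along(map.values), function(r) {
  x <- runif(map.values[[r]])   # draw map.values[[r]] samples from U(0,1)
  # rhcollect(key, value) would emit this pair to Hadoop; locally we return it
  list(key   = map.keys[[r]],
       value = c(n = map.values[[r]], mean = mean(x), sd = sd(x)))
})
```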
>
> z <- rhmr(map, ofolder = "/tmp/test", inout = c('lapply', 'sequence'),
+           N = 10, mapred = list(mapred.reduce.tasks = 0), jobname = 'test')
Error: could not find function "rhmr"
> library(Rhipe)
RHIPE: Cleaning up associated server (PID=16925)
> rhwrite(list(1,2,3),"/tmp/x",1)
Wrote 3 pairs occupying 57 bytes
[1] TRUE
> rhread("/tmp/x")
RHIPE: Read 5 pairs occupying 95 bytes, deserializing
[[1]]
[[1]][[1]]
[1] "1"
[[1]][[2]]
[1] 1
[[2]]
[[2]][[1]]
[1] "2"
[[2]][[2]]
[1] 2
[[3]]
[[3]][[1]]
[1] "3"
[[3]][[2]]
[1] 3
[[4]]
[[4]][[1]]
[1] "2"
[[4]][[2]]
[1] 2
[[5]]
[[5]][[1]]
[1] "3"
[[5]][[2]]
[1] 3
> map <- expression({
+   lapply(seq_along(map.values), function(r) {
+     x <- runif(map.values[[r]])
+     rhcollect(map.keys[[r]], c(n = map.values[[r]], mean = mean(x), sd = sd(x)))
+   })
+ })
> z <- rhmr(map, ofolder = "/tmp/test", inout = c('lapply', 'sequence'),
+           N = 10, mapred = list(mapred.reduce.tasks = 0), jobname = 'test')
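For readers unfamiliar with rhmr, here is the same call with each argument annotated. This is a best-effort reading of the Rhipe 0.x API; the argument semantics below are assumptions, not documentation:

```r
# Assumed argument semantics for Rhipe's rhmr():
z <- rhmr(map,                                     # map expression run on each task
          ofolder = "/tmp/test",                   # HDFS output folder
          inout   = c('lapply', 'sequence'),       # input: synthetic lapply-style
                                                   #   records 1..N; output: Hadoop
                                                   #   sequence files
          N       = 10,                            # number of synthetic input records
          mapred  = list(mapred.reduce.tasks = 0), # map-only job: no reduce phase
          jobname = 'test')
# rhex(z) then submits the prepared job to the cluster, as the session shows next.
```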
> rhex(z)
----------------------------------------
Running: $HADOOP/bin/hadoop jar /usr/local/lib/R/site-library/Rhipe/java/Rhipe.jar org.godhuli.rhipe.RHMR /tmp/Rtmpben7g6/rhipe2ed99297
----------------------------------------
10/11/19 19:22:26 INFO rhipe.RHMR: Tracking URL ----> http://master.hadoop.forward.co.uk:50030/jobdetails.jsp?jobid=job_201011110039_29400
10/11/19 19:22:26 INFO mapred.JobClient: Running job: job_201011110039_29400
10/11/19 19:22:27 INFO mapred.JobClient: map 0% reduce 0%
10/11/19 19:22:38 INFO mapred.JobClient: map 100% reduce 0%
10/11/19 19:22:40 INFO mapred.JobClient: Job complete: job_201011110039_29400
10/11/19 19:22:40 INFO mapred.JobClient: Counters: 11
10/11/19 19:22:40 INFO mapred.JobClient: Job Counters
10/11/19 19:22:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9799
10/11/19 19:22:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
10/11/19 19:22:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
10/11/19 19:22:40 INFO mapred.JobClient: Launched map tasks=2
10/11/19 19:22:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
10/11/19 19:22:40 INFO mapred.JobClient: FileSystemCounters
10/11/19 19:22:40 INFO mapred.JobClient: HDFS_BYTES_READ=138
10/11/19 19:22:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=988
10/11/19 19:22:40 INFO mapred.JobClient: Map-Reduce Framework
10/11/19 19:22:40 INFO mapred.JobClient: Map input records=10
10/11/19 19:22:40 INFO mapred.JobClient: Spilled Records=0
10/11/19 19:22:40 INFO mapred.JobClient: Map output records=10
10/11/19 19:22:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=138
result:256
$state
[1] TRUE
$counters
$counters$FileSystemCounters
HDFS_BYTES_READ HDFS_BYTES_WRITTEN
138 988
$counters$`Job Counters `
Total time spent by all maps waiting after reserving slots (ms)
0
Total time spent by all reduces waiting after reserving slots (ms)
0
SLOTS_MILLIS_MAPS
9799
SLOTS_MILLIS_REDUCES
0
Launched map tasks
2
$counters$`Map-Reduce Framework`
Map input records Map output records Spilled Records SPLIT_RAW_BYTES
10 10 0 138
$counters$job_time
[1] 14.076
> res <- rhread('/tmp/test/p*')
RHIPE: Read 10 pairs occupying 700 bytes, deserializing
> colres <- do.call('rbind', lapply(res,"[[",2))
> colres
n mean sd
[1,] 1 0.1205761 NA
[2,] 2 0.1635505 0.1230097
[3,] 3 0.7054041 0.4043578
[4,] 4 0.6265748 0.3632526
[5,] 5 0.6605429 0.3868235
[6,] 6 0.5150720 0.2967631
[7,] 7 0.5233840 0.2131576
[8,] 8 0.5925578 0.2602574
[9,] 9 0.6118205 0.3150155
[10,] 10 0.7093372 0.2202394
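As a sanity check on colres: each row summarizes n draws from U(0,1), whose theoretical mean is 0.5 and standard deviation is sqrt(1/12) ≈ 0.289, with the sample statistics scattering less as n grows — consistent with the table above. A quick local reproduction, without Hadoop:

```r
# Theoretical moments of the U(0,1) distribution
expected_mean <- 0.5
expected_sd   <- sqrt(1/12)   # ≈ 0.2887

# Reproduce the per-key computation locally for n = 1..10
set.seed(1)
local <- t(sapply(1:10, function(n) {
  x <- runif(n)
  c(n = n, mean = mean(x), sd = sd(x))
}))
```

Note sd is NA for n = 1, as in the first row of colres: the sample standard deviation is undefined for a single observation.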