@yutannihilation
Created January 22, 2018 08:18
Performance of some variants of group_by() + summarise()
reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2018-01-22
library(dplyr, warn.conflicts = FALSE)
data("flights", package = "nycflights13")
microbenchmark::microbenchmark(
  summarise_only = flights %>%
    group_by(year, month, day, origin) %>%
    summarise(dep_delay_avg = mean(dep_delay),
              arr_delay_avg = mean(arr_delay),
              total_delay_avg = dep_delay_avg + arr_delay_avg),
  summarise_mutate = flights %>%
    group_by(year, month, day, origin) %>%
    summarise(dep_delay_avg = mean(dep_delay),
              arr_delay_avg = mean(arr_delay)) %>%
    mutate(total_delay_avg = dep_delay_avg + arr_delay_avg),
  summarise_mutate_ungroup = flights %>%
    group_by(year, month, day, origin) %>%
    summarise(dep_delay_avg = mean(dep_delay),
              arr_delay_avg = mean(arr_delay)) %>%
    ungroup() %>%
    mutate(total_delay_avg = dep_delay_avg + arr_delay_avg)
)
#> Unit: milliseconds
#>                      expr      min       lq     mean   median       uq      max neval
#>            summarise_only 68.69892 73.66465 83.79183 77.52717 83.25379 242.5709   100
#>          summarise_mutate 53.19234 55.92301 61.14978 58.69358 64.01526 110.4376   100
#>  summarise_mutate_ungroup 42.87608 45.61051 57.87704 48.22187 52.51579 640.3410   100