Created
December 19, 2015 09:34
group_by illustrative examples
# group_by() -------------------------------------------------------------------------
# Packages the examples below rely on: dplyr for the verbs and %>%, hflights for the data.
library(dplyr)
library(hflights)

# Generate a per-carrier summary of hflights with the following variables: n_flights,
# the number of flights flown by the carrier; n_canc, the number of cancelled flights;
# p_canc, the percentage of cancelled flights; avg_delay, the average arrival delay of
# flights whose delay is not NA. Next, order the carriers in the summary from
# low to high by their average arrival delay. Use percentage of flights cancelled to
# break any ties. Which airline scores best based on these statistics?
hflights %>%
  group_by(UniqueCarrier) %>%
  summarise(n_flights = n(),
            n_canc = sum(Cancelled),
            p_canc = 100 * n_canc / n_flights,
            avg_delay = mean(ArrDelay, na.rm = TRUE)) %>%
  arrange(avg_delay, p_canc)  # p_canc breaks ties in avg_delay
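
# Illustrative sketch, not part of the original exercise: a tiny made-up data
# frame (`toy`, with hypothetical columns carrier, cancelled, arr_delay) that
# shows how a second argument to arrange() breaks ties in the first one.
library(dplyr)

toy <- data.frame(
  carrier   = c("A", "A", "B", "B", "C", "C"),
  cancelled = c(0, 1, 0, 0, 1, 1),
  arr_delay = c(10, NA, 4, 6, 5, NA)
)

toy_summary <- toy %>%
  group_by(carrier) %>%
  summarise(n_flights = n(),
            n_canc = sum(cancelled),
            p_canc = 100 * n_canc / n_flights,
            avg_delay = mean(arr_delay, na.rm = TRUE)) %>%
  arrange(avg_delay, p_canc)

# Carriers B and C both average a delay of 5, so p_canc (0 vs 100) decides
# their order: the rows come out as B, C, A.
toy_summary
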

# Generate a per-day-of-week summary of hflights with the variable avg_taxi,
# the average total taxiing time. Pipe this summary into an arrange() call such
# that the day with the highest avg_taxi comes first.
hflights %>%
  group_by(DayOfWeek) %>%
  summarize(avg_taxi = mean(TaxiIn + TaxiOut, na.rm = TRUE)) %>%
  arrange(desc(avg_taxi))