Created April 12, 2017 05:57
Morton Experimental PoliSci R Assignment Solution
# Curtis Kephart
# [email protected]
# Questions and solution for Prof Morton's Experimental PoliSci R assignment.

install.packages('nycflights13')  # only needed once per machine
library(nycflights13)
library(dplyr)

dim(flights)  # number of rows and columns in the flights table
flights
df = flights
##################################################################
# 1) What is the tail number of the fastest plane on Dec 1 2013?
# Hint: check out the `speed` variable in the `mutate()` section, and also `filter()` and `arrange()`.
df = flights %>%
  mutate(
    gain = arr_delay - dep_delay,
    speed = distance / air_time * 60  # miles per hour (air_time is in minutes)
  ) %>%
  filter(
    year == 2013,
    month == 12,
    day == 1
  ) %>%
  arrange(-speed)  # for a numeric column, the '-' sign sorts descending, same as desc(speed)
# Answer (tailnum of the first row): N593JB
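
# A minimal base-R sketch of the same steps (compute speed, filter to one day,
# sort descending, take the first tail number), assuming no dplyr. The data
# frame below is made-up toy data whose column names mirror
# nycflights13::flights; the tail numbers and values are hypothetical, not
# results from the real table.
toy_q1 <- data.frame(
  year     = c(2013, 2013, 2013, 2013),
  month    = c(12, 12, 12, 11),
  day      = c(1, 1, 1, 30),
  tailnum  = c("TAIL_A", "TAIL_B", "TAIL_C", "TAIL_D"),
  distance = c(1000, 2000, 500, 3000),  # miles
  air_time = c(150, 240, NA, 300),      # minutes
  stringsAsFactors = FALSE
)
toy_q1$speed <- toy_q1$distance / toy_q1$air_time * 60  # miles per hour
dec1 <- subset(toy_q1, year == 2013 & month == 12 & day == 1)
# order(-speed) sorts descending like arrange(-speed); NA speeds go last
dec1 <- dec1[order(-dec1$speed), ]
fastest_tail <- dec1$tailnum[1]
fastest_tail
# [1] "TAIL_B"
# TAIL_D would be faster (600 mph) but is excluded by the date filter.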
##################################################################
#' 2) In this dataset, of the carriers that flew out of JFK airport, which five carriers flew the most flights?
#' Just supply the 2-character code of the carrier.
df = flights %>%
  filter(
    origin == 'JFK'
  ) %>%
  group_by(carrier) %>%
  summarise(
    num_flights = n()
  ) %>%
  arrange(desc(num_flights))
#' # A tibble: 10 × 2
#'   carrier num_flights
#'     <chr>       <int>
#' 1      B6       42076
#' 2      DL       20701
#' 3      9E       14651
#' 4      AA       13783
#' 5      MQ        7193
#' Answer: B6, DL, 9E, AA, MQ
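
# A minimal base-R sketch of the count-and-rank step, assuming no dplyr:
# table() counts rows per carrier (like group_by() + summarise(n())) and
# sort(..., decreasing = TRUE) plays the role of arrange(desc()). The rows and
# the carrier codes C1-C3 are hypothetical toy data, not the real flights table.
toy_q2 <- data.frame(
  origin  = c("JFK", "JFK", "JFK", "LGA", "JFK", "JFK", "JFK"),
  carrier = c("C1",  "C2",  "C1",  "C2",  "C3",  "C1",  "C2"),
  stringsAsFactors = FALSE
)
jfk <- toy_q2[toy_q2$origin == "JFK", ]  # keep JFK departures only
counts <- sort(table(jfk$carrier), decreasing = TRUE)
top_carriers <- names(counts)[1:2]  # with the real data, names(counts)[1:5]
top_carriers
# [1] "C1" "C2"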
##################################################################
#' 4) Of the same route (JFK to SFO) and carriers in 3), which carrier tended to fly the fastest?
#' Hint: one way to solve this is with mutate()
#' (see the `speed` discussion in the dplyr vignette),
#' plus the filter(), group_by(), and summarise() steps taken in question 3.
#' Delete any flights that give an NA for speed.
#' The `n` column in the result also gives the flight count per carrier, which answers 3).
df = flights %>%
  mutate(
    speed = distance / air_time * 60  # miles per hour
  ) %>%
  filter(
    origin == 'JFK',
    dest == 'SFO',
    !is.na(speed)
  ) %>%
  group_by(carrier) %>%
  summarise(
    n = n(),
    speed_mean = mean(speed)
  )
# # A tibble: 5 × 3
#   carrier     n speed_mean
#     <chr> <int>      <dbl>
# 1      AA  1398   446.3310
# 2      B6  1020   447.9720
# 3      DL  1848   448.3237
# 4      UA  2441   449.7024   ## most and fastest
# 5      VX  1402   444.3888
#' 3) Of flights from JFK to SFO, which carrier flew the most flights?
#' 4) And which carrier tended to make that flight the fastest?
#' UA for both (largest n and highest speed_mean in the table above).
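
# A minimal base-R sketch of questions 3) and 4) together, assuming no dplyr:
# drop NA speeds, then table() gives the flight count per carrier and tapply()
# the mean speed per carrier. The JFK-to-SFO rows, the 2600-mile distance and
# the carrier codes are hypothetical toy values; unlike the real data, the
# busiest and the fastest carrier differ here.
toy_q34 <- data.frame(
  carrier  = c("C1", "C1", "C2", "C2", "C2", "C2"),
  distance = rep(2600, 6),                    # miles
  air_time = c(360, 320, 345, NA, 340, 350),  # minutes
  stringsAsFactors = FALSE
)
toy_q34$speed <- toy_q34$distance / toy_q34$air_time * 60  # miles per hour
toy_q34 <- toy_q34[!is.na(toy_q34$speed), ]  # delete flights with NA speed
n_flights  <- table(toy_q34$carrier)
speed_mean <- tapply(toy_q34$speed, toy_q34$carrier, mean)
most    <- names(which.max(n_flights))   # carrier with the most flights
fastest <- names(which.max(speed_mean))  # carrier with the highest mean speed
c(most = most, fastest = fastest)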