@EconomiCurtis
Created April 12, 2017 05:57
Morton Experimental PoliSci R Assignment Solution
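install.packages('nycflights13') # only needed the first time
library(nycflights13) # provides the `flights` data frame
library(dplyr)
dim(flights) # number of rows and columns
flights # print the first rows of the tibble
df = flights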
# Curtis Kephart
# [email protected]
# Questions and solution for Prof Morton's Experimental PoliSci R assignment.
##################################################################
# 1) What is the tail number of the fastest plane on Dec 1 2013?
# Hint, check out the `speed` variable in the `mutate()` section, and also `filter()` and `arrange()`
df = flights %>%
  mutate(
    gain = arr_delay - dep_delay,
    speed = distance / air_time * 60 # miles per minute * 60 = miles per hour
  ) %>%
  filter(
    year == 2013,
    month == 12,
    day == 1
  ) %>%
  arrange(-speed) # for a numeric column, the '-' minus sign makes the sort descending
# N593JB
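# A minimal sketch of the same mutate/arrange pattern on a made-up toy table.
# The tail numbers and values below are hypothetical, not rows from `flights`.
# It assumes, as in `flights`, that `distance` is in miles and `air_time` is in
# minutes, so distance / air_time * 60 gives miles per hour.
library(dplyr)
toy = data.frame(
  tailnum = c('N0001', 'N0002', 'N0003'), # hypothetical tail numbers
  distance = c(1000, 500, 2400), # miles
  air_time = c(120, 75, 300) # minutes
)
toy_fastest = toy %>%
  mutate(speed = distance / air_time * 60) %>%
  arrange(desc(speed)) %>% # desc() is equivalent to '-' for numeric columns
  slice(1) %>% # keep only the fastest row
  pull(tailnum) # extract the tail number as a character vector
toy_fastest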
##################################################################
#' 2) In this dataset, of the carriers that flew out of JFK airport, which five carriers flew the most flights?
#' Just supply the two-character code of each carrier.
df = flights %>%
  filter(
    origin == 'JFK'
  ) %>%
  group_by(carrier) %>%
  summarise(
    num_flights = n()
  ) %>%
  arrange(desc(num_flights))
#' # A tibble: 10 × 2
#' carrier num_flights
#' <chr> <int>
#' 1 B6 42076
#' 2 DL 20701
#' 3 9E 14651
#' 4 AA 13783
#' 5 MQ 7193
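#' A shorter sketch of the same "top carriers" pattern, shown on a made-up toy
#' table (hypothetical rows, not taken from `flights`). count(..., sort = TRUE)
#' stands in for group_by() + summarise(n()) + arrange(desc()), and head() keeps
#' only the first rows. The names `toy_flights` and `top_two` are illustrative.
library(dplyr)
toy_flights = data.frame(
  origin = c('JFK', 'JFK', 'JFK', 'JFK', 'JFK', 'JFK', 'LGA', 'EWR'),
  carrier = c('B6', 'DL', 'B6', 'AA', 'B6', 'DL', 'AA', 'AA')
)
top_two = toy_flights %>%
  filter(origin == 'JFK') %>%
  count(carrier, sort = TRUE, name = 'num_flights') %>% # one row per carrier, largest first
  head(2) # on the real data, head(5) would keep the top five
top_two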
##################################################################
#' 3) Of flights from JFK to SFO, which carrier flew the most flights?
#' 4) Of the same route and carriers in 3), which carrier tended to fly the fastest?
#' Hint: one way to solve this is with mutate
#' (and see the `speed` discussion in the dplyr vignette),
#' plus the filter, group_by, and summarize steps taken in question 3.
#' Delete any flights that give an NA for speed.
df = flights %>%
  mutate(
    speed = distance / air_time * 60
  ) %>%
  filter(
    origin == 'JFK',
    dest == 'SFO',
    !is.na(speed)
  ) %>%
  group_by(carrier) %>%
  summarise(
    n = n(),
    speed_mean = mean(speed)
  )
# # A tibble: 5 × 3
# carrier n speed_mean
# <chr> <int> <dbl>
# 1 AA 1398 446.3310
# 2 B6 1020 447.9720
# 3 DL 1848 448.3237
# 4 UA 2441 449.7024 ## most and fastest
# 5 VX 1402 444.3888
#' 3) Of flights from JFK to SFO, which carrier flew the most flights? UA (2441 flights).
#' 4) And which carrier tended to make that flight the fastest? UA (mean speed of about 449.7 mph).
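#' A small sketch of why the NA filter in question 4 matters, using made-up
#' speeds (hypothetical values, not from `flights`): mean() returns NA as soon
#' as one value is missing, unless the NAs are dropped first or na.rm = TRUE is set.
toy_speed = c(440, NA, 460)
mean_with_na = mean(toy_speed) # NA, because one speed is missing
mean_without_na = mean(toy_speed, na.rm = TRUE) # average of 440 and 460
mean_filtered = mean(toy_speed[!is.na(toy_speed)]) # same idea as filter(!is.na(speed))
#' Note that filtering also changes the row count n(), whereas na.rm = TRUE alone does not.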