Skip to content

Instantly share code, notes, and snippets.

@paulrougieux
Last active August 29, 2015 14:20
Show Gist options
  • Save paulrougieux/0786f9276683bdef5bc0 to your computer and use it in GitHub Desktop.
Save paulrougieux/0786f9276683bdef5bc0 to your computer and use it in GitHub Desktop.
Display SQL statement from dplyr chained operations
library(dplyr)
library(nycflights13)
# Create a SQLite databse
my_db <- src_sqlite("my_db.sqlite3", create = T)
flights_sqlite <- copy_to(my_db, flights,
temporary = FALSE,
indexes = list(c("year", "month", "day"),
"carrier", "tailnum"))
# Perform chained operations
flights_sqlite %>%
filter(year == 2013, month == 1, day == 1) %>%
select(year, month, day, carrier, dep_delay, air_time, distance) %>%
mutate(speed = distance / air_time * 60) %>%
arrange(year, month, day, carrier) %>%
explain
## <SQL>
## SELECT "year" AS "year", "month" AS "month", "day" AS "day", "carrier" AS "carrier", "dep_delay" AS "dep_delay", "air_time" AS "air_time", "distance" AS "distance", "distance" / "air_time" * 60.0 AS "speed"
## FROM "flights"
## WHERE "year" = 2013.0 AND "month" = 1.0 AND "day" = 1.0
## ORDER BY "year", "month", "day", "carrier"
##
##
## <PLAN>
## selectid order from
## 1 0 0 0
## 2 0 0 0
## detail
## 1 SEARCH TABLE flights USING INDEX flights_year_month_day (year=? AND month=? AND day=?)
## 2 USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment