Ian Gow iangow

iangow / camp_participants.R
Created January 3, 2024 15:59
Code to scrape data from a PDF
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(readr) # For read_lines(), read_fwf(), etc.
library(stringr) # For str_c(), str_detect()
library(pdftools) # For pdf_text()
library(lubridate) # For ymd()
library(ggplot2)
url <- paste0("https://aaahq.org/portals/0/documents/meetings/2023/RC/",
              "2023%20Rookie%20Camp%20Alphabetical%20Presentation%20Schedule.pdf")
iangow / penguins_ibis.ipynb
Created January 2, 2024 19:03
Tidy Data Manipulation: dplyr vs polars using Ibis
iangow / regr_slope.R
Created December 30, 2023 15:48
Code to generate fake return data and estimate rolling betas.
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
library(DBI)
set.seed(2023)
n <- 100
betas <- tibble(id = 1:n, beta = runif(n, min = 0.7, max = 1.3))
start_date <- as.Date("1980-01-01")
end_date <- as.Date("2023-01-01")
iangow / wrds_to_pq.ipynb
Created December 30, 2023 12:26
Notebook for photo-package for RAM-friendly DB-to-parquet conversion
iangow / test_read.ipynb
Created December 26, 2023 13:19
Code to test various ways of reading data into Pandas.
iangow / newton_streets.R
Created December 17, 2023 21:04
Code to scrape data on Newton streets from a PDF.
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(readr) # For read_lines(), read_fwf(), etc.
library(stringr) # For str_c(), str_detect()
library(pdftools) # For pdf_text()
library(ggplot2)
url <- "https://www.newtonma.gov/home/showpublisheddocument/97990/638140435866000000"
col_names <- c("street_name", "length_mi", "length_ft",
iangow / icc_calc.ipynb
Created December 6, 2023 13:37
Illustration of Python function used in DuckDB

Benchmarking WRDS data retrieval

Summary

These benchmarks emerged from work I was doing to get Stata data from WRDS. In essence, there are two steps: first, retrieve data from WRDS's PostgreSQL database; second, run some analysis on the retrieved data (here, a simple summary query). Below I have tabulated the elapsed time in seconds for each of these two steps under a number of different approaches.
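To make the two-step structure concrete, here is a minimal sketch of how the steps might be timed separately. It is an illustration only, not the code behind the benchmarks: an in-memory SQLite database stands in for WRDS's PostgreSQL server, and the `returns` table, its columns, and the helper names `timed`, `retrieve`, and `summarize` are all hypothetical.

```python
import sqlite3
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def retrieve(conn):
    """Step 1: pull every row from the database into memory."""
    return conn.execute("SELECT id, ret FROM returns").fetchall()


def summarize(rows):
    """Step 2: a simple summary query, here the mean of ret by id."""
    totals, counts = {}, {}
    for id_, ret in rows:
        totals[id_] = totals.get(id_, 0.0) + ret
        counts[id_] = counts.get(id_, 0) + 1
    return {id_: totals[id_] / counts[id_] for id_ in totals}


# Assumption: SQLite in memory replaces the remote PostgreSQL database,
# and this made-up table replaces a real WRDS table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE returns (id INTEGER, ret REAL)")
conn.executemany(
    "INSERT INTO returns VALUES (?, ?)",
    [(i % 10, i / 1000.0) for i in range(10_000)],
)

# Time each step on its own so the two costs can be compared.
rows, t_retrieve = timed(retrieve, conn)
summary, t_summary = timed(summarize, rows)

print(f"retrieve: {t_retrieve:.4f} s, summarize: {t_summary:.4f} s")
```

Timing the steps separately matters because the retrieval cost is typically dominated by network transfer and data conversion, while the analysis cost depends on the tool that holds the data once it arrives.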