Kieran Healy kjhealy

@kjhealy
kjhealy / README.openai-structured-output-demo.md
Created November 19, 2024 20:58 — forked from dannguyen/README.openai-structured-output-demo.md
A basic test of OpenAI's Structured Output feature against financial disclosure reports and a newspaper's police blotter. Code examples use the Python SDK and pydantic for the schema definition.

Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output

tl;dr: this demo shows how to call OpenAI's gpt-4o-mini model, provide it with the URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data, and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large-scale data gathering projects involving public documents.

OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.
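To make "output that follows a specification" concrete, here is a minimal sketch that uses only the Python standard library. It does not call the OpenAI API and it is not the demo's pydantic code. The `Asset` class, its field names, the `parse_assets` helper, and the sample payload are all hypothetical stand-ins for a schema you might define for a disclosure table.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Asset:
    """One row of a disclosure table. Field names are hypothetical."""
    name: str
    owner: str
    value_range: str


def parse_assets(raw_json: str) -> list[Asset]:
    """Parse a JSON array of objects, rejecting rows that do not match Asset exactly."""
    expected = {f.name for f in fields(Asset)}
    rows = json.loads(raw_json)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of asset objects")
    assets = []
    for i, row in enumerate(rows):
        # Every row must carry exactly the schema's keys: none missing, none extra.
        if not isinstance(row, dict) or set(row) != expected:
            raise ValueError(f"row {i}: keys do not match the schema")
        for key, value in row.items():
            if not isinstance(value, str):
                raise ValueError(f"row {i}: field {key!r} must be a string")
        assets.append(Asset(**row))
    return assets


# Made-up payload standing in for a model response; not real disclosure data.
sample = '[{"name": "Example Fund", "owner": "Self", "value_range": "$1,001 - $15,000"}]'
assets = parse_assets(sample)
print(assets[0].name)
```

With Structured Outputs the API is meant to guarantee this shape up front, so a check like this becomes a safety net rather than the primary parser.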

For example, given a Congressional financial disclosure report, with assets defined in a table like this:

snps <-
  list(r = "~/.config/rstudio/snippets/r.snippets") %>%
  purrr::map(readLines, warn = FALSE) %>%
  purrr::map(paste, collapse = "\n") %>%
  purrr::map(trimws) %>%
  purrr::map(strsplit, split = "(^|\n)snippet ") %>%
  purrr::map_depth(2, ~ .x[.x != ""]) %>%
  purrr::map_depth(2, ~ {
    nm <- gsub("^([^\n\t ]+).*", "\\1", .x)
    names(.x) <- nm
kjhealy / macos-tmux-256color.md
Created January 26, 2023 16:44 — forked from bbqtd/macos-tmux-256color.md
Installing tmux-256color for macOS


  • macOS 10.15.5
  • tmux 3.1b

macOS ships ncurses version 5.7, which does not include the terminfo description for tmux. There are two ways to solve this problem.

The Fast Blazing Solution

Instead of tmux-256color, use screen-256color, which comes with the system. Place this command into ~/.tmux.conf or ~/.config/tmux/tmux.conf (for version 3.1 and later):

kjhealy / collapse_mask.R
Created February 15, 2022 03:13 — forked from grantmcdermott/collapse_mask.R
Benchmarking collapse_mask
## Context: https://twitter.com/grant_mcdermott/status/1493400952878952448
options(collapse_mask = "all") # NB: see `help('collapse-options')`
library(dplyr)
library(data.table)
library(collapse) # Needs to come after library(dplyr) for collapse_mask to work
flights = fread('https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv')
kjhealy / ffmpeg.md
Created March 2, 2021 03:16 — forked from dvlden/ffmpeg.md
Convert video files to MP4 through FFMPEG

This is my personal list of functions that I wrote for converting mov files to mp4!

Command Flags

| Flag | Options | Description |
| --- | --- | --- |
| -codec:a | libfaac, libfdk_aac, libvorbis | Audio Codec |
| -quality | best, good, realtime | Video Quality |
| -b:a | 128k, 192k, 256k, 320k | Audio Bitrate |
| -codec:v | mpeg4, libx264, libvpx-vp9 | Video Codec |
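As a rough illustration of how the flags in the table combine into one invocation, the sketch below assembles an ffmpeg argument list in Python without running it. The function name `build_ffmpeg_command`, the defaults, and the file names are assumptions made for this example and are not the gist's own functions. Whether a given codec works depends on how your ffmpeg binary was built. The `-quality` flag is left out because the table does not say which encoder it pairs with.

```python
import shlex

# Option values copied from the flag table above.
AUDIO_CODECS = {"libfaac", "libfdk_aac", "libvorbis"}
VIDEO_CODECS = {"mpeg4", "libx264", "libvpx-vp9"}
AUDIO_BITRATES = {"128k", "192k", "256k", "320k"}


def build_ffmpeg_command(src, dst, video_codec="libx264",
                         audio_codec="libfdk_aac", audio_bitrate="192k"):
    """Return an argv list for ffmpeg; the defaults are arbitrary picks from the table."""
    if video_codec not in VIDEO_CODECS:
        raise ValueError(f"unknown video codec: {video_codec}")
    if audio_codec not in AUDIO_CODECS:
        raise ValueError(f"unknown audio codec: {audio_codec}")
    if audio_bitrate not in AUDIO_BITRATES:
        raise ValueError(f"unknown audio bitrate: {audio_bitrate}")
    return [
        "ffmpeg", "-i", src,
        "-codec:v", video_codec,
        "-codec:a", audio_codec,
        "-b:a", audio_bitrate,
        dst,
    ]


cmd = build_ffmpeg_command("in.mov", "out.mp4")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string means it can be handed to `subprocess.run(cmd, check=True)` without shell quoting issues.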
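library(dplyr, warn.conflicts = FALSE)
library(gapminder)
probs <- c(0.1, 0.5, 0.9)
# Per-continent quantiles of every numeric column except year
gapminder %>%
  group_by(continent) %>%
  summarise(
    probs = probs,
    # Predicates must be wrapped in where() for tidyselect
    across(where(is.numeric) & !year, ~ quantile(.x, probs))
  )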
kjhealy / epl_goal_contribution_matrix_18-19.r
Created May 19, 2019 19:43 — forked from Ryo-N7/epl_goal_contribution_matrix_18-19.r
Goal contribution matrix for Premier League 2018-2019
# pkgs
pacman::p_load(tidyverse, polite, scales, ggimage, ggforce,
               rvest, glue, extrafont, ggrepel, magick)
loadfonts()
## add_logo function from Thomas Mock
add_logo <- function(plot_path, logo_path, logo_position, logo_scale = 10){
  # Requires magick R Package https://github.com/ropensci/magick
kjhealy / join-animations-with-gganimate.R
Created August 15, 2018 12:18 — forked from gadenbuie/join-animations-with-gganimate.R
Animated dplyr joins with gganimate
# Animated dplyr joins with gganimate
# * Garrick Aden-Buie
# * garrickadenbuie.com
# * MIT License: https://opensource.org/licenses/MIT
# Note: I used Fira Sans and Fira Mono fonts.
# Use search and replace to use a different font if Fira is not available.
library(tidyverse)
library(gganimate)
library(XML)
library(ggplot2)
df <- readHTMLTable("http://projects.dailycal.org/paychecker")[[1]]
colnames(df)[4] <- "Salary"
df$Salary <- as.numeric(gsub('[$,]', '', df$Salary))
p <- ggplot(df, aes(x=Department, y=Salary)) + coord_flip()
p + geom_boxplot(aes(color=Rank,
                     x=reorder(Department, Salary, FUN=max))) +
library(XML)
library(ggplot2)
df <- readHTMLTable("http://projects.dailycal.org/paychecker/departments/")[[1]]
DeMoney <- function(x) as.numeric(gsub(",", "", gsub("\\$", "", as.character(x))))
money.columns <- c("All", "Professor", "Associate professor", "Assistant professor",
                   "Lecturer")