Derek Powell derekpowell
@derekpowell
derekpowell / DALEXtra_helpers.R
Last active February 27, 2024 18:17
DALEXtra helpers
make_explainer_obj <- function(fitted_workflow){
  fitted_model <-
    fitted_workflow %>%
    extract_fit_parsnip() # <- parsnip model_fit object
  feature_data <-
    fitted_workflow %>%
    extract_mold() %>%
    pluck("predictors")
derekpowell / ghcn-weather-data-sqlite.py
Last active May 5, 2021 21:43
Create sqlite database of NOAA GHCN daily weather data
# script to create sqlite database of daily GHCN weather station data
# adding a table of US zipcodes for my own purposes (easily commented out)
# cribs heavily from https://github.com/dylburger/noaa-ghcn-weather-data
# Settings
import sqlalchemy
import os
import urllib.request
import pandas as pd
derekpowell / cumulative-intercept-prior.R
Last active May 21, 2019 22:08
Create priors on intercepts for cumulative brms ordinal regression model
cumulative_intercept_prior <- function(k, sd = 2, alpha = 1, beta = 1,
                                       shape = c("flat", "middle", "rightskewed", "leftskewed")) {
  ## Creates priors on intercepts for cumulative() family regression.
  ## Assumes that the response-option probabilities follow a cumulative beta
  ## distribution specified by alpha and beta or by the "shape" argument.
  ##
  ## k = number of categories
  ## sd = standard deviation of the normal prior on each intercept
  ## alpha, beta = shape parameters of the beta distribution (defaults give a uniform)
  ## shape = string naming a predefined distribution shape
derekpowell / pymc3-horseshoe-prior.py
Last active March 7, 2022 11:56
pymc3 horseshoe prior implementation
def horseshoe_prior(name, X, y, m, v, s):
    '''
    Regularizing horseshoe prior as introduced by Piironen & Vehtari
    https://arxiv.org/pdf/1707.01694.pdf
    name: variable name
    X: X (2-d array)
    y: y (for setting pseudo-variance)
    m: expected number of relevant features (must be < total N)
    v: regularizing student-t df
derekpowell / spread_gather.py
Created January 25, 2019 22:57
python pandas implementations of spread and gather dplyr verbs
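import pandas as pd
def gather(df, key, value, cols):
    # wide -> long: every column not listed in `cols` is kept as an identifier
    id_vars = [col for col in df.columns if col not in cols]
    return pd.melt(df, id_vars=id_vars, value_vars=cols, var_name=key, value_name=value)
def spread(df, index, columns, values):
    # long -> wide; DataFrame.pivot only accepts keyword arguments in pandas >= 2.0
    wide = df.pivot(index=index, columns=columns, values=values)
    return wide.reset_index(level=index).rename_axis(None, axis=1)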
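A minimal round-trip sketch of the two verbs, assuming pandas is installed and the helpers are written with keyword arguments (which recent pandas releases require for `pivot`). The toy data frame and its column names are made up for illustration.

```python
import pandas as pd

def gather(df, key, value, cols):
    # wide -> long: columns not in `cols` act as identifiers
    id_vars = [c for c in df.columns if c not in cols]
    return pd.melt(df, id_vars=id_vars, value_vars=cols,
                   var_name=key, value_name=value)

def spread(df, index, columns, values):
    # long -> wide: one output column per distinct value in `columns`
    wide = df.pivot(index=index, columns=columns, values=values)
    return wide.reset_index().rename_axis(None, axis=1)

# Hypothetical example data: two participants measured twice.
wide = pd.DataFrame({"id": [1, 2], "pre": [10, 20], "post": [11, 22]})

long = gather(wide, key="time", value="score", cols=["pre", "post"])
back = spread(long, index="id", columns="time", values="score")
```

Here `long` has one row per id and time point, and `back` recovers one row per id. Note that `pivot` sorts the new column labels, so the column order of `back` can differ from the original frame.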
derekpowell / cbrm.R
Last active May 22, 2023 20:36
Wrapper for brm() that supports caching of BRMS models
cbrm <- function(formula,
data,
family = gaussian(),
prior = NULL,
autocor = NULL,
cov_ranef = NULL,
sample_prior = c("no", "yes", "only"),
sparse = FALSE,
knots = NULL,
stan_funs = NULL,
derekpowell / rescale_beta.R
Last active March 24, 2018 06:45
rescale variable on open interval (0, 1)
rescale_beta <- function(x, lower, upper) {
  # rescales onto the open interval (0,1)
  # rescales over theoretical bounds of measurement, specified by "upper" and "lower"
  N <- length(x)
  res <- (x - lower) / (upper - lower)
  res <- (res * (N - 1) + .5) / N
  return(as.vector(res))
}
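A worked sketch of the same arithmetic in plain Python, to show how the second step pulls values at the bounds strictly inside (0, 1). The 1 to 7 scores are hypothetical; the function is a direct port of the R code above, not a separate method.

```python
def rescale_beta(x, lower, upper):
    # Step 1: map the theoretical range [lower, upper] onto [0, 1].
    n = len(x)
    unit = [(v - lower) / (upper - lower) for v in x]
    # Step 2: shrink toward 0.5 so exact 0s and 1s land inside (0, 1).
    return [(u * (n - 1) + 0.5) / n for u in unit]

# Hypothetical responses on a 1-7 scale, including both endpoints.
scores = [1, 4, 7]
rescaled = rescale_beta(scores, lower=1, upper=7)
```

With three observations the endpoints 1 and 7 map to 1/6 and 5/6 instead of 0 and 1. The amount of shrinkage depends on the sample size, so the same raw score rescales slightly differently in samples of different length.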
derekpowell / read_qualtrics_csv.R
Last active October 19, 2021 20:31
read qualtrics csv
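read_qualtrics_csv <- function(fname) {
  # requires readr (read_csv) and dplyr (filter, %>%)
  # row 1 of a Qualtrics export holds the column names; rows 2-3 hold extra header text
  headers <- as.vector(as.matrix(read.csv(fname, skip = 0, header = FALSE, nrows = 1, as.is = TRUE)))
  df <- read_csv(fname, skip = 3, col_names = headers)
  # keep only responses collected through the anonymous link (drops previews and tests)
  df <- df %>%
    filter(DistributionChannel == "anonymous")
  return(df)
}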
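The same idea can be sketched in Python with pandas: take names from the first row, skip the two extra header rows, and keep only anonymous-link responses. The in-memory CSV below is made-up sample data standing in for a real export, and the function is an illustrative port rather than the author's code.

```python
import io
import pandas as pd

def read_qualtrics_csv(source):
    # Line 0 supplies the column names; lines 1 and 2 are extra header rows.
    df = pd.read_csv(source, skiprows=[1, 2])
    # Keep only responses collected through the anonymous link.
    keep = df["DistributionChannel"] == "anonymous"
    return df[keep].reset_index(drop=True)

# Hypothetical miniature export with three header rows and two responses.
raw = io.StringIO(
    "ResponseId,DistributionChannel,Q1\n"
    "Response ID,Distribution Channel,How satisfied are you?\n"
    "ImportId_1,ImportId_2,ImportId_3\n"
    "R_1,anonymous,5\n"
    "R_2,preview,3\n"
)
responses = read_qualtrics_csv(raw)
```

Only the `R_1` row survives the filter, because the second response came from the preview channel.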
derekpowell / brms_model.R
Created March 24, 2018 06:40
boilerplate to create brms model
model_name <- brm(
  DV ~ formula,
  data = d,
  family = gaussian(), # student(), #cumulative(), #bernoulli(), etc
  control = list(adapt_delta = .80),
  cores = parallel::detectCores(),
  iter = 2000)
derekpowell / redact.R
Last active August 30, 2018 23:50
Redact qualtrics workerId and IPAddress from data
# Author: Derek Powell
# Date: 8/30/18, 4:49 PM
# ---
# Script to redact workerIds and ip addresses from qualtrics files.
# Script looks for "data_private/" directory, saves resulting data in "data" directory.
# Personal info is replaced with a "hash" using xxhash64,
# a super fast hash algo w/ short resulting hashes (confirmed appropriate for this use)
suppressMessages(library(tidyverse))
suppressMessages(library(digest))
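The redaction step described in the header comments can be sketched in Python as follows. This is an illustration under stated assumptions, not the script itself: the source hashes with xxhash64 through R's digest package, while this sketch swaps in `hashlib.blake2b` from the standard library with an 8-byte digest to get similarly short hashes. The sample rows are made up.

```python
import hashlib

def redact(value: str) -> str:
    # Assumption: blake2b with a 64-bit digest stands in for xxhash64.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=8).hexdigest()

def redact_rows(rows, fields=("workerId", "IPAddress")):
    # Replace each sensitive field with its hash; leave other fields untouched.
    return [
        {k: (redact(v) if k in fields else v) for k, v in row.items()}
        for row in rows
    ]

# Hypothetical survey rows.
rows = [
    {"workerId": "A1B2C3", "IPAddress": "192.0.2.1", "answer": 5},
    {"workerId": "A1B2C3", "IPAddress": "192.0.2.7", "answer": 2},
]
redacted = redact_rows(rows)
```

Because the hash is deterministic, repeat submissions from the same worker still share an identifier after redaction, which keeps duplicate checks possible without storing the raw ID.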