Danton Noriega-Goodwin dantonnoriega

@dantonnoriega
dantonnoriega / dark_widgetframe.R
Created January 9, 2019 04:27
My attempt to create a widgetframe with a dark background. Hackish, and it seems to shift the xaringan presentation oddly.
# remove any old tmp files
old_tmp_files <- list.files(pattern = '^tmp', include.dirs = TRUE, full.names = TRUE)
invisible(unlink(old_tmp_files, recursive = TRUE))
# create a dark widget frame via work around
dark_widgetframe <-
  function(widget, background = '#666666FF', width = '100%', height = 420) {
    file = tempfile(pattern = "tmp", tmpdir = '.', fileext = ".html")
    selfcontained = FALSE
    libdir = NULL
dantonnoriega / dirichletreg-greta-imultilogit-ex.R
Last active January 16, 2019 23:07
Translates the Stan code from https://arxiv.org/pdf/1808.06399.pdf into greta code (https://greta-dev.github.io/greta/index.html), using the new imultilogit() function instead of simplex_mat(), for about a 15% speed increase.
# recreate the code from https://arxiv.org/pdf/1808.06399.pdf using greta (https://greta-dev.github.io/greta/)
library("DirichletReg")
Bld <- BloodSamples
Bld <- na.omit(Bld)
Bld$Smp <- DR_data(Bld[, 1:4])
# using greta
# !! requires the development version to run!
# devtools::install_github("greta-dev/greta@dev")
# convert data to matrix then greta data
dantonnoriega / remote-ssh-plus-local-cluster-future-example.R
Last active January 26, 2021 17:54
Example of setting up a mixed remote/local cluster using the `future` package. Includes simple examples of how to set up the plan properly to make use of all cores. Note that this does not show how important it is to have the same R version AND package versions across all nodes.
# set up ---------------------------
library(furrr)
# use multiple clusters
ssh_username <- 'drn'
remote_ssh_configs <- c('a', 'b', 'e') # names for remote server (found in ~/.ssh/config e.g. Host a)
local_comp <- Sys.info()[["nodename"]] # get local computer name
# build cluster ----------------------------------------------------------------
# WARNING: this kills every process whose name contains 'ssh', including unrelated sessions
system(command = "ps -axc | grep ssh | awk '{print $1}' | sort -u | xargs kill")
# recreate the code from https://arxiv.org/pdf/1808.06399.pdf using greta (https://greta-dev.github.io/greta/)
library("DirichletReg")
Bld <- BloodSamples
Bld <- na.omit(Bld)
Bld$Smp <- DR_data(Bld[, 1:4])
# simplex function. applies simplex to each row of a matrix.
# HUGE speed gains using matrix math vs a for loop!
simplex_mat <- function(x){
  exp_x <- exp(x)
# stan code from https://arxiv.org/pdf/1808.06399.pdf
library("DirichletReg")
Bld <- BloodSamples
Bld <- na.omit(Bld)
Bld$Smp <- DR_data(Bld[, 1:4])
stan_code <- '
data {
  int<lower=1> N; // total number of observations
  int<lower=2> ncolY; // number of categories
dantonnoriega / deep_loops_in_stan.R
Last active August 16, 2018 19:03 — forked from khakieconomics/deep_loops_in_stan.R
Write your deep loops in Stan, not R. Added a quick intro on how the simulations fill the matrix.
library(tidyverse)
library(rstan)
## HOW THE SIMULATION LOOP BELOW WORKS -- ignore the shocks for now --------------
# quick example
I = 3 # individuals
M = 5 # "months"
S = 7 # sims
mat <- matrix(NA, I*M, S) # M rows (months) will be filled in at a time for each individual i across all S columns (simulations); records for individual i will populate rows ((i - 1)*M + 1):(i*M)
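The row layout described in the comment above can be checked with a small standalone sketch. It is written in Python rather than R and is not part of the gist; the helper name `rows_for_individual` is hypothetical, and indices are kept 1-based to match the R comment.

```python
# Sketch of the row layout only; the gist's actual simulation loop is not shown here.
I, M, S = 3, 5, 7  # individuals, "months", sims (same values as the R example)


def rows_for_individual(i, M):
    """Return the 1-based rows ((i - 1)*M + 1):(i*M) that individual i fills."""
    start = (i - 1) * M + 1
    return list(range(start, i * M + 1))


# An (I*M) x S matrix; each cell records which individual owns that row.
mat = [[None] * S for _ in range(I * M)]
for i in range(1, I + 1):
    for r in rows_for_individual(i, M):
        mat[r - 1] = [i] * S  # shift to Python's 0-based indexing

for i in range(1, I + 1):
    rows = rows_for_individual(i, M)
    print(f"individual {i}: rows {rows[0]}..{rows[-1]}")
```

Each individual owns one contiguous block of M rows, and every one of the S simulation columns is filled for that block.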
dantonnoriega / tweedie-simulations-with-dispersion-estimate.R
Created April 16, 2018 07:25
I wanted to understand how to simulate counts from a Tweedie distribution using the fitted mu from gam, but didn't get how to estimate the dispersion parameter, phi. Had to dig through code (stats::summary.glm) and some papers to verify. Looks good!
# inspired by https://stats.stackexchange.com/questions/174121/can-a-model-for-non-negative-data-with-clumping-at-zeros-tweedie-glm-zero-infl
# additions by Danton Noriega
library(statmod)
library(tweedie)
library(mgcv)
# generate fake mu (poisson count rates)
set.seed(1789)
x <- seq(1,100, by = .1)
mutrue <- exp(-1+x/25)
# a running list of really useful data.table tricks
# TOP 1,2 ... last row by some group id
## source: https://stackoverflow.com/questions/16325641/is-it-possible-to-extract-the-first-2-rows-for-each-date#comment23381259_16325932
## comment by @eddi
id <- c('date', 'userid')
dt[dt[, .I[1:2], by = id]$V1] # first 2 rows by id
dt[dt[, .I[.N], by = id]$V1] # last row by id (.N = length of group, .I = row index)
dantonnoriega / major_holidays_2000_2025.csv
Last active January 26, 2018 01:03
List of some major holidays from 2000 to 2025. Not exhaustive, but the hypothesis is that these dates correlate with high technology use.
year,date,holiday
2000,2000-01-01,New Year's Day
2000,2000-02-05,Chinese New Year
2000,2000-02-14,Valentine's Day
2000,2000-04-23,Easter Sunday
2000,2000-05-14,Mother's Day
2000,2000-06-18,Father's Day
2000,2000-07-04,Independence Day
2000,2000-10-31,Halloween
2000,2000-11-23,Thanksgiving Day
dantonnoriega / sublime-section-expands.py
Last active December 21, 2017 21:31
Sublime Text script that creates two new expand commands: expand to end of file and expand to section. A section is defined by a header using the RStudio syntax '# SECTION TITLE ----------' or anything with 4 hashes '####'. Flexible with comment sections as well.
import sublime, sublime_plugin

class ExpandSelectionToEofCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        v = self.view
        s = v.sel()
        eof = v.size()
        first_point = v.line(s[0]).a
        region_to_eof = sublime.Region(first_point, eof)
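The preview does not show how the plugin detects sections, so the following is only a standalone sketch of the header rule given in the gist description (a comment line ending in a run of dashes, or any line containing '####'). The pattern and the helper names `is_section_header` and `section_bounds` are assumptions for illustration, not the plugin's code, and the sketch works on a plain list of lines rather than the Sublime Text API.

```python
import re

# Hypothetical pattern inferred from the description only:
# a '#' comment line ending in four or more dashes, or any line containing '####'.
SECTION_HEADER = re.compile(r"^\s*#.*-{4,}\s*$|####")


def is_section_header(line):
    """True if the line looks like a section header under the assumed rule."""
    return SECTION_HEADER.search(line) is not None


def section_bounds(lines, cursor):
    """Return (start, end) line indices of the section containing `cursor`.

    The section starts at the nearest header at or above the cursor
    (or at line 0) and ends just before the next header (or at the last line).
    """
    start = 0
    for idx in range(cursor, -1, -1):
        if is_section_header(lines[idx]):
            start = idx
            break
    end = len(lines) - 1
    for idx in range(cursor + 1, len(lines)):
        if is_section_header(lines[idx]):
            end = idx - 1
            break
    return start, end


example = [
    "# set up ---------------------------",
    "library(furrr)",
    "ssh_username <- 'drn'",
    "# build cluster ----------------",
    "x <- 1",
]
print(section_bounds(example, 1))
```

In a real plugin the returned line range would still need to be converted into a `sublime.Region` before being added to the selection.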