JoFrhwld / purrr_bootstrap.R
Created July 29, 2017 20:06
Using purrr to do bootstrap estimation of the mean
library(tidyverse)

# draw 100,000 bootstrap resamples of the waiting times, taking the mean of each
replicates <- (1:100000) %>%
  map(~ sample(faithful$waiting, replace = TRUE)) %>%
  map(mean) %>%
  simplify()

# plot the bootstrap distribution of the mean
data_frame(replicates = replicates) %>%
  ggplot(aes(replicates)) +
  stat_density()
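
# A hedged aside (not in the original gist): purrr's typed map_dbl() would
# return a numeric vector directly, collapsing the map(mean) %>% simplify() pair
replicates <- (1:100000) %>%
  map_dbl(~ mean(sample(faithful$waiting, replace = TRUE)))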

#' I often have individual speaker data files in a nested directory structure,
#' but I also often want to read all speakers' data into R in one big data frame.
#' Here's my current best recipe.
library(tidyverse)

#' glob for the file list. This is dependent on good directory naming practices.
all_files <- Sys.glob("path/speakerid*/*.csv")

df <- data_frame(file = all_files) %>% # make a column of all of the file paths
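  # (a hedged continuation: the snippet is truncated in this preview, and
  # read_csv/unnest are assumptions about how the recipe plausibly ends)
  mutate(data = map(file, read_csv)) %>% # read each file into a list-column
  unnest()                               # flatten into one big data frame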

JoFrhwld / terror.R
Last active August 14, 2016 11:13
The most terrifying gist ever created.
#' rvest for scraping 538
library(rvest)
library(magrittr)

#' scrape the forecast
five38 <- read_html("http://projects.fivethirtyeight.com/2016-election-forecast/?ex_cid=rrpromo#plus")

#' I'd prefer to be using the polls-plus forecast here, but
#' I can only seem to get the polls-only numbers
clinton <- five38 %>%
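  # (a hedged continuation: the pipe is truncated in this preview; the CSS
  # selector below is a placeholder, not the real node on the 538 page)
  html_nodes(".candidate-val") %>%
  html_text()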
#' Find zero crossings in an fd object
#'
#' @import fda
#' @import magrittr
#'
#' @param fd an fd object
#' @param Lfdobj the derivative (0, 1, 2)
#' @param slope The slope of interest at the zero crossing
#' @param eps The prediction granularity
#' @param min Localize the zero crossing search to be greater than min
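#' A hedged sketch of the zero-crossing finder documented above (its body
#' isn't shown in this preview; the grid-search logic is an assumption):
find_zero_crossings <- function(fd, Lfdobj = 0, slope = 1, eps = 0.001, min = NULL){
  rng <- fd$basis$rangeval                      # domain of the fd object
  if(!is.null(min)) rng[1] <- max(rng[1], min)  # localize the search
  grid <- seq(rng[1], rng[2], by = eps)
  vals <- eval.fd(grid, fd, Lfdobj)             # evaluate the requested derivative
  # a zero crossing is a sign change in the direction of the requested slope
  crossings <- which(diff(sign(vals[, 1])) == 2 * sign(slope))
  grid[crossings]
}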
list2fd <- function(list, basis){
  if(inherits(list[[1]], "fdSmooth")){
    coef_list <- lapply(list, function(x) x$fd$coefs)
  }else if(inherits(list[[1]], "fd")){
    coef_list <- lapply(list, function(x) x$coefs)
  }
  n_coefs <- unlist(lapply(coef_list, length))
  if(!all(n_coefs == max(n_coefs))) stop("all elements must have the same number of basis coefficients")

library(purrr)
library(dplyr)
library(data.table)

meas_files <- Sys.glob("DataDirectory/speakers/*/*.txt")

meas_files %>%
  map(~ fread(.)[, list(idstring = gsub("(.*)\\.txt",
                                        "\\1",
                                        basename(.)),
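
# A hedged alternative for the same task (not the original's continuation,
# which is truncated above): read every file, tag each table with its file
# name, and row-bind into one data.table
all_meas <- meas_files %>%
  map(fread) %>%
  set_names(basename(meas_files)) %>%
  rbindlist(idcol = "idstring")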

JoFrhwld / talk_gist.md
Last active August 29, 2015 14:17
Big Data and Sociolinguistics
  • As datasets grow in size, it's going to become trivial to find "significant" effects (i.e., non-zero); the simulation sketch after this list illustrates the point.
    • That isn't a problem that can be fixed by just shrinking α down.
  • We need to ask ourselves:
    1. Are the effects we're observing large enough to be interesting?
    2. How big did we expect them to be?
  • To answer (2), we need an articulated theory that can make quantitative predictions.
  • I walk through two examples where I try to predict effect sizes given background theory.
  • link: https://jofrhwld.github.io/papers/plc39_2015/
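
A hedged illustration of the first point above (not part of the original gist): with a million observations, a negligible true effect still comes out highly "significant".

library(tidyverse)

set.seed(518)
n <- 1e6
sim <- data_frame(x = rnorm(n),
                  y = 0.005 * x + rnorm(n)) # the true effect is tiny but non-zero

# the x coefficient is "significant" at any conventional α,
# despite being substantively negligible
summary(lm(y ~ x, data = sim))$coefficients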

JoFrhwld / cmu_n.py
Last active August 29, 2015 14:15
non initial onset /n/
from nltk.corpus import cmudict
import re

# the CMU pronouncing dictionary: word -> list of pronunciations,
# each pronunciation a list of ARPABET phones
the_dict = cmudict.dict()

# collapse each pronunciation into a single space-joined string
the_dict2 = {word: [" ".join(x) for x in entries]
             for word, entries in the_dict.items()}
two_n = {word: entries
library(babynames)
library(dplyr)
library(ggplot2)

# for each decennial lifetable, convert survivorship to a probability of
# being alive, and work out which calendar year each row applies to
lifetables %>%
  mutate(decade = year) %>%
  group_by(decade) %>%
  mutate(prob_alive = lx / 100000,  # lx = survivors per 100,000 births
         study_year = year + x) -> prob_people
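
# A hedged sketch of a possible next step (not shown in the gist): join the
# survival probabilities to babynames to estimate how many bearers of each
# name were plausibly still alive in a given study year
babynames %>%
  inner_join(prob_people, by = c("year", "sex")) %>% # decennial birth years only
  filter(study_year == 2015) %>%
  group_by(name, sex) %>%
  summarise(est_alive = sum(n * prob_alive))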