- As datasets grow, it becomes trivial to find "significant" (i.e. non-zero) effects.
- Shrinking α does not fix this problem.
- We need to ask ourselves:
  1. Are the effects we're observing large enough to be interesting?
  2. How big did we expect them to be?
- To answer (2), we need an articulated theory that can make quantitative predictions.
- I walk through two examples where I try to predict effect sizes given background theory.
- link: https://jofrhwld.github.io/papers/plc39_2015/
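
The first two points can be illustrated with a small analytic sketch. It is not taken from the talk: the standardized effect size `d` and the sample sizes in `n` are hypothetical values chosen only for illustration, and the z statistic is the one you would get if the sample mean landed exactly on the true effect in a one-sample test with known unit variance.

```r
# Hypothetical standardized effect size, chosen only for illustration
d <- 0.01

# Hypothetical sample sizes spanning four orders of magnitude
n <- c(1e2, 1e4, 1e6)

# z statistic when the observed mean equals the true effect d (unit variance assumed)
z <- d * sqrt(n)

# Two-sided p-value under the null hypothesis of zero effect
p <- 2 * pnorm(-abs(z))

# Compare each p-value against a conventional and a much stricter alpha
results <- data.frame(
  n = n,
  z = z,
  p = p,
  sig_at_05 = p < 0.05,
  sig_at_001 = p < 0.001
)
print(results)
```

Under these assumptions the same tiny effect moves from non-significant to significant purely because `n` grows, and a stricter α only raises the sample size at which that happens. Nothing in the table says whether an effect of that size is interesting.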
library(tidyverse)

# Bootstrap the mean of faithful$waiting: resample with replacement 100,000 times
replicates <- (1:100000) %>%
  map(~sample(faithful$waiting, replace = TRUE)) %>%
  map_dbl(mean)

# Plot the density of the bootstrap means
tibble(replicates = replicates) %>%
  ggplot(aes(replicates)) +
  stat_density()
#' I often have individual speaker data files in a nested directory structure.
#' But I also often want to read all speakers' data into R in one big data frame.
#' Here's my current best recipe.
library(tidyverse)
#' Glob for the file list. This depends on good directory naming practices.
all_files <- Sys.glob("path/speakerid*/*.csv")
df <- tibble(file = all_files) %>% # make a column of all of the file paths
#' rvest for scraping 538
library(rvest)
library(magrittr)
#' scrape the forecast
five38 <- read_html("http://projects.fivethirtyeight.com/2016-election-forecast/?ex_cid=rrpromo#plus")
#' I'd prefer to be using the polls-plus forecast here, but
#' can only seem to get the polls-only forecast
clinton <- five38 %>%
#' Find zero crossings in an fd object
#'
#' @import fda
#' @import magrittr
#'
#' @param fd an fd object
#' @param Lfdobj the derivative (0, 1, 2)
#' @param slope The slope of interest at the zero crossing
#' @param eps The prediction granularity
#' @param min Localize the zero crossing search to be greater than min
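list2fd <- function(list, basis){
  # Pull the coefficient matrix out of each element (fdSmooth wraps an fd object)
  if(inherits(list[[1]], "fdSmooth")){
    coef_list <- lapply(list, function(x) x$fd$coefs)
  }else if(inherits(list[[1]], "fd")){
    coef_list <- lapply(list, function(x) x$coefs)
  }
  # Every element must have the same number of coefficients
  n_coefs <- unlist(lapply(coef_list, length))
  if(!all(n_coefs == max(n_coefs))) stop("all elements must have the same number of coefficients")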
library(purrr)
library(dplyr)
library(data.table)
meas_files <- Sys.glob("DataDirectory/speakers/*/*.txt")
meas_files %>%
  # the id string is the file name with its .txt extension stripped
  map(~fread(.)[,list(idstring = gsub("(.*)\\.txt",
                                      "\\1",
                                      basename(.)),
from nltk.corpus import cmudict
import string
import re
the_dict = cmudict.dict()
# Join each pronunciation's phone list into one space-separated string
the_dict2 = {word: [" ".join(x)
                    for x in entries]
             for word, entries in the_dict.items()}
two_n = {word: entries
library(babynames)
library(dplyr)
library(ggplot2)
lifetables %>%
  mutate(decade = year) %>%
  group_by(decade) %>%
  mutate(prob_alive = lx/100000,
         study_year = year + x) -> prob_people