@diamonaj
diamonaj / 2.2.R
Created September 19, 2018 10:58
# Regression 101
# CLAIM:
# For regression with a single binary predictor,
# the regression coefficient on the predictor is the difference
# between the averages of the two groups.
# TASK ONE:
# Decide if this claim is true/false.
# Work together as a team to perform analysis and/or create a
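One way a team might test the claim empirically is to simulate data with a single 0/1 predictor, fit a regression, and compare the fitted slope with the raw difference in group means. The sketch below is only an illustration: the variable names, sample size, and effect size are made up and are not part of the original exercise.

```r
# Simulated data; all names and numbers here are hypothetical.
set.seed(42)
n <- 200
treated <- rbinom(n, size = 1, prob = 0.5)   # binary predictor (0/1)
outcome <- 3 + 2 * treated + rnorm(n)        # outcome with noise

# Coefficient on the binary predictor from a simple linear regression
fit <- lm(outcome ~ treated)
slope <- unname(coef(fit)["treated"])

# Difference between the two group averages
diff_in_means <- mean(outcome[treated == 1]) - mean(outcome[treated == 0])

# Compare the two numbers to decide whether the claim holds
print(c(slope = slope, diff_in_means = diff_in_means))
```

Rerunning with a different seed or sample size is a quick way to see whether the comparison depends on the particular draw.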
storagedf_16 <- matrix(NA, nrow = 100, ncol = 78) # one column per age in 18:95, thus 78
for (age in 18:95) {
  for (i in 1:100) {
    # For a given age, we iterate through the simulated coefficients.
    # The first column represents all the expected values for age = 18 years old.
    beta <- sum(simulations@coef[i, ] * c(1, mean_white, age, 16, mean_income, age^2 * 0.01))
    storagedf_16[i, age - 17] <- exp(beta) / (1 + exp(beta))
  }
}
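The expression exp(beta)/(1 + exp(beta)) above is the inverse logit, which turns a linear predictor into a probability. As a small sketch, base R's plogis() computes the same quantity; the input values below are hypothetical and chosen only for illustration.

```r
# Hypothetical linear-predictor values, not taken from the gist
beta_values <- c(-2, 0, 0.5, 3)

# Inverse logit written out by hand, as in the loop above
manual <- exp(beta_values) / (1 + exp(beta_values))

# The same transformation using the logistic CDF from the stats package
builtin <- plogis(beta_values)

print(all.equal(manual, builtin))
```

plogis() is also safer for very large inputs, where exp(beta) overflows to Inf and the hand-written ratio becomes NaN.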
## ONE PERSON FROM YOUR GROUP SHOULD SCREEN-SHARE
######## FROM HERE TO THE NEXT SET OF #######
######## (BELOW) RUN THE CODE ALL AT ONCE (NOT LINE BY LINE)
install.packages("dplyr")
library(dplyr)
install.packages("tree")
library(tree)
diamonaj / gist:cee915abc59a5b8cc9a64fb2ed50d0f4
Last active October 19, 2018 12:27
Correlated vs. Uncorrelated
# EXERCISE TO BUILD INTUITION FOR CORRELATED VS. UNCORRELATED DATA
# PLEASE FOCUS ON UNDERSTANDING THE BELOW
### DO NOT JUST EXECUTE ALL THE CODE IN ONE BATCH--RUN IT LINE BY LINE...
### Simulation of analysis on correlated data
set.seed(1314)
nsims <- 10000
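The rest of this script is not shown in the preview, so the following is not the gist's simulation. It is a minimal sketch of what "correlated vs. uncorrelated" can mean in practice, assuming standard normal draws and a hypothetical target correlation rho = 0.8.

```r
set.seed(1314)
nsims <- 10000
rho <- 0.8   # assumed target correlation, chosen for illustration

x <- rnorm(nsims)
noise <- rnorm(nsims)

# y_correlated shares a component with x; y_uncorrelated is drawn independently
y_correlated <- rho * x + sqrt(1 - rho^2) * noise
y_uncorrelated <- rnorm(nsims)

cor_correlated <- cor(x, y_correlated)
cor_uncorrelated <- cor(x, y_uncorrelated)

print(c(correlated = cor_correlated, uncorrelated = cor_uncorrelated))
```

Plotting x against each y (for example with plot(x, y_correlated)) makes the difference visible as well as numeric.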
################ PRELIMINARIES
library(MASS)
data(Pima.tr)
library(tree)
library(randomForest)
## STEP 1: Logistic regression ##
logistic_reg <- glm(type ~ ., data = Pima.tr, family = binomial) # basic model
predict_logistic.tr <- predict(logistic_reg, type = "response") # predicted probabilities (TRAINING SET)
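A natural next step after obtaining predicted probabilities is to turn them into class labels and measure how often they match the observed outcome. The sketch below assumes a 0.5 cutoff, which is a common default but is not stated in the original; the names predicted_class, confusion, and training_error are illustrative.

```r
library(MASS)   # provides the Pima.tr data set
data(Pima.tr)

logistic_reg <- glm(type ~ ., data = Pima.tr, family = binomial)
predict_logistic.tr <- predict(logistic_reg, type = "response")

# Assumed cutoff of 0.5: probabilities above it are labelled "Yes"
predicted_class <- ifelse(predict_logistic.tr > 0.5, "Yes", "No")

# Cross-tabulate predictions against the observed diabetes status
confusion <- table(predicted = predicted_class, actual = Pima.tr$type)
print(confusion)

# Share of training observations that are misclassified
training_error <- mean(predicted_class != Pima.tr$type)
print(training_error)
```

Because this error is computed on the same data used to fit the model, it will tend to understate the error on new data.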
storage.vector <- NA
# Function that assigns treatment/control depending on
# propensity scores (assignment probabilities)
experiment <- function(vector.of.probabilities = NULL) {
  k <- 0
  for (i in seq_along(vector.of.probabilities)) {
    if (sample(x = c(1, 0), size = 1,
               prob = c(vector.of.probabilities[i],
                        1 - vector.of.probabilities[i])) == 1) {
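The preview cuts off before the body of the function, so the sketch below is not the gist's implementation. It shows one compact way to draw a treatment/control assignment from a vector of assignment probabilities using rbinom(); the helper name assign_treatment and the example probabilities are hypothetical.

```r
# Hypothetical helper: one Bernoulli draw per unit, 1 = treatment, 0 = control
assign_treatment <- function(vector.of.probabilities) {
  stopifnot(all(vector.of.probabilities >= 0 & vector.of.probabilities <= 1))
  rbinom(n = length(vector.of.probabilities), size = 1,
         prob = vector.of.probabilities)
}

set.seed(1)
probs <- c(0, 0.2, 0.5, 0.8, 1)   # made-up propensity scores
assignment <- assign_treatment(probs)
print(assignment)
```

Units with probability 0 are always assigned to control and units with probability 1 are always treated; the others vary from draw to draw.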
PEACEKEEPING WORKOUT (based on King, Gary; Zeng, Langche, 2007,
"Replication data for: When Can History be Our Guide?
The Pitfalls of Counterfactual Inference",
https://hdl.handle.net/1902.1/DXRXCFAWPK,
Harvard Dataverse, V4,
UNF:3:DaYlT6QSX9r0D50ye+tXpA== [fileUNF] )
# CONSIDER USING THE JUPYTER NOTEBOOK WITH R-SERVER KERNEL (NEVER R-SAGE KERNEL)
foo <- read.csv("https://course-resources.minerva.kgi.edu/uploaded_files/mke/00086677-3767/peace.csv")
# extract relevant columns
*****INSTRUCTIONS*****
This assignment requires the peacekeeping data set that we worked on in class, as well as this codebook:
http://www.nyu.edu/gsas/dept/politics/faculty/cohen/codebook.pdf.
The class breakout instructions (including data download code) are here:
https://gist.github.com/diamonaj/3795bfc2e6349d00aa0ccfe14102858d
(1) Replicate figure 8 in https://gking.harvard.edu/files/counterf.pdf.
Spring 2019
*****INSTRUCTIONS*****
(1) Debugging--in the 3 cases below (a through c), identify the major coding error in each case and explain how to fix it, in 1-2
sentences. DO NOT actually copy/paste corrected code:
(a) https://gist.github.com/diamonaj/2e5d5ba5226b7b9760f5d1bf1e7bf765
(b) https://gist.github.com/diamonaj/3b6bc83d040098486634184d99fc4c55
diamonaj / 1a.R
Last active March 19, 2019 03:52
genout <- GenMatch(Tr=treat, X=X)
summary(mout)
mb <- MatchBalance(treat~age +educ+black+ hisp+ married+ nodegr+ u74+ u75+
re75+ re74+ I(re74*re75) + re78,
match.out=genout, nboots=500)