diamonaj · December 12, 2022 02:48
diff --git a/Assignment3.Rmd b/Assignment3.Rmd
 ---
 title: "Assignment 3"
 output: pdf_document
 date: '2022-11-30'
 ---

 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = TRUE)
 set.seed(1)
 ```

 ###B1

 ```{r}
 library(foreign)
 x <- read.dta("basic.dta")

 # These variables already exist. Making them factors just makes it easier for R to handle them
 x$servicelength <- x$srvlng # name change for readability
 # Age already exists, so does demvote
 x$republican <- x$party == 2
 x$christian <- x$rgroup == 1 | x$rgroup == 2 | x$rgroup == 3 # 1,2, and 3 are the christian groups

 # Drop nas, but only for relevant columns (nowtot and anygirls). 
 #We are dropping a lot of observation but there's not much of a way around it.
 all_vars <- data.frame(x$nowtot, x$anygirls, x$female, x$white, x$servicelength, x$republican, x$christian, x$age, x$demvote)

 # identify the rows with complete data for those columns (TRUE indicates NO MISSING DATA)
 rows_to_retain <- complete.cases(all_vars)

 x <- x[rows_to_retain,]

 covars <- data.frame(x$female, x$white, x$servicelength, x$republican, x$christian, x$age, x$demvote)
 names(covars) <- c("gender", "white", "servicelength", "republican", "christian", "age", "demvote")
 ```
 ###B2


 ```{r}
 lm1 <- lm(data=x, formula = nowtot ~ anygirls + female + white + servicelength + republican + christian + age + demvote)
 summary(lm1)
 ```

 According to this regression specification, and conditional on the usual assumptions of causal inference, having any amount of girls is associated with a -0.66 change in the NOW vote score: (associated with less liberal legislators). 
 However, the p value is high, so we cannot confidently conclude that there is an effect at all.

 We can also see that this regression explained most of the variance, with an $R^2$ of 0.7962. Other variables that were found to be significant are age, gender, republican, christian religion and the democratic vote share in the representative's country.

 ###B3

 ```{r}
 #install.packages("sensemakr")
 library(sensemakr)
 lm1.sensitivity <- sensemakr(model = lm1, 
                                treatment = "anygirls",
                                benchmark_covariates = "female",
                                kd = 1:3,
                                ky = 1:3, 
                                q = 1,
                                alpha = 0.05, 
                                reduce = FALSE) # Since our treatment effect is negative
 summary(lm1.sensitivity)
 ```
 Contour plot of the treatment effect for different covariate strengths

 ```{r}
 plot(lm1.sensitivity)
 ```

 Contour plot of the t-value for different covariate strengths
 ```{r}
 plot(lm1.sensitivity, sensitivity.of='t-value')
 ```

 Effect of a confounder that is highly correlated to the outcome on the treatment effect, by correlation to treatment
 ```{r}
 plot(lm1.sensitivity, type='extreme')
 ```

 Interpretation: As our effect is quite small it is no surprise that a confounder that explains 1.75% of both treatment and outcome residual variance is enough to double our causal effect. If that confounder explains all of the treatment, it only needs to explain 0.03% the residual variance of the treatment. From the graphs we can also see that it's easy for a confounder to change our result dramatically. We should keep in mind that while the confounders can make it so we see a causal effect when there isn't one, or when the sign is wrong, they might also make it such that we are not seeing a true causal effect. Generally though, the results here show that the model is very sensitive to bias.


 ###B4

 ```{r}
 library(Matching)

 # Vector indicating which variables to do exact matching on. 
 # Only gender is required and it is the first variable
 exact <- c(1,0,0,0,0,0,0)  

 genout <- GenMatch(Tr=x$anygirls, X=covars, exact=exact, pop.size=500, replace=TRUE)

 mout <- Match(X=covars, Tr=x$anygirls, 
               Weight.matrix=genout, exact  = exact)
 MatchBalance(anygirls ~ female + white + servicelength + republican + christian + age + demvote, 
             data = x, match.out = mout, nboots=500)
 ```

 The balance that is obtained is excellent, with a minimum p value of 0.54531, The worst-matched variable is demvote. Interestingly, the p-value of christian fell after genetic matching (even though it is still very high after matching). Sometimes genetic matching will sacrifice some degree of really good balance in some variables to improve really bad balance in other variables.

 ###B5

 ```{r}
 mout <- Match(Y=x$nowtot, X=covars, Tr=x$anygirls, 
               Weight.matrix=genout)
 summary(mout)
 ```

 With matching alone, we get an estimated causal effect of 0 (very unusually and coincidentally) and a p-value of 1. This is because the mean in the treatment group and the mean in the control group is exactly the same. We matched with replacement--we matched 312 treated units to a smaller number of control units.


 ###B6

 ```{r}
 matching_weights <- c(mout$weights, mout$weights)
 matched_dataset <- rbind(x[c(mout$index.treated, mout$index.control),])

 lm2 <- lm(nowtot ~ anygirls + female + white + servicelength + republican + christian + age + demvote, data=matched_dataset, weights=matching_weights)
 summary(lm2)
 confint(lm2)
 ```

 When combining genetic matching and a linear regression, we get that the effect of having daugthers is -0.07 with a super-high p-value of 0.95. The 95% CI of the treatment variable is [-2.36, 2.51], which suggests it could be just about anywhere in that range. The regression result is very similar to the result we obtained after matching (which was a treatment effect point estimate of 0).

 ###B7
 ```{r}
 lm2.sensitivity <- sensemakr(model = lm2, 
                                treatment = "anygirls",
                                benchmark_covariates = "female",
                                kd = 1:3,
                                ky = 1:3, 
                                q = 1,
                                alpha = 0.05, 
                                reduce = FALSE) # Since our treatment effect is negative
 summary(lm1.sensitivity)
 ```
 Contour plot of the treatment effect for different covariate strengths

 ```{r}
 plot(lm2.sensitivity)
 ```

 Contour plot of the t-value for different covariate strengths
 ```{r}
 plot(lm2.sensitivity, sensitivity.of='t-value')
 ```

 Effect of a confounder that is highly correlated to the outcome on the treatment effect, by correlation to treatment
 ```{r}
 plot(lm2.sensitivity, type='extreme')
 ```

 The sensemakr outputs suggests that results are super-sensitive to any unobserved confounders, even when the imagined confounders are hardly at all correlated with treatment assignment and/or outcomes. This is, of course, really different from what was published in the original paper by Washington, and what was discussed in the replication by King et al.
	---
	title: "Assignment 3"
	output: pdf_document
	date: '2022-11-30'
	---

	```{r setup, include=FALSE}
	knitr::opts_chunk$set(echo = TRUE)
	set.seed(1)
	```

	###B1

	```{r}
	library(foreign)
	x <- read.dta("basic.dta")

	# These variables already exist. Making them factors just makes it easier for R to handle them
	x$servicelength <- x$srvlng # name change for readability
	# Age already exists, so does demvote
	x$republican <- x$party == 2
	x$christian <- x$rgroup == 1 \| x$rgroup == 2 \| x$rgroup == 3 # 1,2, and 3 are the christian groups

	# Drop nas, but only for relevant columns (nowtot and anygirls).
	#We are dropping a lot of observation but there's not much of a way around it.
	all_vars <- data.frame(x$nowtot, x$anygirls, x$female, x$white, x$servicelength, x$republican, x$christian, x$age, x$demvote)

	# identify the rows with complete data for those columns (TRUE indicates NO MISSING DATA)
	rows_to_retain <- complete.cases(all_vars)

	x <- x[rows_to_retain,]

	covars <- data.frame(x$female, x$white, x$servicelength, x$republican, x$christian, x$age, x$demvote)
	names(covars) <- c("gender", "white", "servicelength", "republican", "christian", "age", "demvote")
	```
	###B2


	```{r}
	lm1 <- lm(data=x, formula = nowtot ~ anygirls + female + white + servicelength + republican + christian + age + demvote)
	summary(lm1)
	```

	According to this regression specification, and conditional on the usual assumptions of causal inference, having any amount of girls is associated with a -0.66 change in the NOW vote score: (associated with less liberal legislators).
	However, the p value is high, so we cannot confidently conclude that there is an effect at all.

	We can also see that this regression explained most of the variance, with an $R^2$ of 0.7962. Other variables that were found to be significant are age, gender, republican, christian religion and the democratic vote share in the representative's country.

	###B3

	```{r}
	#install.packages("sensemakr")
	library(sensemakr)
	lm1.sensitivity <- sensemakr(model = lm1,
	treatment = "anygirls",
	benchmark_covariates = "female",
	kd = 1:3,
	ky = 1:3,
	q = 1,
	alpha = 0.05,
	reduce = FALSE) # Since our treatment effect is negative
	summary(lm1.sensitivity)
	```
	Contour plot of the treatment effect for different covariate strengths

	```{r}
	plot(lm1.sensitivity)
	```

	Contour plot of the t-value for different covariate strengths
	```{r}
	plot(lm1.sensitivity, sensitivity.of='t-value')
	```

	Effect of a confounder that is highly correlated to the outcome on the treatment effect, by correlation to treatment
	```{r}
	plot(lm1.sensitivity, type='extreme')
	```

	Interpretation: As our effect is quite small it is no surprise that a confounder that explains 1.75% of both treatment and outcome residual variance is enough to double our causal effect. If that confounder explains all of the treatment, it only needs to explain 0.03% the residual variance of the treatment. From the graphs we can also see that it's easy for a confounder to change our result dramatically. We should keep in mind that while the confounders can make it so we see a causal effect when there isn't one, or when the sign is wrong, they might also make it such that we are not seeing a true causal effect. Generally though, the results here show that the model is very sensitive to bias.


	###B4

	```{r}
	library(Matching)

	# Vector indicating which variables to do exact matching on.
	# Only gender is required and it is the first variable
	exact <- c(1,0,0,0,0,0,0)

	genout <- GenMatch(Tr=x$anygirls, X=covars, exact=exact, pop.size=500, replace=TRUE)

	mout <- Match(X=covars, Tr=x$anygirls,
	Weight.matrix=genout, exact = exact)
	MatchBalance(anygirls ~ female + white + servicelength + republican + christian + age + demvote,
	data = x, match.out = mout, nboots=500)
	```

	The balance that is obtained is excellent, with a minimum p value of 0.54531, The worst-matched variable is demvote. Interestingly, the p-value of christian fell after genetic matching (even though it is still very high after matching). Sometimes genetic matching will sacrifice some degree of really good balance in some variables to improve really bad balance in other variables.

	###B5

	```{r}
	mout <- Match(Y=x$nowtot, X=covars, Tr=x$anygirls,
	Weight.matrix=genout)
	summary(mout)
	```

	With matching alone, we get an estimated causal effect of 0 (very unusually and coincidentally) and a p-value of 1. This is because the mean in the treatment group and the mean in the control group is exactly the same. We matched with replacement--we matched 312 treated units to a smaller number of control units.


	###B6

	```{r}
	matching_weights <- c(mout$weights, mout$weights)
	matched_dataset <- rbind(x[c(mout$index.treated, mout$index.control),])

	lm2 <- lm(nowtot ~ anygirls + female + white + servicelength + republican + christian + age + demvote, data=matched_dataset, weights=matching_weights)
	summary(lm2)
	confint(lm2)
	```

	When combining genetic matching and a linear regression, we get that the effect of having daugthers is -0.07 with a super-high p-value of 0.95. The 95% CI of the treatment variable is [-2.36, 2.51], which suggests it could be just about anywhere in that range. The regression result is very similar to the result we obtained after matching (which was a treatment effect point estimate of 0).

	###B7
	```{r}
	lm2.sensitivity <- sensemakr(model = lm2,
	treatment = "anygirls",
	benchmark_covariates = "female",
	kd = 1:3,
	ky = 1:3,
	q = 1,
	alpha = 0.05,
	reduce = FALSE) # Since our treatment effect is negative
	summary(lm1.sensitivity)
	```
	Contour plot of the treatment effect for different covariate strengths

	```{r}
	plot(lm2.sensitivity)
	```

	Contour plot of the t-value for different covariate strengths
	```{r}
	plot(lm2.sensitivity, sensitivity.of='t-value')
	```

	Effect of a confounder that is highly correlated to the outcome on the treatment effect, by correlation to treatment
	```{r}
	plot(lm2.sensitivity, type='extreme')
	```

	The sensemakr outputs suggests that results are super-sensitive to any unobserved confounders, even when the imagined confounders are hardly at all correlated with treatment assignment and/or outcomes. This is, of course, really different from what was published in the original paper by Washington, and what was discussed in the replication by King et al.
No results found