Skip to content

Instantly share code, notes, and snippets.

@viniciusmss
Created September 29, 2018 15:12
Show Gist options
  • Save viniciusmss/bb177154110d79feb38bd3cf2cd75660 to your computer and use it in GitHub Desktop.
Save viniciusmss/bb177154110d79feb38bd3cf2cd75660 to your computer and use it in GitHub Desktop.
# Characteristics of the data set to be generated:
#
# N(mu, sigma) error term,
# 500 treated units, 500 control units,
# Predictors: treatment (binary variable), and
# hectares (continuous variable--use a uniform distribution),
# pre-treatment productivity (continuous variable--use a uniform distribution).
# Outcome variable: post-treatment productivity: (continuous variable).
# Avg pre-treatment productivity for tmt group
# should be double the average pre-treatment
# productivity for the control group. Average
# hectares should actually be a little higher
# for the tmt group than for the control group.
# Data generating process for Y should be simply
# the pre-treatment productivity + error term.
set.seed(9876)
treat <- factor(c(rep(1, 500), rep(0, 500))) # 500 treat, 500 control
hectares <- c(runif(500, 50, 150), runif(500, 35, 135)) # Average hectares is higher for tmt
mean(hectares[1:500]) - mean(hectares[501:1000]) # 16.3 mean difference
pre_productivity <-
c(runif(500, 1000, 2000), runif(500, 500, 1000))
mean(pre_productivity[1:500])/mean(pre_productivity[501:1000]) # 1.98 mean ratio
post_productivity <- pre_productivity + rnorm(1000, 0, 100) # err = N(0, 100)
df <- data.frame(post_productivity, pre_productivity, hectares, treat)
str(df)
library(ggplot2)
ggplot(data = df, aes(x = `treat`, y = `post_productivity`)) +
geom_boxplot(alpha = 0.3, fill = c("blue", "red")) +
ggtitle("Post-Treatment Productivity Vs.Treatment Status") +
xlab("Treatment Status") +
ylab("Post-Treatment Productivity")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment