emailto: [email protected] student: Francis Smart [email protected] EDMS769G
# Text Network graph
# Student Ability _______________________
# / | \_____
# / | Public/Private Transport \
# ↓ ↓ ↓ ↘
# Pretest score ------→ Teacher ability ---------> Post test
# ↑ ↑ ↗
# \ | /
# \___________Student SES -____________/
This is a simulation of what factors might contribute to student performance on teacher pretest and posttest. We would like to estimate a Value-Added measure of teacher ability so to know who to retain and who to give additional training to.
In principal this is a question
samplesize = 1000
studentability = randn(samplesize)
studentses = randn(samplesize)
# Unobserved Pretest Factors
pretestunobserved = randn(samplesize)
# Pretest is a function of student ability, ses factors, and unobserved
pretest = 0.75 * studentability + 0.25 * studentses + .5 * pretestunobserved
Pretest scores, student ability, student SES, as well as public/private access transportation contribute to teacher ability selection as well as some random unobserved factors.
transportation = randn(samplesize)
# Now let's generate some teacher abilities teacher ability is sorted from low to high
teacherabilityunobserved = randn(samplesize)
# Most factors are positively weighted with students who come from higher SES or have better transportation being associated with higher performing teachers. However, in this simulation students who are worse off in ability are matched with higher ability performing teachers reflecting a approach by the school to intervene is poor student performance.
teacherability = teacherabilityunobserved -.5*studentability + .5*studentses -
.4*pretest + .2*transportation + teacherabilityunobserved
using Statistics
cor(studentability, teacherability)
# -0.3833970805817827
cor(pretest, teacherability)
# -0.3115985976394242
cor(studentses, teacherability)
# 0.15521043067699103
From the correlation statistics we can see that the simulation is working as expected with high performing students getting worse teachers.
The next step is generating performance scores for the end of year exam.
postexamerror = randn(samplesize)
posttext = .75*studentability + 0.25 * studentses + .5 * teacherability + .5*postexamerror
Finally we would like to estimate the Value Added from the teachers. One method of doing this would be to take the postexam
and subtract out the preexam
.
Δscores = posttext - pretestunobserved
We might want to ask some basic questions like. Are Δscores
correlated with teacherability
?
cor(Δscores, teacherability)
# 0.5713281702441514
In this case they do appear to be positively correlated, which is what we would like to see.
If we were to rank teachers and rank change in scores how well would be do?
using StatsBase
corspearman(Δscores, teacherability)
# 0.5516197316197317
We retain about the same correlation. This should not surprise us as we have no ceiling or floor effects built into our data generating process.
We might want to ask questions of our data. What is the likelihood that a teacher classified in the top 10% of our data actually is in the top 10%.
teacherrank = denserank(teacherability)/samplesize
Δscorerank = denserank(Δscores)/samplesize
teacherrank_bottom10 = teacherrank .<= .1
Δscorerank_bottom10 = Δscorerank .<= .1
Given that a teacher is actually ranked in the bottom 10% what is the likelihood that the change score will correctly classify that teacher?
mean(Δscorerank_bottom10[teacherrank_bottom10 .== 1])
# 0.36
36% is not so good.
We can ask the reverse.
Given that a teacher is scored in the bottom 10% by change score what is the likelihood that the that teacher?
mean(teacherrank_bottom10[Δrank_bottom10 .== 1])
# 0.36
Overall, we can have some confidence that our Value Added Measure seems to have some correlation with teacher ability. However, using it as a tool to determine which teachers are poor performers or conversely high performers is likely to lead to high errors in misclassification.