Students should extract subsets of data from particular groups within their data set for this exercise. In the absence of such data, you can run the create_sample.py
script to create the two samples (sample_a.dat and sample_b.dat) and then run the compare_sample_means.py
script to compare the two groups.
Gist deanmalmgren/c841585904ee74ec8cc2, last active August 29, 2015 14:03
bootstrapping: comparing sample means
sample_a.dat
sample_b.dat
#!/usr/bin/env python
"""Given two files that contain a list of data values, compute the
differences in the sample means with a bootstrapping approach
"""
import random


def read_data(filename):
    # build a real list (not a lazy map object) so random.choice can index it
    with open(filename) as stream:
        return [float(value) for value in stream.read().split()]


def calculate_sample_mean(n, data):
    # draw n values with replacement and average them
    sample = [random.choice(data) for i in range(n)]
    return sum(sample) / n


m = 1000  # number of bootstrap comparisons
n = 100   # values drawn per resample
a_data = read_data('sample_a.dat')
b_data = read_data('sample_b.dat')

# calculate a bunch of sample means
count = 0
for j in range(m):
    a_mean = calculate_sample_mean(n, a_data)
    b_mean = calculate_sample_mean(n, b_data)
    if a_mean > b_mean:
        count += 1
print("a_mean > b_mean %(count)d out of %(m)d times" % locals())
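The comparison script reports only how often one resampled mean exceeds the other. As a possible extension, the sketch below also keeps the differences themselves and reads a rough percentile interval off them. It is not part of the original gist: the helper names `bootstrap_mean_diffs` and `percentile_interval`, the fixed seed, and the in-memory stand-in data (2000 exponential draws per group, using the same rates as create_sample.py) are assumptions made for illustration, and it targets Python 3.

```python
import random


def bootstrap_mean_diffs(a_data, b_data, n, m, rng):
    """Return m differences: mean of an a-resample minus mean of a b-resample."""
    diffs = []
    for _ in range(m):
        # each resample draws n values with replacement
        a_mean = sum(rng.choice(a_data) for _ in range(n)) / n
        b_mean = sum(rng.choice(b_data) for _ in range(n)) / n
        diffs.append(a_mean - b_mean)
    return diffs


def percentile_interval(values, alpha=0.05):
    """Return a rough (1 - alpha) percentile interval of the values."""
    ordered = sorted(values)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper


rng = random.Random(0)  # assumption: fixed seed so reruns give the same numbers
# assumption: synthetic stand-ins for sample_a.dat and sample_b.dat
a_data = [rng.expovariate(1.5) for _ in range(2000)]
b_data = [rng.expovariate(1.1) for _ in range(2000)]

diffs = bootstrap_mean_diffs(a_data, b_data, n=100, m=1000, rng=rng)
fraction = sum(d > 0 for d in diffs) / len(diffs)
low, high = percentile_interval(diffs)
print("a_mean > b_mean in %.1f%% of resamples" % (100 * fraction))
print("95%% interval for a_mean - b_mean: (%.3f, %.3f)" % (low, high))
```

Passing a `random.Random` instance instead of using the module-level functions keeps the sketch repeatable without touching global random state. To run it on the real files, replace the two synthetic lists with the values read from sample_a.dat and sample_b.dat.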
#!/usr/bin/env python
"""I don't currently have access to the movie or horse racing data, so
use this script to create the group a and group b data sets that are
used in subsequent steps.
"""
import sys
import random


def write_sample(stream, n, lam):
    # report what is being written, then emit n exponential draws
    filename = stream.name
    sys.stderr.write((
        '%(filename)s with %(n)d values from exponential distribution '
        'with lam=%(lam)s\n'
    ) % locals())
    for i in range(n):
        x = random.expovariate(lam)
        stream.write(str(x) + '\n')


with open('sample_a.dat', 'w') as stream:
    write_sample(stream, 100000, 1.5)
with open('sample_b.dat', 'w') as stream:
    write_sample(stream, 20000, 1.1)
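An exponential distribution with rate lam has mean 1/lam, so group a (lam=1.5) should average about 0.667 and group b (lam=1.1) about 0.909. The sketch below is a quick sanity check of that expectation, using the same sample sizes and rates as the script above but keeping the draws in memory. The `sample_mean` helper, the `results` dictionary, and the fixed seed are assumptions for illustration and are not part of the gist.

```python
import random


def sample_mean(n, lam, rng):
    """Average n draws from an exponential distribution with rate lam."""
    total = 0.0
    for _ in range(n):
        total += rng.expovariate(lam)
    return total / n


rng = random.Random(42)  # assumption: fixed seed for repeatable output
results = {}
for name, n, lam in [("sample_a", 100000, 1.5), ("sample_b", 20000, 1.1)]:
    observed = sample_mean(n, lam, rng)
    expected = 1.0 / lam  # theoretical mean of an exponential with rate lam
    results[name] = (observed, expected)
    print("%s: observed mean %.4f, expected 1/lam = %.4f"
          % (name, observed, expected))
```

Because group a has the smaller expected mean, the comparison script should report a_mean > b_mean in only a small share of its resamples.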