@crystalattice
Created August 29, 2017 00:10
Experiment challenge
1. There are a total of 100 participants; one group saw 40% effectiveness and the other saw 10%.
a. These were tested on droids, so who knows what their programming dictates.
b. The test subjects were captives, so motivation may play a part.
c. The test giver was different for each group, leading to observer bias of the results. The Emperor could be more persuasive because of his position.
d. What was the makeup of the droid population? If they were the same model, then individual biases should be null. However, different models may reflect different tendencies.
e. Were the sample groups homogenous or heterogeneous? (see previous bullet)
f. How were the slogans presented to the sample groups? Different deliveries may yield different results.
g. The best solution is to have one person, ideally a disinterested third-party, present the slogans to both groups, using the same presentation style. As much as possible, the droids should be split up into equally-represented groups.
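The random, balanced split suggested above can be sketched in a few lines of Python (the droid names and group count here are made up for illustration):

```python
import random

def assign_groups(subjects, n_groups=2, seed=42):
    """Shuffle subjects with a fixed seed, then deal them round-robin
    into equally sized groups so neither group is hand-picked."""
    pool = list(subjects)
    random.Random(seed).shuffle(pool)
    return [pool[i::n_groups] for i in range(n_groups)]

droids = [f"droid_{i}" for i in range(100)]
group_a, group_b = assign_groups(droids)
# Each group gets 50 droids, assigned at random rather than by model.
```

If the droids come in different models, a stratified split (shuffling within each model before dealing) would balance models across groups as well.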
2. Same number of sample groups; one was 75% effective while the other was 65% effective.
a. Favorable ratings should be expected to already be high. What was the level of “favorable feelings” prior to the meetings?
b. If Mace Windu went to “hostile” planets but still saw a jump in favorable feelings, the actual difference between previous and new ratings would probably be significantly higher than for Jar Jar.
c. The final analysis should be based on the deltas between previous feelings and current feelings, not just on current attitudes measured after the visits.
d. The delivery of the presentations should be accounted for as well. Each person may deliver the material differently, so having them change places for another round of visits would help balance out the ratings.
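The delta-based analysis described above amounts to subtracting the pre-visit rating from the post-visit rating per planet; the planet names and percentages below are hypothetical:

```python
def favorability_delta(before, after):
    """Change in favorable-feelings rating for each planet visited."""
    return {planet: after[planet] - before[planet] for planet in before}

# Hypothetical pre/post ratings (percent favorable) per planet.
before = {"Naboo": 70, "Geonosis": 20}
after = {"Naboo": 75, "Geonosis": 65}
deltas = favorability_delta(before, after)
# Geonosis started hostile, so its +45 swing outweighs Naboo's +5
# even though Naboo's final rating is higher.
```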
3. No information on populations or sample sizes.
a. Without knowing the number of employees in HR and IT, comparing the two satisfaction ratings is pretty much useless. In a larger population, individual data of interest can be hidden in the noise, whereas in a smaller population each individual score has a greater impact on the overall rating. For example, a 0 on one test and 100% on a second yields an average score of 50%, while a 0 on one test and 100% on nine tests yields an average of 90%.
b. The fact that IT workers are equally distributed across all locations while HR workers are not will skew the results. This can be mitigated somewhat through data normalization.
c. The two groups’ ratings should be calculated separately and indicated as such. If they want to be compared, then the values should be normalized to a specific scale that accounts for their individual variance so the values can be appropriately compared.
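Both points above can be illustrated concretely. The sample-size arithmetic is the averaging example from 3a, and one common way to put two groups on a shared scale is z-score normalization (the HR/IT rating lists here are invented for illustration):

```python
from statistics import mean, stdev

# Sample-size effect from 3a: one bad score drags a small group far more.
two_tests = mean([0, 100])        # 50.0
ten_tests = mean([0] + [100] * 9)  # 90.0

def z_normalize(scores):
    """Rescale scores to mean 0, stdev 1 so groups with different
    baselines and spreads can be read on one common scale."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

hr_ratings = [60, 70, 80]  # hypothetical HR satisfaction scores
it_ratings = [85, 90, 95]  # hypothetical IT satisfaction scores
hr_z = z_normalize(hr_ratings)
it_z = z_normalize(it_ratings)
# Both normalize to [-1.0, 0.0, 1.0]: each value now reads as
# "how far above or below that group's own norm", not a raw score.
```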
4. No information on populations or sample sizes.
a. Opt-in data is inherently biased, as it only represents those people who are interested in participating. In this scenario, people who are more likely to work out will opt in, as they are more likely to be the type of people who use fitness trackers.
b. There is no accounting of “gaming the system”, e.g. vigorously moving the tracking device to simulate exercise.
c. The long-term habits of people need to be accounted for, if the desire is to truly track the fitness levels of people. Once the novelty wears off, most people resume their previous habits.
d. If possible, a control group of average people should be created to see how their data compares to that of regular users. Potential candidates need to be screened to ensure their fitness level is truly average; canvassing people at a gym may not be the best approach, since a large portion of the general population doesn’t work out, or not at serious levels, so gym-goers skew above average.
e. Conversely, looking at the app’s usage by fitness freaks may yield interesting results compared to the normal app user.
5. No information on population or sample sizes.
a. There was no randomization of tests, so it is possible that the students who received test B happened to be the smarter group of students.
b. Just because a test is easy doesn’t mean it isn’t measuring the knowledge of a student. It is easy to make an impossible-to-pass test; the test should be judged on whether it adequately measures the course requirements. If the course requirements themselves are easy, an easy test isn’t necessarily irrelevant.
c. The tests need to be considered in a variety of ways: compare scores to each other, compare scores to historical averages, etc. This will give a more accurate indication of whether each of the tests was too easy/hard.
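The historical-average comparison in 5c can be sketched as a simple drift report; the score lists and the historical mean of 75 below are hypothetical:

```python
from statistics import mean

def compare_to_history(scores, historical_mean, label):
    """Report how far a test's average drifts from the historical norm."""
    delta = mean(scores) - historical_mean
    return f"{label}: mean {mean(scores):.1f}, {delta:+.1f} vs. history"

# Hypothetical score lists for the two test versions.
test_a = [55, 60, 58, 62]
test_b = [88, 92, 90, 94]
print(compare_to_history(test_a, 75, "Test A"))
print(compare_to_history(test_b, 75, "Test B"))
# A large negative drift suggests too hard; a large positive one, too easy,
# assuming the student groups were comparable to past cohorts.
```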