Learn just enough to be dangerous.
You're running an A/B test on a website. Version A of the page has been visited 100 times, and 9 of those times the user submitted a form. Version B of the page has been visited 100 times and 13 of those times a user submitted a form. Does version B have a higher conversion rate or is this just random chance?
In the next 20 minutes we're going to come back to this story problem and show the first principles behind how we solve it.
If we want to answer the story problem, we first assume that there's no difference between page A and page B. Then we see how likely it is by pure chance that page B would have this much higher of a click-through rate. The lower that probability, the more certain we are that Page B is in fact better than Page A.
When I start my A/B test, I don't know which version of the page I expect will have a higher conversion rate. So this means that when I see a difference in conversion rates, I have to check the possibility that the results are this lopsided in EITHER direction.
If you CANNOT anticipate ahead of time the direction of the effect, use a two sided test.
If you CAN anticipate the direction of the effect ahead of time, use a one sided test.
This is an important topic to dwell on. Let's say I'm testing whether Ozempic causes people to lose weight. At the beginning I'm going to assume one of two possibilities: A) the drug helps people lose weight, or B) the drug has no effect. I'm not considering the possibility that the drug causes weight gain. Sure, anything is possible, but it's far more likely that a weight-loss drug will cause a patient to lose 20 pounds than to gain 20 pounds.
Brute Force Monte Carlo Simulation
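Here's a sketch of what that simulation might look like in plain JavaScript (the function and variable names are my own, not canonical):

```javascript
// Hypothetical sketch of the brute-force approach. Under the null
// hypothesis, both pages share one true conversion rate, so we pool the
// observed data: (9 + 13) / 200 = 11%. Then we repeatedly simulate both
// pages under that shared rate and count how often pure chance produces
// a gap at least as large as the one we actually saw.
function simulatePValue(visits, conversionsA, conversionsB, trials) {
  const pooledRate = (conversionsA + conversionsB) / (2 * visits);
  const observedDiff = Math.abs(conversionsB - conversionsA) / visits;
  let atLeastAsExtreme = 0;
  for (let t = 0; t < trials; t++) {
    let a = 0;
    let b = 0;
    for (let i = 0; i < visits; i++) {
      if (Math.random() < pooledRate) a++;
      if (Math.random() < pooledRate) b++;
    }
    // Two-sided test: count gaps this large in EITHER direction.
    if (Math.abs(b - a) / visits >= observedDiff) atLeastAsExtreme++;
  }
  return atLeastAsExtreme / trials;
}

// 9 conversions out of 100 visits vs. 13 out of 100, 10,000 trials.
console.log(simulatePValue(100, 9, 13, 10000));
```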
A couple of points with this simulation:
If each page has only been visited 100 times, we're only 75% sure that page B actually has a higher click-through rate than page A. If we increase the number of trials from 10,000 to 100,000, the results don't change. But if we change the number of visits per page in the simulation, that DOES increase our certainty. In fact, how many visits per page would we need in order to know for sure?
Let's play around with the numbers.
| page_visits | p-value |
|---|---|
| 100 | 49.89% |
| 200 | 34.19% |
| 300 | 24.97% |
| 400 | 18.55% |
| 500 | 13.69% |
| 600 | 10.33% |
| 700 | 7.86% |
| 800 | 6.03% |
| 900 | 4.70% |
| 1000 | 3.56% |
As more people visit each page, our certainty gets higher and higher. In statistics we generally want this p-value to be 5% or smaller. That number is just a convention.
There are too many statistical formulas to remember, some of which I'll get into in today's lightning talk. But today's computers are powerful enough that a brute-force Monte Carlo simulation is often the quickest way to get to the right answer.
Formulas are great shortcuts once you grasp the underlying concepts. If you don't know why the formula is the way it is, tinkering with parameters and seeing what happens is a great way to grasp the fundamental concepts. Here are a few:
- As we increase the number of page visits, the final p-value becomes smaller
- As we increase the number of trials, the final p-value doesn't become bigger or smaller, just more stable. This is the "law of large numbers" if you're curious.
- If we increase the difference in click through rates between pages A and B, we need fewer page visits in order to be confident in our assertion
- If the overall click-through rate is small for BOTH pages, we have to wait longer before we're certain of our conclusion
What's the thing I'm trying to prove? I want to prove that page B has a higher click-through rate than page A. What's the opposite of that statement? In this case, that page A and page B have the same click-through rate.
This is actually similar to the way testing and QA work. In order to show that your code is correct, you first assume that the code is flawed, then try to demonstrate the flaw. Once you've tried and failed to prove that the code is flawed, you can assume that your code is correct.
The Bernoulli distribution is a fancy term for a coin flip. The coin doesn't have to be fair: it could be a coin that's heads 99% of the time and tails 1% of the time. The binomial distribution answers the question, "If you flip the coin X times, what's the probability that it comes up heads exactly Y times?" For example: "If you make 30% of your free throws and you take 20 free throws in a game, what's the probability you make exactly 9 of them?"
The probability mass function (PMF) of the binomial distribution is given by:

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
where:
- $n$ is the number of trials,
- $k$ is the number of successes,
- $p$ is the probability of success in each trial.
Binomial Distribution In Pure JS
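A minimal pure-JS implementation of that PMF might look like this (a sketch; the function name `binomialPMF` is my own):

```javascript
// Sketch of the binomial PMF in plain JavaScript (no libraries).
// Computes the binomial coefficient C(n, k) iteratively so we never
// form the huge intermediate factorials.
function binomialPMF(n, k, p) {
  let coefficient = 1;
  for (let i = 1; i <= k; i++) {
    coefficient = coefficient * (n - k + i) / i;
  }
  return coefficient * Math.pow(p, k) * Math.pow(1 - p, n - k);
}

// The free-throw example: 20 attempts at 30%, exactly 9 makes.
console.log(binomialPMF(20, 9, 0.3)); // ≈ 0.065
```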
Imagine the following scenario among basketball players:
- Player 1 makes 50% of their free throws. How many shots do they make on average if they take 10 shots? (Answer: 5)
- Player 2 makes 25% of their free throws. How many shots do they make on average if they take 20 shots? (Answer: 5)
- Player 3 makes 1% of their free throws. How many shots do they make on average if they take 500 shots? (Answer: 5)
Imagine this sequence continues. Each successive player makes a smaller percentage of their free throws but takes more shots so that they make 5 shots on average. As the number of shots approaches infinity, this is the Poisson distribution.
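You can watch that convergence numerically with a couple of small helper functions (a sketch; both names are my own):

```javascript
// Binomial PMF: probability of exactly k successes in n trials at rate p.
function binomialPMF(n, k, p) {
  let coefficient = 1;
  for (let i = 1; i <= k; i++) coefficient = coefficient * (n - k + i) / i;
  return coefficient * Math.pow(p, k) * Math.pow(1 - p, n - k);
}

// Poisson PMF: probability of exactly k events when lambda are expected.
function poissonPMF(lambda, k) {
  let kFactorial = 1;
  for (let i = 2; i <= k; i++) kFactorial *= i;
  return Math.pow(lambda, k) * Math.exp(-lambda) / kFactorial;
}

// Probability each player makes exactly 5 shots:
console.log(binomialPMF(10, 5, 0.5));   // player 1: ≈ 0.246
console.log(binomialPMF(20, 5, 0.25));  // player 2: ≈ 0.202
console.log(binomialPMF(500, 5, 0.01)); // player 3: ≈ 0.176
console.log(poissonPMF(5, 5));          // limit:    ≈ 0.175
```

By player 3 the binomial probability already agrees with the Poisson value to about three decimal places.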
The Poisson distribution is a great approximation for rare events that occur at random. Examples:
- Car crashes (rare for any one driver, but there are a lot of drivers in any given city)
- Server outages
- Click-through rates on pages if the rate is low
- Throw in some examples with j-stat here
- Also encourage users to explore the vast world outside javascript
- (Google Sheets, Microsoft Excel)
Details TBA