
@joakin
Last active June 7, 2018 08:30

Things we currently do to set up an A/B test as an engineer

Planning

  • Review the tasks
    • Understand what is being asked to be tracked and why
      • Analyze the feasibility of the implementation, provide feedback, and raise questions with Product and Analysts
  • Flesh out the technical information (see Development) for those tasks
  • Ensure there are separate tasks created for implementation of tracking, and deployment of tracking

Development

  • Ensure both variants are accessible to users at the same time
    • If they aren't, add code paths so that both variants can work at the same time
  • Create configuration variables for the experiment with the buckets
    • Name the configuration variables and give them default values
    • If needed, read the config on the server to decide whether to ship certain code bundles to the clients
    • Read the config in the client
  • Create user bucketing
    • If needed, write feature-specific user targeting code (anonymous, logged in, etc.). This may be based on additional configuration flags (per-wiki configs)
    • Configure mw.experiments with the test config to bucket users (see the sketch after this list)
    • Configure variants appropriately depending on the bucket
  • Create EventLogging schema in wikitech
    • Document all fields
    • Review with analyst
    • Add configuration for schema name and version into code base
  • Add test-specific instrumentation to the variants
    • Code event sending for the specific events needed by the schema
      • For the different code paths
      • For the different buckets
      • With the data necessary for each type of user or event
  • Sampling - if needed (it usually is)
    • Add configuration for event logging sampling rate
    • If sampling is a probability over individual events, configure event sending accordingly
    • If sampling is something more complex (it usually is), like receiving events for all pages in a session, or per page, or by some other factor
      • Code up event sampling by the specific factors, and connect it to the event sending
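
As a rough illustration of how the pieces above fit together on the client, here is a minimal sketch. The config variable wgMyFeatureABTest, the experiment name my-feature-ab-test, the schema name MyFeatureSchema, the bucket names and the ext.myFeature.newVariant module are all hypothetical; the real names, bucket sizes and sampling strategy depend on the experiment.

```js
// Minimal sketch only; names and values are hypothetical examples.
( function () {
	// Experiment config exported by the server, e.g.
	// { enabled: true, buckets: { control: 0.5, treatment: 0.5 }, samplingRate: 0.1 }
	var config = mw.config.get( 'wgMyFeatureABTest' ) ||
		{ enabled: false, buckets: {}, samplingRate: 0 };

	// Bucket the user with mw.experiments, keyed on a stable per-session token
	// so a given session always lands in the same bucket.
	var bucket = mw.experiments.getBucket( {
		name: 'my-feature-ab-test',
		enabled: config.enabled,
		buckets: config.buckets
	}, mw.user.sessionId() );

	// Switch between the two code paths based on the bucket.
	if ( bucket === 'treatment' ) {
		// Hypothetical placeholder: load the new variant's module.
		mw.loader.using( 'ext.myFeature.newVariant' );
	}

	// Send EventLogging events for this test, respecting the sampling rate.
	// This sketch samples per event with a simple probability; per-session or
	// per-page sampling would key off a persistent token instead.
	function logTestEvent( action ) {
		if ( Math.random() >= config.samplingRate ) {
			return;
		}
		mw.eventLog.logEvent( 'MyFeatureSchema', {
			action: action,
			bucket: bucket,
			isAnon: mw.user.isAnon()
		} );
	}

	logTestEvent( 'init' );
}() );
```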

Testing

Run tests for the event logging. The verification step is complex because of configuration needs and how events can be checked, so manually run through each scenario, for all combinations of:

test variants · user buckets · sampling mechanisms · event-sending scenarios
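
The combinations multiply quickly, so it can help to generate the checklist with a throwaway snippet like the one below; the dimension values are placeholders to be replaced with the real variants, buckets, sampling modes and events for the experiment.

```js
// Throwaway helper: print every combination to check off during manual testing.
// The dimension values are placeholders for the real experiment.
var variants = [ 'control', 'treatment' ];
var userBuckets = [ 'anon', 'logged-in' ];
var samplingModes = [ 'sampled-in', 'sampled-out' ];
var eventScenarios = [ 'impression', 'click' ];

variants.forEach( function ( variant ) {
	userBuckets.forEach( function ( user ) {
		samplingModes.forEach( function ( sampling ) {
			eventScenarios.forEach( function ( event ) {
				console.log( [ variant, user, sampling, event ].join( ' · ' ) );
			} );
		} );
	} );
} );
```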

Starting an A/B test

For each wiki we want to deploy to (sometimes, for wikis of similar size, this can be done as a group):

  • Review A/B test start task and ensure all information is there
    • to which wikis
    • when
    • if there is a sampling rate needed for event logging
    • if the experiment config is well specified and makes sense
  • Schedule a SWAT deploy and make the patch with the config change
    • Test the config change locally
  • Deploy a config change for the needed wiki(s) enabling the experiment
    • Test and verify the experiment is on and live (see the console sketch after this list)
  • Update the Schema talk page with details of which sampling rate was deployed to which wiki
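
A quick, informal way to spot-check that the experiment is live after the deploy is to poke at it from the browser console on the target wiki; wgMyFeatureABTest and my-feature-ab-test are again hypothetical names. This complements, rather than replaces, verifying that valid events actually arrive (e.g. on the beta cluster or in Hive).

```js
// Run in the browser console on the target wiki after the config deploy.
// 'wgMyFeatureABTest' and 'my-feature-ab-test' are hypothetical names.
var cfg = mw.config.get( 'wgMyFeatureABTest' );
console.log( 'Experiment config:', cfg );
console.log( 'Bucket for this session:', mw.experiments.getBucket( {
	name: 'my-feature-ab-test',
	enabled: !!( cfg && cfg.enabled ),
	buckets: ( cfg && cfg.buckets ) || {}
}, mw.user.sessionId() ) );
```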

Follow-up

  • Tasks need to be created for disabling the test at a certain date
  • We track event logging error rates for validation
  • Analyst verifies data reception is solid; if there are bugs, go back to Development
  • Analyst verifies event volume is consistent with expectations

Stopping an A/B test

  • Schedule a SWAT deploy and make the patch with the config change
    • Test the config change locally
  • Deploy a config change for the needed wiki(s) disabling the experiment
    • Test and verify the experiment is off and data collection is off
  • Update the Schema talk page with details of the sampling rate change and which wiki it was deployed to

Sunsetting

When all A/B tests have been run, and no more are expected, we keep an eye on adding sunsetting tasks to get rid of the A/B testing code and old variants. Unused code is a liability and creates maintenance problems.

  • Create task to get rid of the A/B test implementation
    • Flesh it out with as much detail as possible about what needs to be removed
      • Config variables, their usages, and where they are read
      • Bucketing logic and experiment configuration
      • If the event logging is not going to be needed
        • Event sampling and event sending on the different code paths
  • Create task to get rid of the unused variant
    • Remove variant code path and tests
  • Test that everything works after all these removals
@niedzielski

Cool! It would be neat to mention also:

  • Some of the different technologies used (Grafana, EventLogging, ...)
  • How we verify (beta cluster EventLogging, visual EventLogging, Hive, ...)
  • Common problems and debugging (looking at the wrong schema version's table in EventLogging, bad bucket sizes, invalid schema shape, ...)
  • And the engineer's perspective on where it's reported and how those decisions feed back into the process (this would be the loop from the end into the beginning).

@jdlrobson

"Understand what is being asked to be tracked and why"
specifically, what is the question or questions you want answered, and what is the simplest way we can find that answer. https://www.mediawiki.org/wiki/Extension:EventLogging/Guide#Posing_a_question

Also the "who" - anons? logged in? Editors with a certain edit count?

https://www.mediawiki.org/wiki/Extension:EventLogging/Guide is a great resource.

"Ensure both variants are accessible to users at the same time"
Might want to add something around caching, e.g. if the A/B test changes HTML for anonymous users, this creates challenges with a flash of unstyled content.

"Create EventLogging schema in wikitech"
It's actually meta.wikimedia.org.
Ideally this should also be done upfront as part of analysis, before any development occurs, not during development. We should do better on that front ourselves!

"Starting an A/B test"
I don't know if it's feasible, but ideally I'd like to see us do dummy A/B tests where we generate data for, say, a 2-day period and then vet that data. As we discovered with page previews, there is potential for issues that can evade QA and disrupt data quality (e.g. duplicates from a browser bug, issues with sampling).

@phuedx

phuedx commented Jun 7, 2018

Even though I'm late, I wanted to acknowledge that I've read this and the feedback from @niedzielski and @jdlrobson and it LGTM.
