Eric Bieschke, Chief Scientist @pandora
- Expresses great love for A/B testing
- "the big advantage about having a lot of data is that you can do experiments with real data, real users"
- how you judge experiments shapes where you are headed
- choose the wrong measuring stick and you wind up in the wrong place
- choose the right measuring stick and progress is inevitable
- improvements come from better hypotheses and better measuring sticks
- they originally looked at thumbs-up percentages as their measuring stick
- did a lot of incremental optimizations around that, but
- the optimizations were skewing for users on the web (not device users), people more likely to vote either way
- a better metric: total listening hours
- but the weakness of that is that is skews pathological music nerds and Pandora moves towards people who want to listen to tons of music
- events like major traffic delays in major cities make listenership go up, so not an accurate indicator of "addictiveness"
- Finalyy hit on listening return rate
- you want many users listening frequently on many days
The above metrics sit on top of "deeper metrics"
- relevance - how tight and how broad are specific stations for specific users
- prediction accuracy - how well did we recommend something
- musical diversity
- we want you to listen to 200 artists, not just 50
- novelty / surprise - how likely are we to hit you with something you never heard; if we have two choices Pandora will pick the one that you will least expect
- awesomeness - hard to measure
- An "ensemble" recommender
- they add and remove recommenders as they prove effective for a given user
- the more varied the given techniques the stronger the ensemble; orthogonality the vintage vinyl freak and the top 40 fan have very different profiles
- it's all about results; you need to have a goal in mind
Contains
- the music genome project
- collaborative filtering - when you have a lot of data, you don't need to get fancy with matrix factorization
- collective intelligence; reinfocement learning - our listeners know what they want; they use "stations" as a defacto search term