Created
December 22, 2015 00:53
future ideas | |
* rule previewer | |
* I am a user in CANADA I am the FOURTH user. What do I get? | |
Notes from meeting | |
* no user should get the same question twice | |
* this could be done on the client | |
* or we could just never send the payload to the same user twice, providing a set collection primitive. | |
* we need to record how many users take each action from the responses | |
* I don't know if this is our part or something in heartbeat/input | |
* slow ramp up is very important | |
* I think customizable small bundles will be a good way forward instead of giant mega bundles. | |
* opt out | |
* Can we use percentages instead of counts? I think that makes CDN stuff easier. | |
* privacy is super important | |
* "intelligently tune" | |
* there is a value exchange here. there is a tradeoff of magic vs creepiness. | |
Will told me | |
* heartbeat uses counts to make sure not to "use up" the population, since | |
we only get O(number of users) per year. | |
* Unclear why self-repair needs this. | |
* ideatown is like gmail labs - opt in for weirdness to get more data. | |
* to learn more about counts for self-repair, talk to: | |
* Matt Grimes | |
* Gregg Lind | |
* (Michael Verdi) | |
Talked to Matt Grimes about counts | |
* Mostly for getting statistical significance | |
* Use percentage to control rate at which the limit is hit | |
* Close enough is fine, as long as the number is known. | |
* Probably err on the side of overshooting | |
* Also want to be able to guess how long it will take to get to the limit | |
* Probably guesses based on historic traffic and percentage of users selected | |
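A rough sketch of that guess, assuming selection is a plain percentage of historic daily traffic. The function names and the traffic figure in the usage note are made up for illustration; nothing here is from Heartbeat.

```python
def days_to_reach_limit(limit: int, daily_hits: int, sample_rate: float) -> float:
    """Estimate how many days it takes to select `limit` users.

    Assumes `daily_hits` is a historic average and that each hit is
    selected independently with probability `sample_rate`.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    expected_selected_per_day = daily_hits * sample_rate
    return limit / expected_selected_per_day


def rate_for_deadline(limit: int, daily_hits: int, days: float) -> float:
    """Sample rate needed to reach `limit` in roughly `days` days (capped at 1.0)."""
    return min(1.0, limit / (daily_hits * days))
```

With a hypothetical 1,000,000 eligible hits per day, a 5% sample reaches a 10,000 user limit in about 0.2 days; stretching that out to 2 days means dropping the rate to about 0.5%.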
Architecture | |
* Name: Normandy | |
* Architecture (hard mode) | |
* Editor | |
* Django | |
* Writes to database to share with other components | |
* Server (need better name for this component) | |
* Maybe Django? | |
* If it is fast enough | |
* if not, Rust? :D | |
* Read only (?) | |
* Loads everything it needs into memory on boot, serves from that | |
* Reload data by restarting process (or maybe SIGHUP?) | |
* Has to handle ~300 million hits/day = ~3500/sec minimum | |
* Probably target peak of ~double that (7000/sec) | |
* 10 ms/request = 70 server processes | |
* 4 procs/box = ~18 boxes | |
* 16 procs/box = ~5 boxes | |
* Can probably do this in one DC. | |
* 50 ms/request = 350 server processes | |
* 4 procs/box = 88 boxes | |
* 16 procs/box = 22 boxes | |
* This needs multi DC I bet. | |
* Kitsune is at about | |
* 70ms/req including databases. | |
* 30ms/req for just python. | |
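The sizing numbers above can be reproduced with a few lines. This assumes each server process handles one request at a time, so a process serves 1000 / (ms per request) requests per second; the function names are only for illustration.

```python
import math


def processes_needed(peak_rps: int, ms_per_request: int) -> int:
    """Processes required if each one handles a single request at a time."""
    return math.ceil(peak_rps * ms_per_request / 1000)


def boxes_needed(processes: int, procs_per_box: int) -> int:
    """Round up: a partially used box is still a box."""
    return math.ceil(processes / procs_per_box)


average_rps = 300_000_000 / 86_400  # about 3472/sec, rounded to 3500 in the notes
peak_rps = 7000                     # roughly double the average

for ms in (10, 50):
    procs = processes_needed(peak_rps, ms)
    print(f"{ms} ms/request: {procs} procs, "
          f"{boxes_needed(procs, 4)} boxes at 4/box, "
          f"{boxes_needed(procs, 16)} boxes at 16/box")
```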
* Features | |
* Make a bundle of code | |
* Serve it to users following certain rules | |
* Rules are things like | |
* date ranges | |
* a certain count of users | |
* a particular percentage of users | |
* particular countries | |
* "Deploy additional payload P to (compositions of other rules)" | |
* Rules need to be eventually able to be generated by non-programmers | |
* v2 goal of a UI to build them | |
* It would be awesome if they are static and not Turing-complete. | |
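One way to keep rules static is to make them plain data and evaluate them with a fixed interpreter. The sketch below is an assumption about how that could look, not a decided design: the `all` / `any` / `not` combinators and the `ctx` dictionary are hypothetical names standing in for "compositions of other rules".

```python
from datetime import datetime, timezone


def matches(rule: dict, ctx: dict) -> bool:
    """Evaluate a rule tree against a request context.

    Rules are nested dicts only: no loops, variables, or user code, so
    evaluation always terminates after visiting each node once.
    """
    kind = rule["type"]
    if kind == "daterange":
        return rule["start"] <= ctx["now"] <= rule["end"]
    if kind in ("country", "language"):
        return ctx[kind] == rule["match"]
    if kind == "all":
        return all(matches(r, ctx) for r in rule["rules"])
    if kind == "any":
        return any(matches(r, ctx) for r in rule["rules"])
    if kind == "not":
        return not matches(rule["rule"], ctx)
    raise ValueError(f"unknown rule type: {kind!r}")


# Hypothetical composed rule: Canadian users who speak English or French.
canadian_en_or_fr = {
    "type": "all",
    "rules": [
        {"type": "country", "match": "canada"},
        {"type": "any", "rules": [
            {"type": "language", "match": "en"},
            {"type": "language", "match": "fr"},
        ]},
    ],
}
```

Because a rule is just a tree of dicts, a later UI could build the same structure without anyone writing code.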
* Scaling considerations | |
* Reading bundles has to be cheap (and so fast) | |
* But the admin can be "slow", and it can take a while to take effect | |
* Idea: | |
* Rules and bundles | |
* Single data center writable service for the editor | |
* Slow sync to cluster management system | |
* Reboot immutable server processes to win | |
* Store in GitHub? | |
* This gets us code review for free | |
* Version control of code | |
* Means interacting with the GitHub API. That is probably fine. | |
* The bundles will probably be in git anyway for someone | |
* Store config rules there too? | |
* How does this interact with immutable deploy/data store | |
* Series of items | |
* Each item defines some rules and some bundles | |
* If rules match, send all bundles. | |
* Stop processing after one. | |
* Bundles defined separately, referenced from items/rules. | |
* This way bundles can be de-duped and used in many rules. | |
* Counter | |
* A second service | |
* this is the geo-distributed bit | |
* This needs to be fast to read, but can be slow to write | |
* Paxos? Raft? Idk, those things. | |
* Do we need unique counters? or just counters? | |
* just counters: easy, just count in the data store | |
* sysadmins tell me just use RDS, probably | |
* uniques: | |
* how high does this need to count? how many uniques? | |
* how many counters will there be? what dimensionality? | |
* Can we expire them eventually? | |
* scale? number? | |
* easy: hashmap? | |
* How is this synced? | |
* medium: bloom filters | |
* Sync a couple "numbers" - count and bloom bitmap. | |
* Are growing bloom filters a pain? | |
* hard: hyperloglog | |
* store a bunch of numbers, but state storage is harder. |
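To make the "medium" option concrete, here is a minimal bloom filter sketch for unique counting. The sizes, the SHA-256 based hashing, and the class name are all assumptions for illustration. Syncing two nodes is a bitwise OR of the bitmaps; after a merge the exact count is unknown, so the sketch estimates it from the number of set bits with the usual fill-ratio formula.

```python
import hashlib
import math


class BloomCounter:
    def __init__(self, bits: int = 1 << 16, hashes: int = 4):
        self.bits = bits
        self.hashes = hashes
        self.bitmap = 0  # a Python int used as a bit array
        self.count = 0   # locally observed "probably new" keys

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def __contains__(self, key: str) -> bool:
        return all((self.bitmap >> p) & 1 for p in self._positions(key))

    def add(self, key: str) -> bool:
        """Record key; return True if it was (probably) not seen before."""
        if key in self:
            return False  # may be a false positive, never a false negative
        for p in self._positions(key):
            self.bitmap |= 1 << p
        self.count += 1
        return True

    def merge(self, other: "BloomCounter") -> None:
        """Sync with another node's filter (same bits/hashes required)."""
        self.bitmap |= other.bitmap

    def estimate(self) -> float:
        """Estimate distinct keys from the fraction of bits set."""
        set_bits = bin(self.bitmap).count("1")
        return -(self.bits / self.hashes) * math.log(1 - set_bits / self.bits)
```

The filter has a fixed size, so the false positive rate climbs as it fills up; that is the "growing bloom filters" pain mentioned above, and this sketch does not address it.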
german translations: | |
predicates: | |
{type: language, match: de} | |
bundles: | |
translations/de.js | |
spanish translations: | |
predicates: | |
{type: language, match: es} | |
bundles: | |
translations/es.js | |
experiment 1: | |
predicates: | |
{type: daterange, start: 2015-12-11T09:55:36Z, end: 2015-12-18T09:55:36Z} | |
{type: language, match: en} | |
{type: country, match: canada} | |
{type: sample, rate: 0.05} | |
{type: count, limit: 10000} | |
bundles: | |
experiments/exp1.js | |
default: | |
predicates: | |
bundles: | |
default.js | |
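A sketch of how a server might evaluate the items above. It assumes every matching item contributes its bundles (de-duplicated, in order), that an item with no predicates always matches, and that the `sample` and `count` predicates are fed from outside through hypothetical `roll` and `served` values in the request context.

```python
from datetime import datetime, timezone

ITEMS = [
    {"name": "german translations",
     "predicates": [{"type": "language", "match": "de"}],
     "bundles": ["translations/de.js"]},
    {"name": "spanish translations",
     "predicates": [{"type": "language", "match": "es"}],
     "bundles": ["translations/es.js"]},
    {"name": "experiment 1",
     "predicates": [
         {"type": "daterange",
          "start": datetime(2015, 12, 11, 9, 55, 36, tzinfo=timezone.utc),
          "end": datetime(2015, 12, 18, 9, 55, 36, tzinfo=timezone.utc)},
         {"type": "language", "match": "en"},
         {"type": "country", "match": "canada"},
         {"type": "sample", "rate": 0.05},
         {"type": "count", "limit": 10000},
     ],
     "bundles": ["experiments/exp1.js"]},
    {"name": "default", "predicates": [], "bundles": ["default.js"]},
]


def check(pred: dict, ctx: dict) -> bool:
    kind = pred["type"]
    if kind in ("language", "country"):
        return ctx[kind] == pred["match"]
    if kind == "daterange":
        return pred["start"] <= ctx["now"] <= pred["end"]
    if kind == "sample":
        return ctx["roll"] < pred["rate"]      # roll: uniform value in [0, 1)
    if kind == "count":
        return ctx["served"] < pred["limit"]   # served: users selected so far
    raise ValueError(f"unknown predicate type: {kind!r}")


def bundles_for(items: list, ctx: dict) -> list:
    """Collect bundles from every item whose predicates all match."""
    result = []
    for item in items:
        if all(check(p, ctx) for p in item["predicates"]):
            for bundle in item["bundles"]:
                if bundle not in result:
                    result.append(bundle)
    return result
```

For example, a German-language user gets `translations/de.js` and `default.js`, while an English-language Canadian user inside the date range with a low enough roll gets `experiments/exp1.js` and `default.js`.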
===================== | |
16 possible bundles in naive case | |
O(2^n) | |
n=20, == ~1 million | |
6 observed bundles | |
n = l + e | |
O(l*2^e) | |
l=5, e=15, == 160K | |
l=5, e=10, == 5000 | |
- translations/de.js, experiments/exp1.js, default.js | |
- translations/es.js, experiments/exp1.js, default.js | |
- experiments/exp1.js, default.js | |
- translations/de.js, default.js | |
- translations/es.js, default.js | |
- default.js | |
3 experiment only bundles | |
- german, mega (+bitmask) | |
- spanish, mega (+bitmask) | |
- mega (+bitmask) | |
l=5, e=15, == 5 | |
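The counts above, worked through in code. The bitmask helper is one reading of "mega (+bitmask)": a single bundle per language that carries every experiment, plus a bitmask telling the client which experiments are switched on. That reading and the function names are assumptions.

```python
def naive_bundle_count(items: int) -> int:
    """Every subset of items could be its own pre-built bundle: 2^n."""
    return 2 ** items


def split_bundle_count(languages: int, experiments: int) -> int:
    """One language at a time, any subset of experiments: l * 2^e."""
    return languages * 2 ** experiments


def mega_bundle_count(languages: int) -> int:
    """One mega bundle per language; experiments are selected by bitmask."""
    return languages


def experiment_bitmask(enabled, all_experiments) -> int:
    """Set bit i when the i-th experiment is enabled for this user."""
    mask = 0
    for i, name in enumerate(all_experiments):
        if name in enabled:
            mask |= 1 << i
    return mask


print(naive_bundle_count(4))       # the four items in the example
print(naive_bundle_count(20))      # roughly a million
print(split_bundle_count(5, 15))   # roughly 160K
print(split_bundle_count(5, 10))   # roughly 5000
print(mega_bundle_count(5))
```

The trade-off is that the mega bundle ships experiment code to users who will not run it, in exchange for a cacheable set of only `l` distinct payloads.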
============================ | |
Selena says: | |
* Postgres might not be able to handle this load for counters | |
* Or rather, it might have bad effects on the users | |
* Postgres works by lock, update, write. | |
* No CAS operations | |
* Locks could delay users, hurts throughput. | |
* But Postgres has hll, which might work | |
* Still need locking stuff | |
* need pgbouncer | |
* Selena can help us set this up once we get to that point | |
* Talk to Jonas Finnemann Jensen about this | |
* After the new year, set up a 1 hour chat about it. |