RapGenius has an interesting post about Heroku's randomized load balancing, complaining about how random placement degrades performance compared to prior omniscient approaches. RapGenius ran some simulations, including an experiment with a "Choice of Two" method:
Choice of two routing is the naive method from before, with the twist that when you assign a request to a random dyno, if that dyno is already busy then you reassign the request to a second random dyno, with no regard for whether the second dyno is busy
This differs subtly but substantially from the standard "Power of Two Choices" randomized load balancing:
each [request] is placed in the least loaded of d >= 2 [Dynos] chosen independently and uniformly at random
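The difference is easiest to see side by side. Here is a minimal sketch of the two assignment rules (the helper names are mine, not code from either post):

```python
from random import randrange

def choice_of_two(queues):
    # RapGenius variant: if the first random dyno is busy, the request
    # goes to a second random dyno regardless of that dyno's load.
    a = randrange(len(queues))
    target = a if queues[a] == 0 else randrange(len(queues))
    queues[target] += 1
    return target

def power_of_two(queues):
    # Standard variant: always compare two random dynos and take the
    # one with the shorter queue.
    a, b = randrange(len(queues)), randrange(len(queues))
    target = a if queues[a] <= queues[b] else b
    queues[target] += 1
    return target
```

The key distinction: "choice of two" only ever compares against zero, while "power of two" compares the two picks against each other.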
Take a look at the difference in queue lengths below, for 200 Dynos, 1000 timesteps, and a constant 200 requests per timestep (assuming requests are fulfilled in one timestep):
Strategy                           Average    99th Percentile
Random                             16.87119   58.746
RapGenius "Choice of Two"          9.063825   23.613
Standard "Power of Two Choices"    3.734270   5.402
Note the roughly 2.4x difference in average queue length between the two "two choice" strategies (and an over 4x difference at the 99th percentile).
There are all sorts of fun variants on "Power of Two Choices" and their theoretical implications. I'm hardly an expert, but I recall that the standard "Two Choices" gets you pretty close to optimal, and that additional choices beyond two help far less than the second choice did. Of course, querying two dynos takes time (although the queries can be issued in parallel), so you can play games like "query 5, check the first two responses." There's some cool work by my friends at Berkeley on doing this for cluster schedulers (no paper yet).
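The "query 5, check the first two responses" idea can be sketched in a few lines. This is a hypothetical illustration (the function name and the modeling of response order as a random permutation are my assumptions, not the Berkeley schedulers' actual protocol): probe d distinct dynos, act on only the first k probe responses to arrive, and pick the least loaded among those.

```python
import random

def probe_d_use_first_k(queues, d=5, k=2, rng=random):
    # Probe d distinct dynos chosen at random.
    probes = rng.sample(range(len(queues)), d)
    # Model the order in which probe responses arrive as a random shuffle.
    rng.shuffle(probes)
    first_k = probes[:k]
    # Send the request to the least loaded of the first k responders.
    target = min(first_k, key=lambda i: queues[i])
    queues[target] += 1
    return target
```

This trades extra probe traffic for lower latency: you never wait on the slowest responders, but you still get a load comparison among the fastest few.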
Quick and Dirty Python below:
'''
Compares randomized load balancing, RapGenius's "choice of two", and
traditional "power of two choices" randomized load balancing for a fixed
arrival rate (determined by the LOAD parameter).
'''
from random import randint

TRIALS = 1000
DYNOS = 200
LOAD = 1
PCTILE = .99

def average(l):
    return float(sum(l)) / len(l)

def randint_wo_replacement(minrange, maxrange, skip):
    # Uniformly pick an int in [minrange, maxrange], excluding skip.
    ret = randint(minrange, maxrange - 1)
    if ret >= skip:
        ret += 1
    return ret

for strategy in ["random", "rapgenius", "pwrtwo"]:
    queues = [0] * DYNOS
    avgs = []
    pctiles = []
    for trial in range(TRIALS):
        # Each dyno serves one request per timestep.
        for dyno in range(DYNOS):
            queues[dyno] = max(queues[dyno] - 1, 0)
        for request in range(int(DYNOS * LOAD)):
            if strategy == "rapgenius":
                first_check = randint(0, DYNOS - 1)
                if queues[first_check] == 0:
                    queues[first_check] += 1
                else:
                    # The second pick is final, regardless of its load.
                    queues[randint_wo_replacement(0, DYNOS - 1, first_check)] += 1
            elif strategy == "pwrtwo":
                first_check = randint(0, DYNOS - 1)
                second_check = randint_wo_replacement(0, DYNOS - 1, first_check)
                if queues[first_check] < queues[second_check]:
                    queues[first_check] += 1
                else:
                    queues[second_check] += 1
            elif strategy == "random":
                queues[randint(0, DYNOS - 1)] += 1
        avgs.append(average(queues))
        # Record the 99th-percentile queue length.
        queues.sort()
        pctiles.append(queues[int(DYNOS * PCTILE) - 1])
    print("%s: avg %f, %f percentile %f" % (strategy, average(avgs), PCTILE, average(pctiles)))
Hey Peter, thanks for this, and thanks for the suggestion about "power of two" as opposed to what we had called "choice of two." The choice of two thing was something that a reader suggested, but it's nice to know that power of two is an actual thing from real life experts!
Anyway, I updated our R code to take the choice-of-two and power-of-two approaches into account; you can see the gist here.
I also updated some of the results on our original post; see here.
It makes intuitive sense that choice of two and power of two would result in the same percentage of requests getting queued: in either case, if at least one of the two dynos you pick is available, the request will get sent to a free dyno. Power of two is better, though, because when both of your selected dynos are busy, "choice" assigns the request randomly, whereas "power" assigns it to the dyno with fewer existing requests queued. Of course, that isn't guaranteed to be the better dyno to choose: it's possible that dyno A has 2 longer requests queued that will take a total of 1000 ms to complete, while dyno B has 5 shorter requests queued that will take only 500 ms. But more often than not, the "power of two" approach will select the correct dyno.
Here is a graph that shows average queue times for one of our simulations:
One thing I'd suggest playing with in your own simulation is the assumption that requests are fulfilled in one timestep. Based on my anecdotal experience, the results of the simulation are EXTREMELY sensitive to the tail end of your request-time distribution. In fact, the tail seems to matter much more than the average. We've been experimenting with our empirical distribution of request times, and also with a simulated Weibull distribution, both of which have long right tails, and when we adjust the parameters of that tail we see drastically different results. Just something to keep in mind if you want to keep investigating this.
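To see why the tail can matter so much independently of the average, here is a rough sketch (not the R code referenced above; `tail_quantile` is a hypothetical helper). It draws Weibull service times scaled so the mean is 1 regardless of shape, then reports a tail quantile: same average load, very different tails.

```python
import math
import random

def tail_quantile(shape, q=0.99, n=100_000, seed=0):
    # Scale the Weibull so its mean is exactly 1 for any shape:
    # mean = scale * Gamma(1 + 1/shape), so solve for scale.
    scale = 1.0 / math.gamma(1.0 + 1.0 / shape)
    rng = random.Random(seed)
    # Draw n service times and return the empirical q-th quantile.
    draws = sorted(rng.weibullvariate(scale, shape) for _ in range(n))
    return draws[int(q * n)]
```

With shape below 1 the distribution is heavy-tailed (a few very slow requests); with shape above 1 it is light-tailed. Both have mean 1, yet their 99th percentiles differ by several times, which is exactly the regime where queue-length simulations diverge.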
Thanks again, and please let us know if you have other thoughts or suggestions!