RapGenius has an interesting post about Heroku's randomized load balancing, complaining about how random placement degrades performance compared to prior omniscient approaches. RapGenius ran some simulations, including an experiment with a "Choice of Two" method:
Choice of two routing is the naive method from before, with the twist that when you assign a request to a random dyno, if that dyno is already busy then you reassign the request to a second random dyno, with no regard for whether the second dyno is busy
This differs subtly but substantially from the standard "Power of Two Choices" randomized load balancing:
each [request] is placed in the least loaded of d >= 2 [Dynos] chosen independently and uniformly at random
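The difference is easiest to see side by side. Here is a minimal sketch of the two assignment rules (the helper names are mine, not code from either post):

```python
from random import randrange

def choice_of_two(queues):
    # RapGenius variant: if the first random dyno is busy, the request
    # goes to a second random dyno regardless of that dyno's load.
    a = randrange(len(queues))
    target = a if queues[a] == 0 else randrange(len(queues))
    queues[target] += 1
    return target

def power_of_two(queues):
    # Standard variant: always compare two random dynos and take the
    # one with the shorter queue.
    a, b = randrange(len(queues)), randrange(len(queues))
    target = a if queues[a] <= queues[b] else b
    queues[target] += 1
    return target
```

The key distinction: "choice of two" only ever compares against zero, while "power of two" compares the two picks against each other.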
Take a look at the difference in queue lengths below, for 200 Dynos, 1000 timesteps, and a constant 200 requests per timestep (assuming requests are fulfilled in one timestep):
Strategy                           Average    99th Percentile
Random                             16.87119   58.746
RapGenius "Choice of Two"          9.063825   23.613
Standard "Power of Two Choices"    3.734270   5.402
Note the roughly 2.4x difference in average queue length between the two "two choice" strategies (and an over 4x difference at the 99th percentile).
There are all sorts of fun variants on "Power of Two Choices" and their theoretical implications. I'm hardly an expert, but I recall that the standard "Two Choices" gets you pretty close to optimal, and that additional choices beyond two help far less than the second choice did. Of course, querying two dynos takes time (although the queries can be issued in parallel), so you can play games like "query 5, check the first two responses." There's some cool work by my friends at Berkeley on doing this for cluster schedulers (no paper yet).
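The "query 5, check the first two responses" idea can be sketched in a few lines. This is a hypothetical illustration (the function name and the modeling of response order as a random permutation are my assumptions, not the Berkeley schedulers' actual protocol): probe d distinct dynos, act on only the first k probe responses to arrive, and pick the least loaded among those.

```python
import random

def probe_d_use_first_k(queues, d=5, k=2, rng=random):
    # Probe d distinct dynos chosen at random.
    probes = rng.sample(range(len(queues)), d)
    # Model the order in which probe responses arrive as a random shuffle.
    rng.shuffle(probes)
    first_k = probes[:k]
    # Send the request to the least loaded of the first k responders.
    target = min(first_k, key=lambda i: queues[i])
    queues[target] += 1
    return target
```

This trades extra probe traffic for lower latency: you never wait on the slowest responders, but you still get a load comparison among the fastest few.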
Quick and Dirty Python below:
'''
Compares randomized load balancing, RapGenius's "choice of two", and
traditional "power of two choices" randomized load balancing for a fixed
arrival rate (determined by the LOAD parameter).
'''
from random import randint

TRIALS = 1000
DYNOS = 200
LOAD = 1
PCTILE = .99

def average(l):
    return float(sum(l)) / len(l)

def randint_wo_replacement(minrange, maxrange, skip):
    # Uniformly pick an int in [minrange, maxrange], excluding skip.
    ret = randint(minrange, maxrange - 1)
    if ret >= skip:
        ret += 1
    return ret

for strategy in ["random", "rapgenius", "pwrtwo"]:
    queues = [0] * DYNOS
    avgs = []
    pctiles = []
    for trial in range(TRIALS):
        # Each dyno serves one request per timestep.
        for dyno in range(DYNOS):
            queues[dyno] = max(queues[dyno] - 1, 0)
        for request in range(int(DYNOS * LOAD)):
            if strategy == "rapgenius":
                first_check = randint(0, DYNOS - 1)
                if queues[first_check] == 0:
                    queues[first_check] += 1
                else:
                    # The second pick is final, regardless of its load.
                    queues[randint_wo_replacement(0, DYNOS - 1, first_check)] += 1
            elif strategy == "pwrtwo":
                first_check = randint(0, DYNOS - 1)
                second_check = randint_wo_replacement(0, DYNOS - 1, first_check)
                if queues[first_check] < queues[second_check]:
                    queues[first_check] += 1
                else:
                    queues[second_check] += 1
            elif strategy == "random":
                queues[randint(0, DYNOS - 1)] += 1
        avgs.append(average(queues))
        # Record the 99th-percentile queue length.
        queues.sort()
        pctiles.append(queues[int(DYNOS * PCTILE) - 1])
    print("%s: avg %f, %f percentile %f" % (strategy, average(avgs), PCTILE, average(pctiles)))
Hey Peter, thanks for this, and thanks for the suggestion about "power of two" as opposed to what we had called "choice of two." The choice of two thing was something that a reader suggested, but it's nice to know that power of two is an actual thing from real life experts!
Anyway, I updated our R code to take the choice-of-two and power-of-two approaches into account; you can see the gist here.
I also updated some of the results on our original post; see here.
It makes intuitive sense that choice of two and power of two would result in the same percentage of requests getting queued: in either case, if at least one of the two dynos you pick is available, the request will get sent to a free dyno. Power of two is better, though, because when both of your selected dynos are busy, "choice" assigns the request randomly, whereas "power" assigns it to the dyno with fewer existing requests queued. Of course, that isn't guaranteed to be the better dyno to choose: it's possible that dyno A has 2 longer requests queued that will take a total of 1000 ms to complete, while dyno B has 5 shorter requests queued that will take only 500 ms. But more often than not, the "power of two" approach will select the correct dyno.
Here is a graph that shows average queue times for one of our simulations:
One thing I'd suggest playing with in your own simulation is the assumption that requests are fulfilled in one timestep. Based on my anecdotal experience, the results of the simulation are EXTREMELY sensitive to the tail end of your request-time distribution. In fact, the tail seems to matter much more than the average. We've been experimenting with our empirical distribution of request times, and also with a simulated Weibull distribution, both of which have long right tails, and when we adjust the parameters of that tail we see drastically different results. Just something to keep in mind if you want to keep investigating this.
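To see why the tail can matter so much independently of the average, here is a rough sketch (not the R code referenced above; `tail_quantile` is a hypothetical helper). It draws Weibull service times scaled so the mean is 1 regardless of shape, then reports a tail quantile: same average load, very different tails.

```python
import math
import random

def tail_quantile(shape, q=0.99, n=100_000, seed=0):
    # Scale the Weibull so its mean is exactly 1 for any shape:
    # mean = scale * Gamma(1 + 1/shape), so solve for scale.
    scale = 1.0 / math.gamma(1.0 + 1.0 / shape)
    rng = random.Random(seed)
    # Draw n service times and return the empirical q-th quantile.
    draws = sorted(rng.weibullvariate(scale, shape) for _ in range(n))
    return draws[int(q * n)]
```

With shape below 1 the distribution is heavy-tailed (a few very slow requests); with shape above 1 it is light-tailed. Both have mean 1, yet their 99th percentiles differ by several times, which is exactly the regime where queue-length simulations diverge.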
Thanks again, and please let us know if you have other thoughts or suggestions!