The point I was trying to make is that the differences look large because it's a hello world app. I understand that you're benchmarking the overhead of the servers, but that number doesn't mean much until you compare it to how much time the app itself needs.
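To make that concrete, here is a rough sketch of the arithmetic. All the numbers are made up for illustration (0.2 ms and 0.4 ms of per-request server overhead, 50 ms of app time); they are not measurements of any real server.

```python
def total_ms(server_overhead_ms, app_time_ms):
    """Total time per request: server overhead plus time spent in the app."""
    return server_overhead_ms + app_time_ms


def relative_gap(overhead_a_ms, overhead_b_ms, app_time_ms):
    """How much slower the slower server is, as a fraction of the faster one."""
    a = total_ms(overhead_a_ms, app_time_ms)
    b = total_ms(overhead_b_ms, app_time_ms)
    return abs(a - b) / min(a, b)


# Hypothetical overheads: server A = 0.2 ms, server B = 0.4 ms per request.
hello_world = relative_gap(0.2, 0.4, app_time_ms=0.0)   # app does no work
real_app = relative_gap(0.2, 0.4, app_time_ms=50.0)     # app needs 50 ms

print(f"hello world: slower server is {hello_world:.0%} slower")
print(f"real app:    slower server is {real_app:.1%} slower")
```

With these assumed numbers, the same 0.2 ms gap is a 100% difference in the hello world case and well under 1% once the app does 50 ms of work.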
The benefits of benchmarking over the network are:
- The overhead of the benchmarking program itself does not affect the results, because it runs on a different machine than the server.
- It gives you a more realistic picture, because in production environments the network delays are significant. In fact, they should be significant enough that the differences between the app servers are greatly reduced.
Therefore, using Vagrant and VMs on the local machine will not work. You really have to benchmark over a network, using another computer. In my test I grabbed another laptop. You might want to spin up an EC2 instance.
A while ago we benchmarked Unicorn vs Passenger 4 over the network. The performance is almost identical: https://code.google.com/p/phusion-passenger/issues/detail?id=956#c4
But I'm curious: what server configurations did you use? It matters a lot.
Maybe you can try benchmarking from one EC2 instance to another? Or maybe from one laptop to another laptop connected with an ethernet cable. That way you should not be bandwidth constrained.
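If you just want a quick sanity check from the second machine before setting up a proper run, something like the following would do. It is only a minimal sketch using the Python standard library: it sends requests one at a time over fresh connections, so it is no substitute for a real load generator such as ab or wrk. The function names and the default request count are my own choices for illustration.

```python
import statistics
import time
import urllib.request


def measure_latencies(url, count=100, timeout=5.0):
    """Send `count` sequential GET requests and return latencies in ms."""
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include the time needed to receive the body
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies (ms) to a few summary numbers."""
    ordered = sorted(latencies)
    return {
        "requests": len(ordered),
        "median_ms": statistics.median(ordered),
        # Simple nearest-rank style 95th percentile, rounded down.
        "p95_ms": ordered[int(0.95 * (len(ordered) - 1))],
        "max_ms": ordered[-1],
    }
```

You would run it on the client machine and point it at the server, for example `summarize(measure_latencies("http://192.168.1.20:3000/"))`, where the address is a placeholder for wherever your app server is listening. Run it against each server with the same app and compare the medians.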