This was all done on a rather nice iMac:
- 4-core Intel Core i7 (Sandy Bridge)
- 12 GB of RAM
- 1 TB spinning-disk hard drive
- Java 1.6.0_37 (build 1.6.0_37-b06-434-11M3909), HotSpot 64-bit VM in "mixed mode" (whatever that's supposed to mean)
- The excellent pre-chewed testing app at https://github.com/klaustopher/hamlerbslim was used for great success
I ran the tests several times using Apache Benchmark, spamming it a few times at first to warm up the VM; once the VM was warmed up, I started dumping output into text files.
I was specifically interested in the performance of HAML vs. Slim vs. ERB in the JRuby environment. I had been seeing some sluggishness in my TorqueBox application and wondered what the cause was. I use HAML, but I kind of like Slim, too, so I was curious, and adding ERB as a control just makes sense. I had also heard rumor of the HAML interpreter doing not-so-nice things to the JVM. Thankfully, that doesn't seem to be the case.
I ran each battery of tests twice: once on the WEBrick server, and again on Puma (for multi-threaded stuff - more representative of the JBoss AS environment I deploy to). But then I had to run the Puma tests again, because they weren't running any faster. Each individual request completed in approximately the same time - which is to be expected - but the concurrency wasn't right: with four workers, the whole benchmark should have run about four times faster.
So I went into the application and added config.threadsafe! to config/environments/development.rb - which solved the problem! The tests ran again with high concurrency.
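For reference, the change amounts to one line in the environment config. Here's a sketch against a Rails 3.x app (which is where config.threadsafe! applies - it was removed in Rails 4); the application name is a placeholder:

```ruby
# config/environments/development.rb
YourApp::Application.configure do
  # Let Rails serve concurrent requests instead of wrapping
  # every request in a global lock (Rails 3.x only).
  config.threadsafe!
end
```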
Why would I care about the concurrency? Well, Smotchkkiss, I'll tell you. With four times the amount of action taking place, it's worthwhile to observe the performance again, because the garbage collector will behave differently. While processing one request after another, that chip isn't running anywhere near saturation - so I upped the ante and increased the amount of work it was doing to try and choke it.
I'll also note that I ran the tests against Puma with 32 threads and a concurrency level of 32. This was a total failure! The context switching between threads took the application from >60ms response times to around 400ms response times. So, the moral of that story: over-threading your servers will make things worse, not better. You should have about as many threads as you have logical CPUs - and no, Hyper-Threading doesn't count, at least not in my testing. Running the tests at 16 threads was a failure; 8 threads (4 cores * Hyper-Threading, right?) was similarly slow; but then I tried 4 threads and hit that magic sweet spot.
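If you drive Puma from its config file rather than command-line flags, pinning the thread pool to that sweet spot looks something like this (a sketch using Puma's standard config DSL; the counts are the ones from my tests):

```ruby
# config/puma.rb
# 4 threads minimum, 8 max - over-threading (16 or 32 threads on this
# 4-core chip) only added context-switching overhead in my testing.
threads 4, 8
```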
The most interesting fields are the Time per request and Total Standard Deviation. I've arbitrarily selected the "best" test from each set. Because ASCII is awesome:
Puma on 4 threads (8 max) with Rails in threadsafe mode:
       |   ERB    |   HAML   |   Slim   |
t/rq   | 12.005ms | 14.248ms | 12.805ms |
StdDev |  2.4ms   |  2.6ms   |  2.4ms   |
Puma on 4 threads (8 max) without Rails in threadsafe mode:
       |   ERB    |   HAML   |   Slim   |
t/rq   | 42.050ms | 50.643ms | 42.796ms |
StdDev |  2.2ms   |  2.4ms   |  2.3ms   |
WEBrick on just one thread without Rails in threadsafe mode:
       |   ERB    |   HAML   |   Slim   |
t/rq   | 12.903ms | 15.114ms | 12.881ms |
StdDev |  1.2ms   |  1.2ms   |  1.1ms   |
Additionally, note the document size for each of the three templating variants. Keep in mind this was run in development mode, without any post-processing minification!
      |  ERB  |  HAML  | Slim  |
bytes | 7,590 | 10,095 | 6,086 |
Interestingly enough, when run against a production environment, the sizes are not all that much different for ERB, but HAML and Slim show significant changes in their payload!
      |  ERB  | HAML  | Slim  |
bytes | 6,128 | 4,927 | 4,625 |
Which, of course, prompted me to run the tests again! Now, please understand that I'm a very lazy person, so I did not re-run the tests on WEBrick or non-concurrent Puma. Those aren't representative of a production environment, so they're not really relevant.
So, what are the new numbers?
Puma on 4 threads (8 max) with Rails in threadsafe mode:
       |   ERB   |  HAML   |  Slim   |
t/rq   | 6.892ms | 7.562ms | 5.239ms |
StdDev |  1.3ms  |  1.5ms  |  1.7ms  |
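Just how close is "close"? A quick back-of-the-envelope check in Ruby, using nothing but the production-mode per-request times from the table above:

```ruby
# Production-mode time per request, in ms (from the table above).
times = { erb: 6.892, haml: 7.562, slim: 5.239 }

# Percent overhead of each engine relative to the fastest (Slim here).
fastest  = times.values.min
overhead = times.transform_values { |t| ((t / fastest - 1) * 100).round(1) }

overhead.each { |engine, pct| puts "#{engine}: +#{pct}% vs fastest" }
```

That's a relative gap of tens of percent - but in absolute terms it's a millisecond or two per request, which is noise next to real application work.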
So the overall result is kind of a moot point, don't you think? In production mode, you'll save bandwidth by using either HAML or Slim (slightly more with Slim - but honestly, if you're that worried, run a minifier filter first, using some man-in-the-middle reverse-proxy magic). You can get slightly faster times with Slim, and HAML is the slowest of the bunch. But their timings are so freaking close that it doesn't really make all that much difference.
I'd also note that this only tests rendering speed. In the real world, fragment caching and database queries will be your main concerns. Still, it shows that when it comes to JRuby 1.7.0 and Rails, the templating language you pick isn't that big a performance consideration.
I hope you learned something - I know I did! Cheers!
-- NSError