Long story short: I'm totally open to supporting more rubies if possible. Details follow.
Related issue: http://code.google.com/p/logstash/issues/detail?id=37
Summary:
- ruby core and stdlib change violently, without notice and without backwards compatibility. I want nothing of that.
- need a cross-ruby date library that isn't part of stdlib (see previous point) and is also good.
- need an easy way to use multiple cpus that is cross-ruby (threads are not it).
Details:
Mainly, the ruby core/stdlib API changes between ruby 1.8 and 1.9 are very poorly done. Some are documented while others are not. Some changes make sense, while others do not. That was the main reason for originally deciding to use jruby.
JRuby lets me use Java libraries in place of crappy ruby ones. For example, there are some undocumented changes to datetime between ruby 1.8 and 1.9, so the logstash 'date' filter uses Joda-Time instead of ruby's stdlib datetime.
Further, JRuby's performance options are currently much better than those of MRI or YARV. At worst, during benchmarks, JRuby performs on par with YARV 1.9.2, but since JRuby has actual threads, we can use more cpus more easily, and pretty much beat plain ruby.
Additionally, java debugging tools (jvisualvm, jstack, etc.) are quite excellent.
Lastly, I can very easily ship a single 'executable' that should work on most platforms with java - see the monolithic jar logstash releases. I can't easily do this with other rubies.
There are some parts of logstash that explicitly require java currently - the date filter, elasticsearch support, and thread support.
The code is also only tested under ruby 1.8.7, and the performance difference between JRuby and MRI 1.8.7 is pretty huge. It might get better if you try REE, but that's not really the same ruby everyone's going to have.
The date filter can be made ruby-friendly if someone writes a non-crappy date parsing library in ruby. The ones that ship with stdlib are not fast or safe to use (ruby core changes them wildly without notice).
ElasticSearch support is much faster in jruby/jvm than it was using pure ruby, because we are now using the Java API for elasticsearch. Previously we were using the HTTP/REST api via EventMachine and em-http-request, which has much lower throughput.
Lastly, jruby supports proper threading so logstash can process events on multiple CPU cores. MRI and YARV Ruby cannot do this without forking and message passing.
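To make the threading point concrete, here is a minimal sketch of a worker pool that pulls events off a queue. It is not logstash's actual code: `process_event`, `run_workers`, and the `:shutdown` marker are made-up names for illustration. The same code runs on any ruby, but only a ruby with real native threads (like JRuby) would spread the workers across multiple cores; on MRI they would take turns on one.

```ruby
require "thread" # Queue lives here on older rubies; harmless on newer ones

# Hypothetical stand-in for a filter stage; not logstash's real API.
def process_event(event)
  event.upcase
end

# Push events through a pool of worker threads and collect the results.
# Result order is not guaranteed, since workers finish independently.
def run_workers(events, worker_count)
  input = Queue.new
  output = Queue.new

  events.each { |event| input << event }
  # One shutdown marker per worker so every thread exits its loop.
  worker_count.times { input << :shutdown }

  workers = (1..worker_count).map do
    Thread.new do
      loop do
        event = input.pop
        break if event == :shutdown
        output << process_event(event)
      end
    end
  end
  workers.each { |thread| thread.join }

  # All workers are done, so draining the output queue cannot block.
  results = []
  results << output.pop until output.empty?
  results
end

results = run_workers(["a", "b", "c"], 2)
puts results.sort.inspect
```

The queue is the only shared state, which keeps the workers simple. Getting the same parallelism on MRI or YARV would mean replacing the threads with forked processes and passing events between them over pipes or sockets.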
The downside to using JRuby is a possibly higher memory footprint.
Again, I'm open to supporting non-JRuby rubies, but there need to be answers for some of the above.
Thanks for an excellent blog post on choosing jruby!
I would like to pose a question: what are the limitations in implementing Logstash as a pure java solution?