logstash, why jruby?

Long story short: I'm totally open to supporting more rubies if possible. Details follow.

Summary:

- Ruby's core and stdlib change violently, without notice, and without backwards compatibility. I want nothing of that.
- I need a cross-ruby date library that isn't part of stdlib (see previous point) and is also good.
- I need an easy way to use multiple CPUs that is cross-ruby (threads are not it).

Details:

Mainly, the ruby core/stdlib API changes between ruby 1.8 and 1.9 are very poorly done. Some are documented while others are not. Some changes make sense, while others do not. That was the main reason for originally deciding to use jruby.

JRuby lets me use Java libraries in place of crappy ruby ones. For example, there are some undocumented changes to datetime between ruby 1.8 and 1.9, so the logstash 'date' filter uses Joda-Time instead of ruby's stdlib datetime.
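To illustrate the kind of date-handling drift described above, here is a minimal sketch. It assumes one well-known example: `Date.parse("01/02/2011")` was read as month/day on ruby 1.8 but as day/month on 1.9. The post does not say which changes it has in mind, so treat this as an illustration only. The helper name `parse_us_date` is hypothetical, and this is plain stdlib ruby, not the Joda-Time code the logstash date filter uses. Spelling out the format removes the guesswork:

```ruby
require 'date'

# Hypothetical helper: parse a US-style date with an explicit format,
# so the result does not depend on how Date.parse guesses field order.
def parse_us_date(text)
  Date.strptime(text, "%m/%d/%Y")
end

date = parse_us_date("01/02/2011")
puts date.to_s  # ISO 8601 form of January 2nd, 2011
```

An explicit format only fixes the ambiguity; it does nothing for the speed concerns raised further down.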

Further, JRuby's performance options are currently much better than those of MRI or YARV. At worst, during benchmarks, JRuby performs on par with YARV 1.9.2, but since JRuby has actual threads, we can use more CPUs more easily and pretty much beat plain ruby.

Additionally, Java debugging tools are excellent: jvisualvm, jstack, etc.

Lastly, I can very easily ship a single 'executable' that should work on most platforms with java - see the monolithic jar logstash releases. I can't easily do this with other rubies.

There are some parts of logstash that explicitly require java currently - the date filter, elasticsearch support, and thread support.

The code is also only tested under ruby 1.8.7, and the performance difference between JRuby and MRI 1.8.7 is pretty huge. It might get better if you try REE, but that's not really the same ruby everyone's going to have.

The date filter can be made ruby-friendly if someone writes a non-crappy date parsing library in ruby. The ones that ship with stdlib are not fast or safe to use (ruby core changes them wildly without notice).

ElasticSearch support is much faster in JRuby/JVM than it was using pure ruby, because we are now using the Java API for elasticsearch. Previously we were using the HTTP/REST API via EventMachine and em-http-request, which has much lower throughput.

Lastly, jruby supports proper threading so logstash can process events on multiple CPU cores. MRI and YARV Ruby cannot do this without forking and message passing.
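As a rough sketch of the threading model described above (not logstash's actual pipeline code), the following uses only `Thread` and `Queue` from ruby's standard library. The names `process_event` and `run_workers` are hypothetical stand-ins. The same code runs on MRI/YARV, where the interpreter lock keeps the workers from executing ruby code in parallel, and on JRuby, where the threads can be scheduled across cores:

```ruby
require 'thread'  # Queue lives here on older rubies; harmless on newer ones

# Hypothetical stand-in for a filter: upcase the event text.
def process_event(event)
  event.upcase
end

# Fan events out to a fixed pool of worker threads and collect the results.
def run_workers(events, worker_count = 4)
  input = Queue.new
  output = Queue.new

  events.each { |event| input << event }
  worker_count.times { input << :stop }  # one stop marker per worker

  workers = Array.new(worker_count) do
    Thread.new do
      # Each worker pulls events until it sees its stop marker.
      while (event = input.pop) != :stop
        output << process_event(event)
      end
    end
  end
  workers.each(&:join)

  results = []
  results << output.pop until output.empty?
  results  # completion order, not necessarily input order
end

puts run_workers(%w[alpha beta gamma]).sort.inspect
```

Results arrive in whatever order the workers finish, so callers that care about ordering need to sort or tag events themselves.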

The downside to using JRuby is a possibly higher memory footprint.

Again, I'm open to supporting non-JRuby rubies, but there need to be answers for some of the above.

Sir! Thank you very much for publishing your reasoning. I too was wondering why anybody would use JRuby if natively-compiled Ruby is available. This post explains it.

However, I take issue with one of your points:

> Lastly, I can very easily ship a single 'executable' that should work on most platforms with java - see the monolithic jar logstash releases. I can't easily do this with other rubies.

I wish you wouldn't do that -- providing other people's code that is quite likely to already exist on the system or be independently available. Thankfully, you don't bundle your own Java (some people do!), but you should not be providing your own JRuby JAR, nor log4j, nor anything else that's freely available from third parties. Simply list the requirements (along with versions, if important) -- the way you require Java -- and have the packagers (be they FreeBSD ports maintainers or RedHat RPM authors, or what have you) create the proper port/package for their respective OS.

By bundling the third-party JARs and Ruby libraries, you simply increase the size of your distribution -- and only for the sake of "out of the box" readiness... That readiness, in my opinion, is rather superficial. One may use it as a proof of concept, but for a deployment across multiple systems, one would (or should!) create a package anyway...

jordansissel/Why JRuby.md

UnitedMarsupials-zz commented Jul 7, 2014
