Comparison of the performance of FFI vs XS zeromq bindings. For FFI, the ZMQ::FFI bindings are used, first with FFI::Raw on the backend and then with FFI::Platypus. For XS, ZMQ::LibZMQ3 is used.
Comparison is done using the zeromq weather station example, first by timing wuclient.pl with the various implementations, and then by profiling wuserver.pl using Devel::NYTProf. When profiling, the server is changed to simply publish 1 million messages and exit.
The weather station example code was lightly optimized (e.g. variables are declared outside the loop rather than inside it) and modified to be more consistent across implementations.
Additionally, a more direct benchmark comparing FFI::Platypus with XS xsubs is done.
C and Python implementation results are provided as a baseline for performance.
All the code that was created or modified for these benchmarks is listed at the end (C/Python wuclient/wuserver code can be found in the zmq guide).
CPU: Intel Core i7-2600K (quad core) @ 3.40GHz
Mem: 4GB
OS: Arch Linux
ZMQ: 4.0.5
Perl: 5.20.1
ZMQ::FFI = 0.19 (FFI::Raw backend), dev (FFI::Platypus backend)
FFI::Raw = 0.32
FFI::Platypus = 0.31
ZMQ::LibZMQ3 = 1.19
Thanks!
One of the other ideas for "optimization" is to generate C (not XS) code at runtime and compile it with FFI::TinyCC. I had assumed that wouldn't work, because we need to parse the Perl header files: TinyCC doesn't implement some of the compiler directives that GCC does, and there's no easy, portable way to translate Perl's build flags for another compiler. But it does work! (I had to #define __builtin_expect, but that's hardly a major issue.) Looking at the generated assembler code and the benchmark results, it's good code and a little faster than ZMQ::LibZMQ3 (full results at https://gist.github.com/pipcet/1644cbd05e3300e5cec4#file-04-results-md).
What this demonstrates is that, by using PERL_NO_GET_CONTEXT, we can beat a real-world XS library in a non-real-world benchmarking situation. So on threaded Perl, that's the holy grail. Well, some coding is still required, but I'm now convinced we can do it.