@pipcet
Forked from calid/00-preamble.md
Last active August 29, 2015 14:16
language: perl
before_install:
- sudo apt-get install libzmq3-dev
- sudo apt-get install gcc
install:
- cpanm -n ZMQ::LibZMQ3
- cpanm -n ZMQ::FFI
- cpanm -n Benchmark
- cpanm -n Inline::C
- cpanm -n FFI::TinyCC
- cpanm -n ExtUtils::Embed
- cpanm -n Scalar::Util
- cpanm -n Data::Dumper
- cpanm -n Devel::PPPort
- cpanm -n Carp::Always
perl:
- "5.10"
- "5.20"
script:
- git clone https://github.com/pipcet/FFI-Platypus
- cd FFI-Platypus; perl ./Build.PL && perl ./Build && perl ./Build install; cd ..
- FFI_PLATYPUS_VERSION="0.31" perl ./zmq-bench.pl --test

ØMQ Perl Performance Comparison: FFI vs XS bindings

A comparison of the performance of FFI vs XS zeromq bindings. For FFI, the ZMQ::FFI bindings are used, first with FFI::Raw as the backend and then with FFI::Platypus. For XS, ZMQ::LibZMQ3 is used.

The comparison uses the zeromq weather station example: first wuclient.pl is timed with each implementation, then wuserver.pl is profiled with Devel::NYTProf. For profiling, the server is changed to publish exactly 1 million messages and then exit.

The weather station example code was lightly optimized (e.g. variables are declared outside the loop rather than inside it) and modified to be more consistent across implementations.

Additionally, a more direct benchmark comparing FFI::Platypus and XS xsubs is included.

C and Python implementation results are provided as a baseline for performance.

All the code that was created or modified for these benchmarks is listed at the end (the original C/Python wuclient/wuserver code can be found in the zmq guide).

Test box

CPU:  Intel Core i7-2600K (quad-core) @ 3.40GHz
Mem:  4GB
OS:   Arch Linux
ZMQ:  4.0.5
Perl: 5.20.1

ZMQ::FFI      = 0.19 (FFI::Raw backend), dev (FFI::Platypus backend)
FFI::Raw      = 0.32
FFI::Platypus = 0.31
ZMQ::LibZMQ3  = 1.19

wuclient.pl Time Comparison

FFI::Raw Implementation

$ perl wuserver.pl &
$ time perl wuclient.pl
Collecting updates from weather station...
Average temperature for zipcode '10001 ' was 21F

real    1m22.818s
user    0m0.070s
sys     0m0.023s

FFI::Platypus Implementation

$ perl wuserver.pl &
$ time perl wuclient.pl
Collecting updates from weather station...
Average temperature for zipcode '10001 ' was 38F

real    0m12.813s
user    0m0.083s
sys     0m0.033s

XS Implementation (ZMQ::LibZMQ3)

$ perl wuserver.pl &
$ time perl wuclient.pl
Collecting updates from weather server...
Average temperature for zipcode '10001 ' was 34F

real    0m10.051s
user    0m0.017s
sys     0m0.010s

C Reference Implementation

$ ./wuserver &
$ time ./wuclient
Collecting updates from weather server...
Average temperature for zipcode '10001 ' was 26F

real    0m2.842s
user    0m0.000s
sys     0m0.023s

Python Reference Implementation

I was initially impressed with the performance of the Python example:

$ python -V
Python 3.4.2
$ python -c 'import zmq; print(zmq.pyzmq_version())'
14.5.0

$ python wuserver.py &
$ time python wuclient.py
Collecting updates from weather server...
Average temperature for zipcode '10001' was 49F

real    0m4.599s
user    0m0.063s
sys     0m0.020s

Wow, that's almost as fast as C! But then I noticed:

# Process 5 updates
total_temp = 0
for update_nbr in range(5):
    ...

So where the C and Perl implementations process 100 updates, the Python version processes only 5, or 1/20 as many. What if we use 100 updates, like the other languages?

$ python wuserver.py &
$ time python wuclient.py
Collecting updates from weather server...
Average temperature for zipcode '10001' was 17F

real    1m41.108s
user    0m0.077s
sys     0m0.017s

If nothing else, at least the Perl bindings blow the doors off the Python ones :)
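The fair comparison depends on every client averaging over the same number of updates. The averaging step can be sketched in isolation; this is a minimal sketch, assuming updates arrive as an iterable of "zipcode temperature relhumidity" strings instead of from a ZMQ SUB socket, and `average_temperature` is a hypothetical helper rather than part of any listed program. It divides by the number of updates actually processed, which also sidesteps dividing by a zero-based loop index.

```python
from typing import Iterable


def average_temperature(updates: Iterable[str], n_updates: int) -> int:
    """Average the temperature field over the first n_updates messages.

    Each message is assumed to look like "10001 72 45"
    (zipcode, temperature, relative humidity), as wuserver publishes.
    """
    total_temp = 0
    processed = 0
    for message in updates:
        if processed == n_updates:
            break
        _zipcode, temperature, _relhumidity = message.split()
        total_temp += int(temperature)
        processed += 1
    if processed == 0:
        raise ValueError("no updates received")
    # Divide by the count of processed updates, then truncate toward zero
    # like Perl's int()
    return int(total_temp / processed)


if __name__ == "__main__":
    fake_updates = ["10001 20 30", "10001 30 40", "10001 40 50", "10001 50 60"]
    print(average_temperature(fake_updates, 3))
```

Passing the update count as a parameter keeps the 5-vs-100 difference visible at the call site instead of buried in a loop bound.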

wuserver.pl Hot Spot Comparison (Devel::NYTProf)

FFI::Raw Implementation

$self->_zmq3_ffi->{zmq_send}->($self->_socket, $msg, $length, $flags)
# spent 19.9s making 1000000 calls to FFI::Raw::__ANON__[FFI/Raw.pm:94], avg 20µs/call
# spent 5.72s making 2000000 calls to FFI::Raw::coderef, avg 3µs/call
# spent 2.90s making 1000000 calls to ZMQ::FFI::ZMQ3::Socket::_zmq3_ffi, avg 3µs/call

FFI::Platypus Implementation

zmq_send($socket, $msg, $length, $flags)
# spent 1.33s making 1000000 calls to ZMQ::FFI::ZMQ3::Socket::zmq_send, avg 1µs/call

sub ZMQ::FFI::ZMQ3::Socket::zmq_send; # xsub

XS Implementation (ZMQ::LibZMQ3)

zmq_send($socket, $string, -1);
# spent 1.23s making 1000000 calls to ZMQ::LibZMQ3::zmq_send, avg 1µs/call

sub ZMQ::LibZMQ3::zmq_send; # xsub

Direct xsub Comparison

The weather station example inevitably has layers between sending the messages and the underlying xsub calls. This is fine for comparing the two high-level APIs, ZMQ::FFI vs ZMQ::LibZMQ3, but we also want to compare FFI::Platypus and XS xsub performance directly.

So, as far as possible, the intervening layers are stripped out to measure the raw performance of the two.
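The effect of intervening layers can be illustrated outside ZMQ. The following is a rough Python analogy, not part of the original benchmarks: it uses ctypes (Python's standard FFI) to call libc's `strlen` directly and through a hypothetical `LayeredSender` wrapper that looks up its function and argument in dicts on every call, then reports the per-call time of each with `timeit`. It assumes a Unix-like system where `ctypes.CDLL(None)` exposes the symbols already loaded into the process; absolute numbers will vary by machine.

```python
import ctypes
import timeit

# On Unix-like systems, CDLL(None) opens the running process itself,
# which already has libc (and therefore strlen) loaded
libc = ctypes.CDLL(None)
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


class LayeredSender:
    """Hypothetical wrapper: two extra dict lookups on every call."""

    def __init__(self, payload: bytes):
        self._funcs = {"strlen": strlen}
        self._state = {"payload": payload}

    def measure(self) -> int:
        return self._funcs["strlen"](self._state["payload"])


def per_call_ns(func, number: int = 200_000) -> float:
    """Average wall-clock nanoseconds per call of a zero-argument callable."""
    return timeit.timeit(func, number=number) / number * 1e9


if __name__ == "__main__":
    payload = b"ohhai"
    sender = LayeredSender(payload)
    direct = per_call_ns(lambda: strlen(payload))
    layered = per_call_ns(sender.measure)
    print(f"direct : {direct:.0f} ns/call")
    print(f"layered: {layered:.0f} ns/call")
```

Both variants pay for one Python-level frame (the lambda or the bound method), so the difference mostly reflects the extra lookups.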

Benchmark.pm results

$ perl zmq-bench.pl
FFI ZMQ Version: 4.0.5
XS  ZMQ Version: 4.0.5

Benchmark: timing 10000000 iterations of FFI, XS...
       FFI:  4 wallclock secs ( 3.31 usr +  0.01 sys =  3.32 CPU) @ 3012048.19/s (n=10000000)
        XS:  2 wallclock secs ( 2.16 usr +  0.00 sys =  2.16 CPU) @ 4629629.63/s (n=10000000)

         Rate   FFI    XS     C
FFI 3012048/s    --  -35%  -82%
XS  4629630/s   54%    --  -73%
C* 16835017/s  459%  264%    --

*The C row is not produced by Benchmark; it is computed from the shell timing of the C program below and added to the table by hand, so it's easy to compare against a baseline.

$ time zmq-bench-c
C ZMQ Version: 4.0.5

real    0m0.594s
user    0m0.570s
sys     0m0.017s

$ echo '10000000 / 0.594' | bc -lq
16835016.835 # Rate
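The rates and percentage cells are plain arithmetic: a rate is iterations divided by seconds, and each cell is the row's rate divided by the column's rate, minus one, expressed as a percentage (the FFI and XS cells above follow this formula). A small Python sketch reproduces it from the CPU times reported by Benchmark (3.32 s for FFI, 2.16 s for XS) and the C wall time (0.594 s); `rate` and `comparison_table` are illustrative helpers, not part of Benchmark.

```python
def rate(iterations: int, seconds: float) -> float:
    """Calls per second, as printed in the 'Rate' column."""
    return iterations / seconds


def comparison_table(rates: dict) -> dict:
    """Return {row: {col: 'NN%'}} where each cell is row_rate / col_rate - 1."""
    table = {}
    for row, row_rate in rates.items():
        table[row] = {}
        for col, col_rate in rates.items():
            if row == col:
                table[row][col] = "--"
            else:
                table[row][col] = f"{(row_rate / col_rate - 1) * 100:.0f}%"
    return table


if __name__ == "__main__":
    n = 10_000_000
    rates = {"FFI": rate(n, 3.32), "XS": rate(n, 2.16), "C": rate(n, 0.594)}
    for row, cells in comparison_table(rates).items():
        print(f"{row:>3} {rates[row]:>10.0f}/s", *(cells[c] for c in rates))
```

Because the cell is a ratio minus one, a row that is about 5.6 times faster shows up as roughly +459%, not +559%.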

Devel::NYTProf profiling results

For the profiling and shell timings below, zmq_send is called in a plain loop instead of through Benchmark.

sub main::zmqffi_send; # xsub
# spent 15.5s within main::zmqffi_send which was called 10000000 times, avg 2µs/call

sub ZMQ::LibZMQ3::zmq_send; # xsub
# spent 15.6s within ZMQ::LibZMQ3::zmq_send which was called 10000000 times, avg 2µs/call

Q: Why does the profiler indicate basically identical performance for the two xsubs, while Benchmark reports a performance difference?

A: ???

Time in shell

$ time perl zmq-bench.pl
FFI ZMQ Version: 4.0.5

real    0m3.541s
user    0m3.510s
sys     0m0.027s

$ echo '10000000 / 3.541' | bc -lq
2824060.999 # Rate

$ time perl zmq-bench.pl
XS ZMQ Version: 4.0.5

real    0m2.390s
user    0m2.363s
sys     0m0.020s

$ echo '10000000 / 2.390' | bc -lq
4184100.418 # Rate

XS is 48% faster when timed in the shell.
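Because both runs perform the same 10,000,000 sends, the rate ratio is just the inverse of the time ratio, so the percentage can be read straight off the two `real` times. A minimal check of that arithmetic in Python (`percent_faster` is an illustrative helper):

```python
def percent_faster(slow_seconds: float, fast_seconds: float) -> float:
    """How much higher the fast run's rate is, in percent, for equal work.

    rate = n / t, so rate_fast / rate_slow - 1 == t_slow / t_fast - 1.
    """
    return (slow_seconds / fast_seconds - 1) * 100


if __name__ == "__main__":
    n = 10_000_000
    via_rates = ((n / 2.390) / (n / 3.541) - 1) * 100
    via_times = percent_faster(3.541, 2.390)
    print(f"{via_rates:.1f}% vs {via_times:.1f}%")
```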

Results

2015-03-12

		       Rate method xsub(hash) method (2) method (3) TinyCC method xsub TinyCC method (2) TinyCC method (3) Inline method   XS Inline method (2) Inline method (3) Inline xsub TinyCC xsub
method            1811594/s     --        -1%        -9%        -9%          -10% -15%              -18%              -18%          -22% -27%              -29%              -32%        -36%        -36%
xsub(hash)        1831502/s     1%         --        -8%        -8%           -9% -14%              -17%              -17%          -21% -27%              -28%              -31%        -35%        -36%
method (2)        1996008/s    10%         9%         --        -0%           -1%  -7%              -10%              -10%          -14% -20%              -22%              -25%        -29%        -30%
method (3)        2000000/s    10%         9%         0%         --           -1%  -6%               -9%              -10%          -14% -20%              -21%              -25%        -29%        -30%
TinyCC method     2012072/s    11%        10%         1%         1%            --  -6%               -9%               -9%          -14% -19%              -21%              -25%        -29%        -29%
xsub              2136752/s    18%        17%         7%         7%            6%   --               -3%               -4%           -8% -14%              -16%              -20%        -24%        -25%
TinyCC method (2) 2207506/s    22%        21%        11%        10%           10%   3%                --               -0%           -5% -11%              -13%              -17%        -22%        -23%
TinyCC method (3) 2217295/s    22%        21%        11%        11%           10%   4%                0%                --           -5% -11%              -13%              -17%        -21%        -22%
Inline method     2331002/s    29%        27%        17%        17%           16%   9%                6%                5%            --  -7%               -8%              -13%        -17%        -18%
XS                2493766/s    38%        36%        25%        25%           24%  17%               13%               12%            7%   --               -2%               -6%        -11%        -12%
Inline method (2) 2544529/s    40%        39%        27%        27%           26%  19%               15%               15%            9%   2%                --               -5%        -10%        -11%
Inline method (3) 2666667/s    47%        46%        34%        33%           33%  25%               21%               20%           14%   7%                5%                --         -5%         -6%
Inline xsub       2816901/s    55%        54%        41%        41%           40%  32%               28%               27%           21%  13%               11%                6%          --         -1%
TinyCC xsub       2849003/s    57%        56%        43%        42%           42%  33%               29%               28%           22%  14%               12%                7%          1%          --

Comments:

  • method is what I'd actually use today. It uses the attach_method feature to cache a raw pointer for a Perl object without going through its hash, and it's nearly as fast as using an XSUB and going through a hash lookup (xsub(hash)). method (2) and method (3) are variants that don't go through Perl's method resolution mechanism.
  • TinyCC method is what we would get if we JIT-compiled C code. It's quite a bit faster than method (about 11% in the table above), and, based on variant (3), I believe it can be said to be faster than xsub.
  • Inline method is the same thing with Inline::C and very aggressive optimization. I think the gain is not worth it in this case, because Inline has a huge run-time overhead and caches results based only on the C code, not on the compiler options or machine type; thus, it is very easy to end up with an _Inline directory whose binaries don't run on the machine it's on.
  • XS is the vanilla LibZMQ3 XS code. It has not been optimized further, which reflects reality.
  • TinyCC xsub is a JIT-generated C XSUB compiled with TinyCC. Going through Inline with aggressive optimization turns out not to make a difference for this very short piece of code.
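Judging from the benchmark code listed at the end, `attach_method` called with `[$obj => $ptr]` keeps, per method name, a table keyed by the object's address (`refaddr`) that stores the pre-bound argument and the function body; the benchmark reaches into that table via `_get_other_methods` and overwrites its `body` and `argument` entries. The following is a hypothetical Python analogue of that per-object dispatch, keyed by `id()` and using a stand-in `fake_send` instead of `zmq_send`. It sketches the idea only and is not FFI::Platypus's implementation.

```python
from typing import Any, Callable, Dict


class BoundMethodTable:
    """Per-method dispatch table: object identity -> (body, bound argument).

    Hypothetical analogue of attaching a C function as a method with a
    per-object first argument; not FFI::Platypus's actual implementation.
    Objects must stay alive while attached, since id() values can be reused.
    """

    def __init__(self) -> None:
        self._entries: Dict[int, Dict[str, Any]] = {}

    def attach(self, obj: object, body: Callable[..., Any], argument: Any) -> None:
        # Keyed by identity, like refaddr($obj) in Perl
        self._entries[id(obj)] = {"body": body, "argument": argument}

    def call(self, obj: object, *args: Any) -> Any:
        entry = self._entries[id(obj)]
        # The stored argument (e.g. a socket pointer) is passed first, so the
        # caller never has to fetch it from a field of the object itself
        return entry["body"](entry["argument"], *args)


def fake_send(socket: str, data: bytes, size: int, flags: int) -> int:
    """Stand-in for zmq_send: returns the number of bytes 'sent'."""
    return size


if __name__ == "__main__":
    ffio = BoundMethodTable()
    sock_obj = object()
    ffio.attach(sock_obj, fake_send, "socket-handle")
    print(ffio.call(sock_obj, b"ohhai", 5, 0))
```

Swapping the `body` entry, as the benchmark does for its TinyCC and Inline variants, changes what a call runs without touching the caller.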

2015-03-15

			     Rate method Python method (3) method (2) Perl exec, XS based Perl exec, FFI based xsub(hash) Inline(GCC) method TinyCC method Inline(GCC) method (3) Inline(GCC) method (2)   XS TinyCC method (3) TinyCC method (2) xsub TinyCC xsub Inline(GCC) xsub Inline(GCC) TinyCC
method                  1394700/s     --    -6%        -7%        -7%                -35%                 -38%       -39%               -40%          -43%                   -45%                   -47% -48%              -48%              -49% -50%        -51%             -54%        -87%   -87%
Python                  1483680/s     6%     --        -1%        -1%                -31%                 -34%       -35%               -36%          -39%                   -42%                   -43% -45%              -45%              -46% -47%        -47%             -51%        -86%   -86%
method (3)              1494768/s     7%     1%         --        -0%                -31%                 -33%       -35%               -35%          -39%                   -41%                   -43% -44%              -45%              -45% -47%        -47%             -51%        -86%   -86%
method (2)              1499250/s     7%     1%         0%         --                -30%                 -33%       -35%               -35%          -39%                   -41%                   -43% -44%              -44%              -45% -47%        -47%             -50%        -86%   -86%
Perl exec, XS based     2155172/s    55%    45%        44%        44%                  --                  -4%        -6%                -7%          -12%                   -16%                   -18% -19%              -20%              -21% -23%        -24%             -29%        -79%   -80%
Perl exec, FFI based    2247191/s    61%    51%        50%        50%                  4%                   --        -2%                -3%           -8%                   -12%                   -14% -16%              -17%              -18% -20%        -20%             -26%        -78%   -79%
xsub(hash)              2298851/s    65%    55%        54%        53%                  7%                   2%         --                -1%           -6%                   -10%                   -12% -14%              -15%              -16% -18%        -19%             -24%        -78%   -78%
Inline(GCC) method      2314815/s    66%    56%        55%        54%                  7%                   3%         1%                 --           -5%                    -9%                   -12% -13%              -14%              -15% -18%        -18%             -23%        -78%   -78%
TinyCC method           2444988/s    75%    65%        64%        63%                 13%                   9%         6%                 6%            --                    -4%                    -7%  -9%               -9%              -10% -13%        -13%             -19%        -77%   -77%
Inline(GCC) method (3)  2551020/s    83%    72%        71%        70%                 18%                  14%        11%                10%            4%                     --                    -3%  -5%               -5%               -6%  -9%        -10%             -16%        -76%   -76%
Inline(GCC) method (2)  2624672/s    88%    77%        76%        75%                 22%                  17%        14%                13%            7%                     3%                     --  -2%               -3%               -4%  -7%         -7%             -13%        -75%   -75%
XS                      2673797/s    92%    80%        79%        78%                 24%                  19%        16%                16%            9%                     5%                     2%   --               -1%               -2%  -5%         -5%             -11%        -74%   -75%
TinyCC method (3)       2695418/s    93%    82%        80%        80%                 25%                  20%        17%                16%           10%                     6%                     3%   1%                --               -1%  -4%         -5%             -11%        -74%   -74%
TinyCC method (2)       2724796/s    95%    84%        82%        82%                 26%                  21%        19%                18%           11%                     7%                     4%   2%                1%                --  -3%         -4%             -10%        -74%   -74%
xsub                    2816901/s   102%    90%        88%        88%                 31%                  25%        23%                22%           15%                    10%                     7%   5%                5%                3%   --         -0%              -7%        -73%   -73%
TinyCC xsub             2824859/s   103%    90%        89%        88%                 31%                  26%        23%                22%           16%                    11%                     8%   6%                5%                4%   0%          --              -6%        -73%   -73%
Inline(GCC) xsub        3021148/s   117%   104%       102%       102%                 40%                  34%        31%                31%           24%                    18%                    15%  13%               12%               11%   7%          7%               --        -71%   -71%
Inline(GCC)            10416667/s   647%   602%       597%       595%                383%                 364%       353%               350%          326%                   308%                   297% 290%              286%              282% 270%        269%             245%          --    -1%
TinyCC                 10526316/s   655%   609%       604%       602%                388%                 368%       358%               355%          331%                   313%                   301% 294%              291%              286% 274%        273%             248%          1%     --


oprofile_data: zmq-bench.pl
	operf --callgraph perl ./zmq-bench.pl

call-graph: oprofile_data
	opreport --callgraph > call-graph

nytprof.out: zmq-bench.pl
	perl -d:NYTProf ./zmq-bench.pl

opannotate.c: oprofile_data
	opannotate --source > opannotate.c

bench.txt: zmq-bench.pl
	perl ./zmq-bench.pl > bench.txt
#
# Weather update client
# Connects SUB socket to tcp://localhost:5556
# Collects weather updates and finds avg temp in zipcode
#
import sys
import zmq

# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)

print("Collecting updates from weather server...")
socket.connect("tcp://localhost:5556")

# Subscribe to zipcode, default is NYC, 10001
zip_filter = sys.argv[1] if len(sys.argv) > 1 else "10001"

# Python 2 - ascii bytes to unicode str
if isinstance(zip_filter, bytes):
    zip_filter = zip_filter.decode('ascii')
socket.setsockopt_string(zmq.SUBSCRIBE, zip_filter)

# Process 5 updates
total_temp = 0
for update_nbr in range(5):
    string = socket.recv_string()
    zipcode, temperature, relhumidity = string.split()
    total_temp += int(temperature)

# update_nbr is zero-based, so update_nbr + 1 updates were processed
print("Average temperature for zipcode '%s' was %dF" % (
    zip_filter, total_temp / (update_nbr + 1))
)
#
# Weather update server
# Binds PUB socket to tcp://*:5556
# Publishes random weather updates
#
import zmq
from random import randrange
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")
while True:
    zipcode = randrange(1, 100000)
    temperature = randrange(-80, 135)
    relhumidity = randrange(10, 60)
    socket.send_string("%i %i %i" % (zipcode, temperature, relhumidity))
#
# should run for about 50 seconds.
#
use strict;
use warnings;
use v5.10;
use ZMQ::FFI::Constants qw(:all);
use FFI::TinyCC;
use FFI::Platypus::Declare;
use Carp::Always;
lib 'libzmq.so';
attach(
['zmq_bind' => 'zmqffi_bind']
=> ['pointer', 'string'] => 'int'
);
attach(
['zmq_ctx_new' => 'zmqffi_ctx_new']
=> [] => 'pointer'
);
attach(
['zmq_socket' => 'zmqffi_socket']
=> ['pointer', 'int'] => 'pointer'
);
attach(
['zmq_send' => 'zmqffi_send']
=> ['pointer', 'string', 'size_t', 'int'] => 'int',
);
attach(
['zmq_version' => 'zmqffi_version']
=> ['int*', 'int*', 'int*'] => 'void'
);
our $ffi_ctx = main::zmqffi_ctx_new();
die 'ffi ctx error' unless $ffi_ctx;
our $ffi_socket = main::zmqffi_socket($ffi_ctx, ZMQ_PUB);
die 'ffi socket error' unless $ffi_socket;
my $rv;
$rv = zmqffi_bind($ffi_socket, "ipc:///tmp/zmq-ffi-bench-$$");
die 'ffi bind error' if $rv == -1;
my ($major, $minor, $patch);
zmqffi_version(\$major, \$minor, \$patch);
say "FFI ZMQ Version: " . join(".", $major, $minor, $patch);
my $i;
while(1) {
    $i++;
    die if -1 == zmqffi_send($ffi_socket, 'ohhai', 5, 0);
    exit if $i == 10_000_000;
}
#
# should run for about 50 seconds.
#
use strict;
use warnings;
use v5.10;
use ZMQ::LibZMQ3;
use ZMQ::FFI::Constants qw(:all);
my $rv;
my $xs_ctx = zmq_ctx_new();
die 'xs ctx error' unless $xs_ctx;
my $xs_socket = zmq_socket($xs_ctx, ZMQ_PUB);
die 'xs socket error' unless $xs_socket;
$rv = zmq_bind($xs_socket, "ipc:///tmp/zmq-xs-bench-$$");
die 'xs bind error' if $rv == -1;
my $i;
while(1) {
    $i++;
    die if -1 == zmq_send($xs_socket, 'ohhai', 5, 0);
    exit if $i == 10_000_000;
}
#include <zmq.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <assert.h>
#include <string.h>

int main(void)
{
    void *ctx = zmq_ctx_new();
    assert(ctx);
    void *socket = zmq_socket(ctx, ZMQ_PUB);
    assert(socket);

    pid_t p = getpid();
    char endpoint[256];
    snprintf(endpoint, sizeof endpoint, "ipc:///tmp/zmq-c-bench-%d", (int)p);

    /* keep side effects out of assert() so they still run under -DNDEBUG */
    int rc = zmq_bind(socket, endpoint);
    assert(rc != -1);

    int major, minor, patch;
    zmq_version(&major, &minor, &patch);
    printf("C ZMQ Version: %d.%d.%d\n", major, minor, patch);

    for (int i = 0; i < (10 * 1000 * 1000); i++) {
        rc = zmq_send(socket, "ohhai", 5, 0);
        assert(rc != -1);
    }
    return 0;
}
#
# Directly compare FFI::Platypus vs XS xsubs
#
use strict;
use warnings;
use v5.10;
use FFI::Platypus::Declare;
use ZMQ::LibZMQ3;
use ZMQ::FFI::Constants qw(:all);
use Benchmark qw(:all :hireswallclock);
use ExtUtils::Embed qw(ccopts);
use Inline;
use FFI::TinyCC;
lib 'libzmq.so';
attach(
['zmq_ctx_new' => 'zmqffi_ctx_new']
=> [] => 'pointer'
);
attach(
['zmq_socket' => 'zmqffi_socket']
=> ['pointer', 'int'] => 'pointer'
);
attach(
['zmq_bind' => 'zmqffi_bind']
=> ['pointer', 'string'] => 'int'
);
our $ffi_ctx = main::zmqffi_ctx_new();
die 'ffi ctx error' unless $ffi_ctx;
our $ffi_socket = main::zmqffi_socket($ffi_ctx, ZMQ_PUB);
die 'ffi socket error' unless $ffi_socket;
package FFIsock;
sub new {
    return bless [], $_[0];
}
my $ffi = FFI::Platypus->new;
my $sockobj = FFIsock->new;
my $sockobj2 = FFIsock->new;
my $sockobj3 = FFIsock->new;
$ffi->lib('libzmq.so');
$ffi->attach_method([$ffi],
['zmq_send' => 'ffi']
=> ['pointer', 'string', 'size_t', 'int'] => 'int'
);
$ffi->attach_method(['FFIsock'], ['zmq_send' => 'ffi2']
=> ['pointer', 'string', 'size_t', 'int'] => 'int');
$ffi->attach_method([$sockobj=>$ffi_socket], ['zmq_send'=>'ffio'], ['pointer', 'string', 'size_t', 'int'] => 'int');
$ffi->attach_method([$sockobj2=>$ffi_socket], ['zmq_send'=>'ffio'], ['pointer', 'string', 'size_t', 'int'] => 'int');
$ffi->attach_method([$sockobj3=>$ffi_socket], ['zmq_send'=>'ffio'], ['pointer', 'string', 'size_t', 'int'] => 'int');
package main;
attach(
['zmq_send' => 'ffi2']
=> ['pointer', 'string', 'size_t', 'int'] => 'int',
);
attach(
['zmq_version' => 'zmqffi_version']
=> ['int*', 'int*', 'int*'] => 'void'
);
$ffi->attach_method([$sockobj=>$ffi_socket], ['zmq_send'=>'ffio'], ['pointer', 'string', 'size_t', 'int'] => 'int');
$ffi->attach_method([$sockobj2=>$ffi_socket], ['zmq_send'=>'ffio'], ['pointer', 'string', 'size_t', 'int'] => 'int');
$ffi->attach_method([$sockobj3=>$ffi_socket], ['zmq_send'=>'ffio'], ['pointer', 'string', 'size_t', 'int'] => 'int');
my $ffi_hash = { socket => $ffi_socket };
my $rv;
$rv = zmqffi_bind($ffi_socket, "ipc:///tmp/zmq-ffi-bench-$$");
die 'ffi bind error' if $rv == -1;
my $xs_ctx = zmq_ctx_new();
die 'xs ctx error' unless $xs_ctx;
my $xs_socket = zmq_socket($xs_ctx, ZMQ_PUB);
die 'xs socket error' unless $xs_socket;
$rv = zmq_bind($xs_socket, "ipc:///tmp/zmq-xs-bench-$$");
die 'xs bind error' if $rv == -1;
my ($major, $minor, $patch);
zmqffi_version(\$major, \$minor, \$patch);
say "FFI ZMQ Version: " . join(".", $major, $minor, $patch);
say "XS ZMQ Version: " . join(".", ZMQ::LibZMQ3::zmq_version());
use bytes;
use Inline C => qq{
    typedef int (*send_t)(void *, const char *, long, int);
    void loop_Inline(void *send, void *socket, const char *data, long size, int flags, void *die)
    {
        send_t s = send;
        void (*d)(void) = die;
        int i;
        for(i=0; i<10*1000*1000; i++) {
            /* zmq_send returns -1 on error */
            if(s(socket, data, size, flags) == -1)
                d();
        }
    }
}, cc=>'gcc';
my $tcc = FFI::TinyCC->new;
$tcc->compile_string(q{
    void
    loop(int (*f)(void *, const char *, long, int), void *arg0, const char *arg1, long arg2, int arg3, void (*die)(void))
    {
        int i;
        for(i=0; i<10*1000*1000; i++)
            if(f(arg0, arg1, arg2, arg3) == -1)
                die();
    }
});
my $address = $tcc->get_symbol('loop');
lib 'libzmq.so';
my $zmqsend = sub { FFI::Platypus::Declare::_ffi_object }->()->find_symbol('zmq_send');
type('(opaque, string, long, int)->int', 'f_closure');
type('()->void', 'die_closure');
attach([$address => 'loop'] => [qw(f_closure opaque string long int die_closure)] => 'void');
my $r3;
my $tcc2 = FFI::TinyCC->new;
$tcc2->detect_sysinclude_path;
# AUGH. ExtUtils::Embed prints rather than returns its strings based
# on whether it's run from perl -e or perl $file. That just bit me
# when I ran a test using perl -e.
$tcc2->set_options(ExtUtils::Embed::ccopts);
$tcc2->compile_string(q{/* DO NOT EDIT. AUTOGENERATED CODE. */
#define __builtin_expect(e,v) (e)
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
XS(xsub)
{
    dXSARGS; dVAR; dXSTARG;
    if(items != 4)
        croak("usage: blahblah");
    if(!SvOK(ST(0)) || !SvOK(ST(1)) || !SvOK(ST(2)) || !SvOK(ST(3)))
        croak("would have to fall back to fastcall.c");
    XSprePUSH;
    PUSHi(zmq_send(SvIV(ST(0)), SvPV_nolen(ST(1)), SvIV(ST(2)), SvIV(ST(3))));
    XSRETURN(1);
}
void body(pTHX)
{
    dVAR; dXSARGS; dXSTARG;
    if((items != 4) || !SvOK(ST(1)) || !SvOK(ST(2)) || !SvOK(ST(3)))
        croak("would have to fall back to fastcall.c");
    XSprePUSH;
    PUSHi(zmq_send(SvIV(ST(0)), SvPV_nolen(ST(1)), SvIV(ST(2)), SvIV(ST(3))));
    XSRETURN(1);
}
void install_xsub(void)
{
    dTHX;
    newXS("main::xsub", xsub, "inline:1");
}
}) or die "couldn't compile string";
$tcc2->add_symbol('zmq_send', $ffi->find_symbol('zmq_send'));
my $tcc2_addr = $tcc2->get_symbol('install_xsub');
warn $tcc2->get_symbol('xsub');
$ffi->function($tcc2_addr, [] => 'void')->call();
use Data::Dumper;
use Scalar::Util qw(refaddr);
warn Dumper($ffi->_get_other_methods('FFIsock::ffio'));
$ffi->_get_other_methods('FFIsock::ffio')->{refaddr($sockobj2)}->{body} = $tcc2->get_symbol('body');
$ffi->_get_other_methods('FFIsock::ffio')->{refaddr($sockobj2)}->{argument} = $ffi_socket;
warn Dumper($ffi->_get_other_methods('FFIsock::ffio'));
Inline->bind(C => qq{
//#define zmq_send ((int (*)(void *, void *, unsigned long, int))${zmqsend}L)
extern int zmq_send(void *, void *, unsigned long, int);
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
XS(xsub2)
{
    dXSARGS; dVAR; dXSTARG;
    if(items != 4)
        croak("usage: blahblah");
    if(!SvOK(ST(0)) || !SvOK(ST(1)) || !SvOK(ST(2)) || !SvOK(ST(3)))
        croak("would have to fall back to fastcall.c");
    XSprePUSH;
    IV i = zmq_send(SvIV(ST(0)), SvPV_nolen(ST(1)), SvIV(ST(2)), SvIV(ST(3)));
    PUSHi(i);
    XSRETURN(1);
}
void body(pTHX)
{
    dVAR; dXSARGS; dXSTARG;
    if((items != 4) || !SvOK(ST(1)) || !SvOK(ST(2)) || !SvOK(ST(3)))
        croak("would have to fall back to fastcall.c");
    XSprePUSH;
    PUSHi(zmq_send(SvIV(ST(0)), SvPV_nolen(ST(1)), SvIV(ST(2)), SvIV(ST(3))));
    XSRETURN(1);
}
unsigned long get_body()
{
    return (unsigned long)body;
}
void install_xsub2()
{
    dTHX;
    newXS("main::xsub2" , xsub2, "inline:1");
}
}, cc => 'gcc', ccflags => (ExtUtils::Embed::ccopts . " -O6 -march=native -mtune=native -lzmq3"), libs=>'-lzmq3 -lzmq');
install_xsub2();
warn Dumper($ffi->_get_other_methods('FFIsock::ffio'));
$ffi->_get_other_methods('FFIsock::ffio')->{refaddr($sockobj3)}->{body} = get_body();
warn Dumper($ffi->_get_other_methods('FFIsock::ffio'));
my $r = {};
my $method0 = $sockobj->can('ffio');
my $method1 = $sockobj2->can('ffio');
my $method2 = $sockobj3->can('ffio');
sleep(1);
# '// ""' avoids an uninitialized-value warning when no argument is given
my $count = ($ARGV[0] // '') eq '--test' ? 1 : 1000;
while($count--)
{
    $r3 = timethese 1, {
        TinyCC => sub {
            my $die_closure = closure { die "zmq_send error" };
            loop($zmqsend, $ffi_socket, 'ohhai', 5, 0, $die_closure);
        },
        'Inline(GCC)' => sub {
            my $die_closure = closure { die "zmq_send error" };
            loop_Inline($zmqsend, $ffi_socket, 'ohhai', 5, 0, $die_closure);
        },
        Python => sub {
            # this is a little unfair, since there's overhead for starting
            # python and waiting for it, but that's on the order of a tenth of
            # a second ...
            system("python ./zmq-bench.py");
        },
        'Perl exec, XS based' => sub {
            system("perl ./zmq-bench-xsexec.pl");
        },
        'Perl exec, FFI based' => sub {
            system("perl ./zmq-bench-ffiexec.pl");
        },
    };
    my $new_r = timethese 10_000_000, {
        # 'class method' => sub {
        #     die 'ffi send error' if -1 == FFIsock->ffi2($ffi_socket, 'ohhai', 5, 0);
        # },
        # 'class method(hash)' => sub {
        #     die 'ffi send error' if -1 == FFIsock->ffi2($ffi_hash->{socket}, 'ohhai', 5, 0);
        # },
        'method' => sub {
            die 'ffi send error' if -1 == $sockobj->ffio('ohhai', 5, 0);
        },
        'method (2)' => sub {
            die 'ffi send error' if -1 == FFIsock::ffio($sockobj, 'ohhai', 5, 0);
        },
        'method (3)' => sub {
            die 'ffi send error' if -1 == $sockobj->$method0('ohhai', 5, 0);
        },
        'TinyCC method' => sub {
            die 'ffi send error' if -1 == $sockobj2->ffio('ohhai', 5, 0);
        },
        'Inline(GCC) method' => sub {
            die 'ffi send error' if -1 == $sockobj3->ffio('ohhai', 5, 0);
        },
        'TinyCC method (2)' => sub {
            die 'ffi send error' if -1 == FFIsock::ffio($sockobj2, 'ohhai', 5, 0);
        },
        'Inline(GCC) method (2)' => sub {
            die 'ffi send error' if -1 == FFIsock::ffio($sockobj3, 'ohhai', 5, 0);
        },
        'TinyCC method (3)' => sub {
            die 'ffi send error' if -1 == $sockobj2->$method1('ohhai', 5, 0);
        },
        'Inline(GCC) method (3)' => sub {
            die 'ffi send error' if -1 == $sockobj3->$method2('ohhai', 5, 0);
        },
        'Inline(GCC) xsub' => sub {
            die 'ffi send error' if -1 == xsub2($ffi_socket, 'ohhai', 5, 0);
        },
        'TinyCC xsub' => sub {
            die 'ffi send error' if -1 == xsub($ffi_socket, 'ohhai', 5, 0);
        },
        'xsub' => sub {
            die 'ffi send error' if -1 == ffi2($ffi_socket, 'ohhai', 5, 0);
        },
        'xsub(hash)' => sub {
            die 'ffi send error' if -1 == ffi2($ffi_hash->{socket}, 'ohhai', 5, 0);
        },
        'XS' => sub {
            die 'xs send error ' if -1 == zmq_send($xs_socket, 'ohhai', 5, 0);
        },
    };
    # keep the fastest (lowest total CPU time) run seen so far for each label
    for my $key (keys %$new_r)
    {
        if(!defined $r->{$key} or $new_r->{$key}->cpu_a < $r->{$key}->cpu_a) {
            $r->{$key} = $new_r->{$key};
        }
    }
    for my $key (keys %$r3)
    {
        if(!defined $r->{$key} or $r3->{$key}->cpu_a < $r->{$key}->cpu_a) {
            $r->{$key} = $r3->{$key};
            # HACK! we're accessing the Benchmark object's internal struct:
            # each single-shot run above performs 10 million sends internally
            $r->{$key}->[5] = 10_000_000;
        }
    }
    cmpthese($r);
}
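The driver loop above runs every benchmark repeatedly and, for each label, keeps the run with the lowest total CPU time (`cpu_a`) before printing the comparison; for the single-shot entries it also overrides the iteration count with 10,000,000, since each of those runs performs 10 million sends internally. A Python sketch of that best-of-N reduction, assuming each run is recorded as a `(cpu_seconds, iterations)` pair; the names and sample timings are illustrative only, not measurements.

```python
from typing import Dict, List, Tuple

Run = Tuple[float, int]  # (cpu_seconds, iterations)


def best_of(runs: Dict[str, List[Run]]) -> Dict[str, Run]:
    """For each label keep the run with the lowest CPU time (first one on ties)."""
    return {label: min(samples, key=lambda r: r[0]) for label, samples in runs.items()}


def rates(best: Dict[str, Run]) -> Dict[str, float]:
    """Iterations per CPU second for the best run of each label."""
    return {label: iters / cpu for label, (cpu, iters) in best.items()}


if __name__ == "__main__":
    # made-up sample timings, for illustration only
    runs = {
        "slow": [(2.5, 1_000_000), (2.0, 1_000_000)],
        "fast": [(1.0, 1_000_000), (0.8, 1_000_000)],
    }
    for label, r in sorted(rates(best_of(runs)).items(), key=lambda kv: kv[1]):
        print(f"{label:>6} {r:12.0f}/s")
```

Taking the minimum rather than the mean reports each variant's best case, which reduces noise from other activity on the machine but hides run-to-run variance.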
#
# PUB-side send benchmark (Python/pyzmq)
# Binds a PUB socket to ipc:///tmp/zmq-py-bench and sends
# 10 million 5-byte messages, then exits
#
import sys
import zmq

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind('ipc:///tmp/zmq-py-bench')

buf = b'ohhai'  # bytes literal works on both Python 2 and 3
i = 0
while True:
    i += 1
    socket.send(buf, 0)
    if i == 10*1000*1000:
        sys.exit(0)
use strict;
use warnings;
use v5.10;
use ZMQ::FFI;
use ZMQ::FFI::Constants qw(ZMQ_SUB);
say "Collecting updates from weather station...";
my $context = ZMQ::FFI->new();
my $subscriber = $context->socket(ZMQ_SUB);
$subscriber->connect("tcp://localhost:5556");
my $filter = $ARGV[0] // "10001 ";
$subscriber->subscribe($filter);
my $update_nbr = 100;
my $total_temp = 0;
my ($string, $zipcode, $temperature, $relhumidity);
for (1..$update_nbr) {
    $string = $subscriber->recv();
    ($zipcode, $temperature, $relhumidity) = split ' ', $string;
    $total_temp += $temperature;
}
printf "Average temperature for zipcode '%s' was %dF\n",
    $filter, int($total_temp / $update_nbr);
use strict;
use warnings;
use ZMQ::FFI;
use ZMQ::FFI::Constants qw(ZMQ_PUB);
my $context = ZMQ::FFI->new();
my $publisher = $context->socket(ZMQ_PUB);
$publisher->bind("tcp://*:5556");
my ($zipcode, $temperature, $relhumidity, $update);
# for (1..1_000_000) { # publish constant number when profiling
while (1) {
    $zipcode     = rand(100_000);
    $temperature = rand(215) - 80;
    $relhumidity = rand(50) + 10;
    $update = sprintf(
        '%05d %d %d',
        $zipcode, $temperature, $relhumidity
    );
    $publisher->send($update);
}
use strict;
use warnings;
use v5.10;
use ZMQ::LibZMQ3;
use ZMQ::Constants qw(ZMQ_SUB ZMQ_SUBSCRIBE);
use zhelpers;
say 'Collecting updates from weather server...';
my $context = zmq_init();
my $subscriber = zmq_socket($context, ZMQ_SUB);
zmq_connect($subscriber, 'tcp://localhost:5556');
my $filter = @ARGV ? $ARGV[0] : '10001 ';
zmq_setsockopt($subscriber, ZMQ_SUBSCRIBE, $filter);
my $update_nbr = 100;
my $total_temp = 0;
my ($string, $zipcode, $temperature, $relhumidity);
for (1 .. $update_nbr) {
    $string = s_recv($subscriber);
    ($zipcode, $temperature, $relhumidity) = split ' ', $string;
    $total_temp += $temperature;
}
printf "Average temperature for zipcode '%s' was %dF\n",
    $filter, int($total_temp / $update_nbr);
use strict;
use warnings;
use ZMQ::LibZMQ3;
use ZMQ::Constants qw(ZMQ_PUB);
use zhelpers;
my $context = zmq_init();
my $publisher = zmq_socket($context, ZMQ_PUB);
zmq_bind($publisher, 'tcp://*:5556');
my ($zipcode, $temperature, $relhumidity, $update);
# for (1..1_000_000) { # publish constant number when profiling
while (1) {
    $zipcode     = rand(100_000);
    $temperature = rand(215) - 80;
    $relhumidity = rand(50) + 10;
    $update = sprintf(
        '%05d %d %d',
        $zipcode, $temperature, $relhumidity
    );
    s_send($publisher, $update);
}