@folkertdev
Last active October 23, 2024 07:54
zlib-rs labeled match benchmarks

build the toolchain

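A proof-of-concept implementation can be found at https://github.com/trifectatechfoundation/rust/tree/labeled-match. Build it with ./x build, then register the result with rustup as a toolchain named stage1 (the rustc dev guide does this with rustup toolchain link stage1 build/host/stage1). After that, cargo +stage1 build should use a compiler with labeled-match available.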

run the benchmark

git clone https://github.com/trifectatechfoundation/zlib-rs.git
cd zlib-rs
git checkout len-as-match
sh replicate-labeled-match-benchmarks.sh

This runs four benchmarks:

  • baseline: the current zlib-rs main branch approach using tail calls
  • loop-plus-match: standard approach using a loop and match; suffers from branch misprediction
  • labeled-match-len: the len function and friends now use labeled match
  • labeled-match-fast: len and friends, plus the inflate_fast_help function, now use labeled match
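To make the loop-plus-match variant concrete, here is a minimal sketch in stable Rust. It is a toy run-length decoder, not zlib-rs code: the State enum, its variants and decode_loop_match are hypothetical names invented for illustration. The point is the shape: every state transition returns to the top of the loop and goes through the same match again. As I understand it, labeled match is meant to let an arm continue directly into the next state's arm instead; the proof-of-concept syntax is not shown here because it is not described in this gist.

```rust
/// Hypothetical states of a toy run-length decoder (not zlib-rs's real states).
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    ReadCount,
    ReadByte { count: u8 },
    Emit { count: u8, byte: u8 },
    Done,
}

/// Decode `(count, byte)` pairs using the classic loop-plus-match shape.
fn decode_loop_match(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut pos = 0;
    let mut state = State::ReadCount;

    loop {
        // Every transition comes back here and is dispatched again
        // through this single `match`.
        state = match state {
            State::ReadCount => match input.get(pos) {
                Some(&count) => {
                    pos += 1;
                    State::ReadByte { count }
                }
                None => State::Done,
            },
            State::ReadByte { count } => match input.get(pos) {
                Some(&byte) => {
                    pos += 1;
                    State::Emit { count, byte }
                }
                // Truncated input: stop without emitting a partial run.
                None => State::Done,
            },
            State::Emit { count, byte } => {
                out.extend(std::iter::repeat(byte).take(count as usize));
                State::ReadCount
            }
            State::Done => return out,
        };
    }
}
```

Calling decode_loop_match(&[3, b'a', 1, b'b']) expands the two runs into the bytes of "aaab".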

The benchmark is run for various chunk sizes (2 to the power 4, 7 and 16), which varies what logic is run: a chunk size of 2^16 spends most time in an inner loop, while 2^4 spends much more time in the state machine logic.
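A rough way to see why the chunk size shifts the work: the smaller the chunk, the more often the decoder is entered and has to resume its state machine. The sketch below assumes the chunk size bounds how many input bytes are handed to the decoder per call; feed_chunked and its step callback are hypothetical stand-ins, not the benchmark's actual driver.

```rust
/// Feed `input` to `step` in chunks of at most 2^chunk_log2 bytes and
/// return how many times `step` was called.
/// `step` is a hypothetical stand-in for one call into the decoder.
fn feed_chunked(input: &[u8], chunk_log2: u32, mut step: impl FnMut(&[u8])) -> usize {
    let chunk_size = 1usize << chunk_log2;
    let mut calls = 0;
    for chunk in input.chunks(chunk_size) {
        step(chunk);
        calls += 1;
    }
    calls
}

/// Count the calls needed for an input of `len` zero bytes.
fn calls_for(len: usize, chunk_log2: u32) -> usize {
    let input = vec![0u8; len];
    feed_chunked(&input, chunk_log2, |_chunk| {})
}
```

For a 1 MiB input, calls_for(1 << 20, 4) gives 65536 calls, while calls_for(1 << 20, 16) gives only 16, so at 2^4 the per-call state machine logic is exercised far more often.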

results

Mostly what we see is that labeled match gives significant speedups for small chunk sizes (about 14% less wall time at a chunk size of 2^4). For larger chunk sizes, the results are less clear. I believe the result there is really net-zero, but we clearly need to perform some further tuning.

Note in particular how loop-plus-match works well for the small and large chunk sizes, but terribly for the medium one (2^7), where its wall time is about 39% worse than baseline. The labeled-match-fast change barely seems to do anything versus just labeled-match-len.
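When reading the tables below, the delta column is the change relative to Benchmark 1 of the same group. A minimal sketch of that arithmetic, assuming the delta is simply the relative difference of the means (relative_delta_percent is a hypothetical helper, and the benchmark tool works from unrounded values, so results recomputed from the printed means can differ in the last digit):

```rust
/// Relative change of `candidate` versus `baseline`, in percent.
/// Negative means the candidate is lower (faster, for wall time).
fn relative_delta_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}
```

For example, relative_delta_percent(72.7, 62.6) comes out at roughly -13.9, in line with the -14.0% wall-time delta reported for labeled-match-len at chunk size 2^4.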

Benchmark 1 (69 runs): /tmp/uncompress-baseline rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          72.7ms ± 4.55ms    71.1ms …  108ms          3 ( 4%)        0%
  peak_rss           24.1MB ± 56.3KB    23.9MB … 24.1MB         12 (17%)        0%
  cpu_cycles          294M  ± 17.7M      290M  …  434M           6 ( 9%)        0%
  instructions        914M  ±  449       914M  …  914M           1 ( 1%)        0%
  cache_references   2.99M  ±  407K     2.68M  … 6.10M           4 ( 6%)        0%
  cache_misses        134K  ± 23.8K      101K  …  302K           2 ( 3%)        0%
  branch_misses      4.09M  ± 8.56K     4.08M  … 4.14M           2 ( 3%)        0%
Benchmark 2 (71 runs): /tmp/loop-plus-match rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          70.6ms ±  499us    69.9ms … 72.9ms          3 ( 4%)        ⚡-  2.8% ±  1.5%
  peak_rss           24.1MB ± 60.2KB    24.0MB … 24.1MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles          287M  ± 1.53M      285M  …  294M           5 ( 7%)        ⚡-  2.7% ±  1.4%
  instructions        792M  ±  336       792M  …  792M           0 ( 0%)        ⚡- 13.4% ±  0.0%
  cache_references   2.92M  ± 81.8K     2.77M  … 3.12M           0 ( 0%)          -  2.2% ±  3.2%
  cache_misses        113K  ± 12.4K     89.9K  …  169K           3 ( 4%)        ⚡- 16.0% ±  4.7%
  branch_misses      4.10M  ± 4.76K     4.09M  … 4.13M           3 ( 4%)          +  0.2% ±  0.1%
Benchmark 3 (80 runs): /tmp/labeled-match-len rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          62.6ms ±  555us    61.7ms … 66.1ms          2 ( 3%)        ⚡- 14.0% ±  1.4%
  peak_rss           24.1MB ± 77.9KB    23.9MB … 24.1MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles          249M  ± 1.87M      248M  …  263M           5 ( 6%)        ⚡- 15.4% ±  1.3%
  instructions        686M  ±  267       686M  …  686M           0 ( 0%)        ⚡- 24.9% ±  0.0%
  cache_references   3.01M  ±  480K     2.76M  … 7.16M           2 ( 3%)          +  0.5% ±  4.8%
  cache_misses        123K  ± 6.92K      100K  …  137K           2 ( 3%)        ⚡-  8.4% ±  4.1%
  branch_misses      4.08M  ± 2.78K     4.08M  … 4.09M           4 ( 5%)          -  0.1% ±  0.0%
Benchmark 4 (81 runs): /tmp/labeled-match-fast rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          61.9ms ±  468us    61.3ms … 64.0ms          6 ( 7%)        ⚡- 14.8% ±  1.4%
  peak_rss           24.1MB ± 56.9KB    24.0MB … 24.1MB         20 (25%)          -  0.0% ±  0.1%
  cpu_cycles          246M  ± 1.53M      245M  …  255M          12 (15%)        ⚡- 16.4% ±  1.3%
  instructions        689M  ±  365       689M  …  689M           1 ( 1%)        ⚡- 24.6% ±  0.0%
  cache_references   2.97M  ±  211K     2.79M  … 4.37M           3 ( 4%)          -  0.8% ±  3.4%
  cache_misses       91.4K  ± 9.22K     72.0K  …  114K           0 ( 0%)        ⚡- 31.8% ±  4.2%
  branch_misses      4.08M  ± 4.06K     4.08M  … 4.10M           1 ( 1%)          -  0.1% ±  0.1%
Benchmark 1 (108 runs): /tmp/uncompress-baseline rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          46.3ms ± 1.37ms    45.0ms … 56.5ms         10 ( 9%)        0%
  peak_rss           24.1MB ± 56.3KB    24.0MB … 24.1MB         26 (24%)        0%
  cpu_cycles          174M  ± 4.67M      173M  …  214M          10 ( 9%)        0%
  instructions        516M  ±  443       516M  …  516M           3 ( 3%)        0%
  cache_references   3.16M  ±  219K     2.88M  … 4.43M           6 ( 6%)        0%
  cache_misses       83.6K  ± 11.6K     62.4K  …  161K           5 ( 5%)        0%
  branch_misses      2.00M  ± 5.17K     2.00M  … 2.04M           9 ( 8%)        0%
Benchmark 2 (78 runs): /tmp/loop-plus-match rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          64.5ms ± 1.29ms    63.6ms … 71.3ms          6 ( 8%)        💩+ 39.3% ±  0.8%
  peak_rss           24.1MB ± 72.6KB    23.9MB … 24.1MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles          257M  ± 3.65M      255M  …  279M          12 (15%)        💩+ 47.4% ±  0.7%
  instructions        720M  ±  384       720M  …  720M           1 ( 1%)        💩+ 39.7% ±  0.0%
  cache_references   3.21M  ±  182K     2.97M  … 4.32M           3 ( 4%)          +  1.8% ±  1.9%
  cache_misses       57.9K  ± 8.34K     47.0K  … 98.8K           4 ( 5%)        ⚡- 30.7% ±  3.6%
  branch_misses      2.00M  ± 2.44K     2.00M  … 2.01M           5 ( 6%)          -  0.2% ±  0.1%
Benchmark 3 (111 runs): /tmp/labeled-match-len rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          45.2ms ±  781us    44.2ms … 51.3ms          5 ( 5%)        ⚡-  2.5% ±  0.6%
  peak_rss           24.1MB ± 66.5KB    23.9MB … 24.1MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          170M  ± 3.34M      168M  …  199M           9 ( 8%)        ⚡-  2.8% ±  0.6%
  instructions        510M  ±  359       510M  …  510M           1 ( 1%)        ⚡-  1.0% ±  0.0%
  cache_references   3.21M  ±  178K     2.97M  … 4.55M           6 ( 5%)          +  1.7% ±  1.7%
  cache_misses       31.2K  ± 4.33K     25.0K  … 52.0K           6 ( 5%)        ⚡- 62.7% ±  2.8%
  branch_misses      1.99M  ± 1.35K     1.99M  … 2.00M           5 ( 5%)          -  0.4% ±  0.0%
Benchmark 4 (111 runs): /tmp/labeled-match-fast rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          45.1ms ±  454us    44.3ms … 46.8ms          5 ( 5%)        ⚡-  2.5% ±  0.6%
  peak_rss           24.1MB ± 73.0KB    23.9MB … 24.1MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles          169M  ± 1.27M      168M  …  176M          10 ( 9%)        ⚡-  3.0% ±  0.5%
  instructions        515M  ±  295       515M  …  515M           0 ( 0%)          -  0.1% ±  0.0%
  cache_references   3.20M  ± 81.1K     2.97M  … 3.38M           0 ( 0%)          +  1.5% ±  1.4%
  cache_misses       47.5K  ± 8.09K     36.6K  … 72.1K           1 ( 1%)        ⚡- 43.2% ±  3.2%
  branch_misses      2.00M  ± 1.86K     1.99M  … 2.00M           2 ( 2%)          -  0.2% ±  0.1%
Benchmark 1 (182 runs): /tmp/uncompress-baseline rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          27.5ms ±  474us    26.7ms … 31.4ms          7 ( 4%)        0%
  peak_rss           24.1MB ± 48.1KB    24.0MB … 24.1MB         29 (16%)        0%
  cpu_cycles         90.0M  ± 1.27M     89.4M  …  102M          17 ( 9%)        0%
  instructions        239M  ±  253       239M  …  239M           3 ( 2%)        0%
  cache_references   2.28M  ± 57.3K     2.20M  … 2.78M           5 ( 3%)        0%
  cache_misses       48.4K  ± 2.85K     43.3K  … 68.4K           5 ( 3%)        0%
  branch_misses      1.05M  ± 1.62K     1.05M  … 1.06M           2 ( 1%)        0%
Benchmark 2 (186 runs): /tmp/loop-plus-match rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          26.9ms ±  335us    26.3ms … 29.5ms          4 ( 2%)        ⚡-  2.4% ±  0.3%
  peak_rss           24.1MB ± 63.3KB    23.9MB … 24.1MB          0 ( 0%)          -  0.1% ±  0.0%
  cpu_cycles         87.2M  ±  732K     86.8M  … 94.7M          20 (11%)        ⚡-  3.1% ±  0.2%
  instructions        248M  ±  262       248M  …  248M           0 ( 0%)        💩+  3.8% ±  0.0%
  cache_references   2.26M  ± 85.8K     2.18M  … 2.97M           5 ( 3%)          -  0.8% ±  0.7%
  cache_misses       52.0K  ± 2.13K     47.6K  … 67.0K           5 ( 3%)        💩+  7.4% ±  1.1%
  branch_misses      1.05M  ± 1.58K     1.04M  … 1.05M           2 ( 1%)          -  0.5% ±  0.0%
Benchmark 3 (182 runs): /tmp/labeled-match-len rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          27.4ms ±  407us    26.7ms … 31.6ms          3 ( 2%)          -  0.4% ±  0.3%
  peak_rss           24.1MB ± 64.3KB    23.9MB … 24.1MB          0 ( 0%)          -  0.1% ±  0.0%
  cpu_cycles         89.3M  ±  985K     88.8M  …  101M          13 ( 7%)          -  0.8% ±  0.3%
  instructions        254M  ±  326       254M  …  254M           2 ( 1%)        💩+  6.1% ±  0.0%
  cache_references   2.26M  ± 68.6K     2.18M  … 2.75M           3 ( 2%)          -  0.9% ±  0.6%
  cache_misses       55.1K  ± 2.58K     50.5K  … 68.0K           4 ( 2%)        💩+ 13.8% ±  1.2%
  branch_misses      1.05M  ± 2.09K     1.04M  … 1.05M           0 ( 0%)          -  0.4% ±  0.0%
Benchmark 4 (182 runs): /tmp/labeled-match-fast rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          27.5ms ±  703us    26.8ms … 33.7ms          6 ( 3%)          +  0.0% ±  0.4%
  peak_rss           24.1MB ± 62.8KB    23.9MB … 24.1MB          0 ( 0%)          -  0.1% ±  0.0%
  cpu_cycles         89.8M  ± 1.93M     89.1M  …  108M          20 (11%)          -  0.3% ±  0.4%
  instructions        253M  ±  257       253M  …  253M           1 ( 1%)        💩+  5.7% ±  0.0%
  cache_references   2.26M  ±  101K     2.18M  … 3.05M           7 ( 4%)          -  1.1% ±  0.7%
  cache_misses       50.2K  ± 2.61K     45.6K  … 66.8K           5 ( 3%)        💩+  3.8% ±  1.2%
  branch_misses      1.05M  ± 1.14K     1.05M  … 1.06M           2 ( 1%)          -  0.2% ±  0.0%
@bjorn3
bjorn3 commented Oct 23, 2024

On an AMD Ryzen 7 3700X 8-Core Processor:

Benchmark 1 (61 runs): /tmp/uncompress-baseline rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          82.0ms ± 1.94ms    79.6ms … 88.8ms          2 ( 3%)        0%
  peak_rss           23.2MB ± 1.22MB    20.5MB … 25.7MB          0 ( 0%)        0%
  cpu_cycles          336M  ± 7.88M      330M  …  365M           2 ( 3%)        0%
  instructions        916M  ±  245       916M  …  916M           1 ( 2%)        0%
  cache_references   3.02M  ±  333K     2.71M  … 4.11M           3 ( 5%)        0%
  cache_misses       45.8K  ± 7.34K     39.6K  … 94.0K           1 ( 2%)        0%
  branch_misses      4.01M  ± 22.2K     3.98M  … 4.06M           0 ( 0%)        0%
Benchmark 2 (61 runs): /tmp/loop-plus-match rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          82.7ms ±  642us    81.5ms … 85.6ms          3 ( 5%)          +  0.8% ±  0.6%
  peak_rss           23.4MB ± 1.30MB    20.2MB … 25.7MB          0 ( 0%)          +  0.5% ±  1.9%
  cpu_cycles          340M  ± 2.40M      338M  …  351M           2 ( 3%)          +  1.1% ±  0.6%
  instructions        794M  ±  276       794M  …  794M           0 ( 0%)        ⚡- 13.4% ±  0.0%
  cache_references   2.95M  ±  189K     2.78M  … 3.93M           6 (10%)          -  2.5% ±  3.2%
  cache_misses       41.8K  ± 4.61K     36.9K  … 72.3K           3 ( 5%)        ⚡-  8.8% ±  4.8%
  branch_misses      4.07M  ± 18.3K     4.02M  … 4.11M           5 ( 8%)        💩+  1.5% ±  0.2%
Benchmark 3 (70 runs): /tmp/labeled-match-len rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          72.0ms ±  800us    70.9ms … 75.1ms          6 ( 9%)        ⚡- 12.3% ±  0.6%
  peak_rss           23.4MB ± 1.48MB    20.6MB … 25.7MB          0 ( 0%)          +  0.7% ±  2.0%
  cpu_cycles          292M  ± 2.71M      290M  …  303M           6 ( 9%)        ⚡- 13.1% ±  0.6%
  instructions        685M  ±  286       685M  …  685M           0 ( 0%)        ⚡- 25.3% ±  0.0%
  cache_references   2.89M  ±  139K     2.63M  … 3.43M          10 (14%)        ⚡-  4.4% ±  2.8%
  cache_misses       46.2K  ± 5.53K     40.2K  … 78.5K           3 ( 4%)          +  0.9% ±  4.8%
  branch_misses      3.99M  ± 37.6K     3.92M  … 4.05M           0 ( 0%)          -  0.4% ±  0.3%
Benchmark 4 (70 runs): /tmp/labeled-match-fast rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          72.1ms ± 2.08ms    70.8ms … 86.8ms          4 ( 6%)        ⚡- 12.1% ±  0.8%
  peak_rss           23.3MB ± 1.28MB    20.7MB … 25.7MB          0 ( 0%)          +  0.1% ±  1.8%
  cpu_cycles          292M  ± 8.11M      289M  …  351M           7 (10%)        ⚡- 13.2% ±  0.8%
  instructions        685M  ±  291       685M  …  685M           0 ( 0%)        ⚡- 25.2% ±  0.0%
  cache_references   2.94M  ±  374K     2.72M  … 5.02M           5 ( 7%)          -  2.7% ±  4.0%
  cache_misses       45.9K  ± 5.04K     38.7K  … 64.7K           3 ( 4%)          +  0.1% ±  4.7%
  branch_misses      3.95M  ± 28.3K     3.92M  … 4.04M           0 ( 0%)        ⚡-  1.5% ±  0.2%
Benchmark 1 (105 runs): /tmp/uncompress-baseline rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          47.7ms ± 1.05ms    46.3ms … 51.8ms          7 ( 7%)        0%
  peak_rss           23.2MB ± 1.26MB    20.6MB … 25.7MB          0 ( 0%)        0%
  cpu_cycles          188M  ± 3.25M      186M  …  200M          12 (11%)        0%
  instructions        516M  ±  248       516M  …  516M           1 ( 1%)        0%
  cache_references   3.63M  ±  507K     3.26M  … 7.95M          13 (12%)        0%
  cache_misses       48.0K  ± 13.7K     36.9K  …  126K          10 (10%)        0%
  branch_misses      1.92M  ± 7.37K     1.91M  … 1.94M           0 ( 0%)        0%
Benchmark 2 (65 runs): /tmp/loop-plus-match rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          77.5ms ± 2.17ms    75.6ms … 89.1ms          7 (11%)        💩+ 62.4% ±  1.0%
  peak_rss           23.3MB ± 1.24MB    20.6MB … 26.0MB          0 ( 0%)          +  0.3% ±  1.7%
  cpu_cycles          312M  ± 6.06M      308M  …  338M           6 ( 9%)        💩+ 66.0% ±  0.7%
  instructions        721M  ±  304       721M  …  721M           0 ( 0%)        💩+ 39.6% ±  0.0%
  cache_references   3.94M  ±  683K     3.39M  … 7.85M           6 ( 9%)        💩+  8.5% ±  4.9%
  cache_misses       52.3K  ± 26.5K     37.3K  …  226K           5 ( 8%)          +  8.8% ± 12.6%
  branch_misses      1.93M  ± 7.55K     1.92M  … 1.96M           1 ( 2%)          +  0.5% ±  0.1%
Benchmark 3 (105 runs): /tmp/labeled-match-len rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          47.9ms ± 2.74ms    45.6ms … 57.4ms         18 (17%)          +  0.2% ±  1.2%
  peak_rss           23.3MB ± 1.16MB    20.6MB … 26.0MB          0 ( 0%)          +  0.3% ±  1.4%
  cpu_cycles          185M  ± 3.36M      183M  …  208M          17 (16%)        ⚡-  1.7% ±  0.5%
  instructions        498M  ±  373       498M  …  498M           0 ( 0%)        ⚡-  3.5% ±  0.0%
  cache_references   3.80M  ± 1.76M     3.28M  … 21.5M           9 ( 9%)          +  4.6% ±  9.7%
  cache_misses       43.8K  ± 5.01K     38.6K  … 65.7K           8 ( 8%)        ⚡-  8.7% ±  5.8%
  branch_misses      1.92M  ± 7.79K     1.91M  … 1.94M           0 ( 0%)          -  0.2% ±  0.1%
Benchmark 4 (106 runs): /tmp/labeled-match-fast rs-chunked 7 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          47.5ms ± 2.92ms    45.5ms … 62.6ms         17 (16%)          -  0.6% ±  1.2%
  peak_rss           23.1MB ± 1.36MB    20.5MB … 26.0MB          0 ( 0%)          -  0.4% ±  1.5%
  cpu_cycles          184M  ± 6.05M      181M  …  220M          17 (16%)        ⚡-  1.8% ±  0.7%
  instructions        500M  ±  330       500M  …  500M           2 ( 2%)        ⚡-  3.2% ±  0.0%
  cache_references   3.78M  ±  608K     3.30M  … 6.82M          14 (13%)          +  4.2% ±  4.2%
  cache_misses       45.8K  ± 15.3K     36.3K  …  142K           9 ( 8%)          -  4.7% ±  8.2%
  branch_misses      1.92M  ± 8.42K     1.91M  … 1.94M           0 ( 0%)          -  0.3% ±  0.1%
Benchmark 1 (197 runs): /tmp/uncompress-baseline rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          25.4ms ± 1.36ms    24.1ms … 35.7ms         22 (11%)        0%
  peak_rss           23.3MB ± 1.18MB    20.2MB … 25.8MB          0 ( 0%)        0%
  cpu_cycles         93.2M  ± 4.00M     91.1M  …  114M          32 (16%)        0%
  instructions        239M  ±  295       239M  …  239M           0 ( 0%)        0%
  cache_references   2.69M  ±  381K     2.37M  … 4.96M          26 (13%)        0%
  cache_misses       59.9K  ± 6.53K     54.5K  …  109K           9 ( 5%)        0%
  branch_misses      1.04M  ± 5.14K     1.03M  … 1.05M           0 ( 0%)        0%
Benchmark 2 (201 runs): /tmp/loop-plus-match rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          24.9ms ±  523us    24.1ms … 27.7ms          8 ( 4%)        ⚡-  2.1% ±  0.8%
  peak_rss           23.2MB ± 1.30MB    20.2MB … 25.9MB          0 ( 0%)          -  0.8% ±  1.0%
  cpu_cycles         91.9M  ± 1.54M     91.0M  …  102M          22 (11%)          -  1.4% ±  0.6%
  instructions        249M  ±  274       249M  …  249M           0 ( 0%)        💩+  3.8% ±  0.0%
  cache_references   2.58M  ±  189K     2.42M  … 3.83M          13 ( 6%)        ⚡-  4.0% ±  2.2%
  cache_misses       59.2K  ± 7.22K     53.6K  …  107K          11 ( 5%)          -  1.1% ±  2.3%
  branch_misses      1.04M  ± 4.63K     1.03M  … 1.05M           0 ( 0%)          -  0.3% ±  0.1%
Benchmark 3 (200 runs): /tmp/labeled-match-len rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          24.8ms ±  329us    24.1ms … 27.2ms          1 ( 1%)        ⚡-  2.4% ±  0.8%
  peak_rss           23.4MB ± 1.32MB    20.2MB … 26.0MB          0 ( 0%)          +  0.1% ±  1.1%
  cpu_cycles         92.0M  ±  566K     91.5M  … 99.6M           4 ( 2%)          -  1.4% ±  0.6%
  instructions        247M  ±  265       247M  …  247M           0 ( 0%)        💩+  3.1% ±  0.0%
  cache_references   2.54M  ± 64.5K     2.40M  … 3.18M           3 ( 2%)        ⚡-  5.6% ±  2.0%
  cache_misses       59.4K  ± 2.22K     54.9K  … 68.4K           4 ( 2%)          -  0.9% ±  1.6%
  branch_misses      1.04M  ± 3.79K     1.03M  … 1.04M           0 ( 0%)          -  0.3% ±  0.1%
Benchmark 4 (200 runs): /tmp/labeled-match-fast rs-chunked 16 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          25.0ms ±  488us    24.2ms … 28.4ms          6 ( 3%)        ⚡-  1.8% ±  0.8%
  peak_rss           23.2MB ± 1.15MB    20.7MB … 25.8MB          0 ( 0%)          -  0.7% ±  1.0%
  cpu_cycles         92.3M  ± 1.38M     91.6M  …  104M          19 (10%)          -  1.0% ±  0.6%
  instructions        247M  ±  260       247M  …  247M           1 ( 1%)        💩+  3.1% ±  0.0%
  cache_references   2.56M  ±  109K     2.37M  … 3.30M          11 ( 6%)        ⚡-  4.6% ±  2.0%
  cache_misses       58.4K  ± 1.83K     53.8K  … 68.0K           4 ( 2%)          -  2.4% ±  1.6%
  branch_misses      1.04M  ± 2.83K     1.03M  … 1.05M           0 ( 0%)          -  0.2% ±  0.1%
