Last active
November 25, 2021 11:33
/*
Benchmark for Eigen change, see <https://gitlab.com/libeigen/eigen/-/merge_requests/734#note_743674873>
Comment reported here for posterity:
Here's a synthetic "benchmark" which I _believe_ shows the difference: https://gist.github.com/bitonic/2d09df858ba2233b7f472f5f8c0512b4
I say that I believe it exhibits the difference because it shows the runtime differences that I'd expect, with some caveats (see comments on number of instructions below).
However, I have not inspected the assembly manually, which would be required to confirm that the generated code changes in the way we expect. That is a bit more labor intensive, and while I might do it, I don't have time to do it right now.
The code inline:
```cpp
#include <Eigen/Core>
#include <iostream>

using ArrType = Eigen::Array<float, 2, 169>;

__attribute__((noinline))
static void print_array(const char* name, const ArrType& arr) {
    std::cout << name << ": " << arr << std::endl;
}

__attribute__((noinline))
static void test_packet(float x) {
    ArrType xs(x);
    ArrType ys(0.0f);
    print_array("xs", xs);
    print_array("ys", ys);
    for (size_t i = 0; i < 100000000; i++) {
        if (i % 2 == 0) {
            ys += xs;
        } else {
            ys -= xs;
        }
    }
    print_array("ys", ys);
    return;
}

int main() {
    test_packet(5.0f);
    return 0;
}
```
I compile it with
```
% clang++ -std=c++20 -I. -Wall -Werror -mavx2 -O3 test-avx2.cpp -o test-avx2
```
in the `eigen` repo.
We just add to and subtract from an array which is 169 elements wide. What I realized is that this change only affects arrays of static size, which was the case in the proprietary code where this perf improvement came up. In fact I am using the same size I was using in that code: 169. We might want to extend it to `Dynamic` (see the stop condition on the same line for why it does not work with `Dynamic`).
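For readers without Eigen at hand, the loop can be mimicked in plain C++. The sketch below is only a stand-in, not the Eigen code path: `PlainArr`, `alternate_add_sub`, `kFullPackets` and `kLeftover` are hypothetical names, and the split assumes 8 floats per 256-bit AVX register. It shows two things that follow from the size being fixed at compile time: the element count (2 * 169 = 338) is a constant, and so is its split into full 8-float groups plus a remainder. How Eigen itself handles that remainder is not shown here.

```cpp
#include <array>
#include <cstddef>

// Same element count as Eigen::Array<float, 2, 169>: 2 * 169 = 338 floats.
constexpr std::size_t kSize = 2 * 169;
using PlainArr = std::array<float, kSize>;

// Assumption: a 256-bit AVX register holds 8 single-precision floats.
constexpr std::size_t kFloatsPerPacket = 8;
constexpr std::size_t kFullPackets = kSize / kFloatsPerPacket;  // full groups of 8
constexpr std::size_t kLeftover = kSize % kFloatsPerPacket;     // elements left over

// Hypothetical stand-in for test_packet: add xs on even iterations,
// subtract it on odd ones, and return the accumulated array.
PlainArr alternate_add_sub(float x, std::size_t iterations) {
    PlainArr xs;
    xs.fill(x);
    PlainArr ys{};  // value-initialized, so every element starts at 0.0f
    for (std::size_t i = 0; i < iterations; i++) {
        for (std::size_t j = 0; j < kSize; j++) {
            if (i % 2 == 0) {
                ys[j] += xs[j];
            } else {
                ys[j] -= xs[j];
            }
        }
    }
    return ys;
}
```

Calling `alternate_add_sub(5.0f, n)` with an even `n` leaves every element at zero, which matches the benchmark: it runs an even number of iterations, so the final `ys` it prints is all zeros.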
If we compile with the improvement, this is perf stat:
```
 Performance counter stats for './test-avx2-new':
          2,526.48 msec task-clock                #    1.000 CPUs utilized
                 5      context-switches          #    0.002 K/sec
                 2      cpu-migrations            #    0.001 K/sec
                89      page-faults               #    0.035 K/sec
     9,079,877,582      cycles                    #    3.594 GHz
    12,968,042,398      instructions              #    1.43  insn per cycle
       203,464,968      branches                  #   80.533 M/sec
            91,320      branch-misses             #    0.04% of all branches
       2.527443809 seconds time elapsed
       2.525190000 seconds user
       0.001999000 seconds sys
```
Numbers of note: 2.5 seconds runtime, about 13B instructions. With the old code:
```
          3,704.16 msec task-clock                #    0.999 CPUs utilized
                 8      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                86      page-faults               #    0.023 K/sec
    13,027,668,290      cycles                    #    3.517 GHz
    38,871,811,483      instructions              #    2.98  insn per cycle
     2,904,199,848      branches                  #  784.037 M/sec
           139,444      branch-misses             #    0.00% of all branches
       3.706167382 seconds time elapsed
       3.702763000 seconds user
       0.002999000 seconds sys
```
3.7 seconds runtime (so the new code is about 1.5x faster), about 39B instructions. I actually do not have a great explanation for the 3x jump in instructions; I was expecting roughly a 2x jump.
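The ratios quoted above can be re-derived from the raw counters. The sketch below just redoes that arithmetic; `PerfRun`, `kNew`, `kOld`, `ratio` and `ipc` are hypothetical names, and the values are copied from the two perf stat outputs. It gives about 1.47x for elapsed time and about 3.0x for instructions. It also shows that the branch count differs by about 14x between the two runs, a figure that is in the counters but not discussed in the text and that does not by itself explain the instruction gap.

```cpp
#include <cassert>

// Counter values copied from the two perf stat runs above.
struct PerfRun {
    double seconds;       // "seconds time elapsed"
    double cycles;
    double instructions;
    double branches;
};

constexpr PerfRun kNew{2.527443809, 9079877582.0, 12968042398.0, 203464968.0};
constexpr PerfRun kOld{3.706167382, 13027668290.0, 38871811483.0, 2904199848.0};

// How many times larger the old value is than the new one.
double ratio(double old_value, double new_value) {
    assert(new_value != 0.0);
    return old_value / new_value;
}

// Instructions per cycle, the figure perf prints as "insn per cycle".
double ipc(const PerfRun& run) {
    assert(run.cycles != 0.0);
    return run.instructions / run.cycles;
}
```

For example, `ratio(kOld.instructions, kNew.instructions)` gives the instruction-count jump, and `ipc(kOld)` and `ipc(kNew)` reproduce the 2.98 and 1.43 figures that perf reports.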
Again, I've learnt to not make definitive statements when it comes to micro benchmarks unless I have checked the assembly, but I think the above already gives some confidence that the code does what I think it does.
*/
#include <Eigen/Core>
#include <iostream>

// Fixed-size 2 x 169 array of floats (338 elements in total).
using ArrType = Eigen::Array<float, 2, 169>;

__attribute__((noinline))
static void print_array(const char* name, const ArrType& arr) {
    std::cout << name << ": " << arr << std::endl;
}

__attribute__((noinline))
static void test_packet(float x) {
    ArrType xs(x);     // every element set to x
    ArrType ys(0.0f);  // every element set to zero
    print_array("xs", xs);
    print_array("ys", ys);
    // Alternate += and -= so that ys stays bounded; the iteration count is
    // even, so ys is back to zero when the loop ends.
    for (size_t i = 0; i < 100000000; i++) {
        if (i % 2 == 0) {
            ys += xs;
        } else {
            ys -= xs;
        }
    }
    print_array("ys", ys);
    return;
}

int main() {
    test_packet(5.0f);
    return 0;
}