@kellabyte
Last active January 4, 2016 00:49

I'm running some SSE and AVX instructions on Harpertown and Sandy Bridge systems, and I noticed that the Sandy Bridge system scaled to more cores before performance flatlined. The Harpertown system did not improve at all going from 1 thread to 2 threads, so I started to look into why.

Running with 1 thread:
likwid-perfctr -C S0:0@S1:0 -g MEM ./example 1

+-----------------------------+-------------+---------+
|           Metric            |   core 0    | core 1  |
+-----------------------------+-------------+---------+
|     Runtime (RDTSC) [s]     |    7.795    |  7.795  |
|    Runtime unhalted [s]     | 0.00113065  | 5.25528 |
|             CPI             |   1.38358   | 2.23844 |
| Memory bandwidth [MBytes/s] |  0.0809299  | 2162.31 | <--- Look here
| Memory data volume [GBytes] | 0.000630848 | 16.8552 |
+-----------------------------+-------------+---------+
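
As a quick sanity check on the table above, the reported bandwidth looks like it is simply the data volume divided by the RDTSC runtime. Here is a minimal Python sketch of that arithmetic, assuming likwid reports decimal units (1 GByte = 1000 MBytes). The helper name `bandwidth_mbytes_per_s` is mine and is not part of likwid.

```python
def bandwidth_mbytes_per_s(volume_gbytes, runtime_s):
    """Average bandwidth, assuming decimal units (1 GByte = 1000 MBytes)."""
    return volume_gbytes * 1000.0 / runtime_s

# Values copied from the "core 1" column of the 1-thread run.
volume_gbytes = 16.8552   # Memory data volume [GBytes]
runtime_s = 7.795         # Runtime (RDTSC) [s]

one_thread_bw = bandwidth_mbytes_per_s(volume_gbytes, runtime_s)
print(f"derived bandwidth: {one_thread_bw:.1f} MBytes/s")
```

The result lands within rounding of the 2162.31 MBytes/s that likwid prints, so the bandwidth row can be read as an average over the whole run rather than a peak.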

Now I'm going to tell likwid to pin the pthreads to socket 0 core 0 and socket 1 core 0 (the same -C S0:0@S1:0 expression as above), so the two threads don't share any in-socket resources.

Running with 2 threads:
likwid-perfctr -C S0:0@S1:0 -g MEM ./example 2

+-----------------------------+---------+---------+
|           Metric            | core 0  | core 1  |
+-----------------------------+---------+---------+
|     Runtime (RDTSC) [s]     | 12.0259 | 12.0259 |
|    Runtime unhalted [s]     | 8.20732 | 8.25249 |
|             CPI             | 3.49461 | 3.51511 |
| Memory bandwidth [MBytes/s] | 1398.74 | 1399.11 | <--- Look here, bandwidth looks shared!
| Memory data volume [GBytes] | 16.8211 | 16.8256 |
+-----------------------------+---------+---------+
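
To put numbers on "looks shared", the two runs can be compared directly. This is only a sketch using the figures from the tables above. Adding the two per-core bandwidths together is an assumption on my part (it only makes sense if each core's counter sees just its own traffic), so treat the combined figure with care.

```python
# Numbers copied from the two likwid tables above.
one_thread = {"runtime_s": 7.795, "bandwidth": 2162.31}
two_threads = {"runtime_s": 12.0259, "bandwidth": [1398.74, 1399.11]}

# How much bandwidth each thread gets compared to the single-thread run.
per_core_ratio = two_threads["bandwidth"][0] / one_thread["bandwidth"]

# Assumption: the per-core counters are independent and can be summed.
combined = sum(two_threads["bandwidth"])
combined_ratio = combined / one_thread["bandwidth"]

# Wall-clock slowdown of the 2-thread run.
slowdown = two_threads["runtime_s"] / one_thread["runtime_s"]

print(f"per-core bandwidth vs 1 thread: {per_core_ratio:.2f}x")
print(f"combined bandwidth vs 1 thread: {combined_ratio:.2f}x")
print(f"runtime vs 1 thread:            {slowdown:.2f}x")
```

Each thread gets roughly 65% of the single-thread bandwidth, and even under the summing assumption the total is only about 1.3x instead of 2x, while the wall-clock runtime grows by about 1.5x. That fits the threads competing for a common memory path even though they sit on different sockets.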