@tanakamura
Last active December 13, 2020 17:21
ostimer: clock_gettime
userland_timer: rdtscp
perf_counter: yes
Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
==== membw_1t ====
| |MiB/sec
=================================
| simple-long-copy |14401.82962
---------------------------------
| libc-memcpy |18055.36681
---------------------------------
| gccvec128-copy |14367.06621
---------------------------------
| sse-stream-copy |17519.39882
---------------------------------
| avx256-copy |14490.49320
---------------------------------
| x86-rep-movs1 |14525.92013
---------------------------------
| x86-rep-movs2 |14662.59120
---------------------------------
| x86-rep-movs4 |14653.44981
---------------------------------
| simple-long-sum |16560.13569
---------------------------------
| gccvec128-load |15150.17975
---------------------------------
| avx256-load |15016.65657
---------------------------------
| x86-rep-scas1 | 1616.23361
---------------------------------
| x86-rep-scas2 | 3211.58088
---------------------------------
| x86-rep-scas4 | 6285.99709
---------------------------------
| libc-memset |27513.43399
---------------------------------
| simple-long-store |27548.59920
---------------------------------
| gccvec128-store |27546.38775
---------------------------------
| sse-stream-store |30166.14929
---------------------------------
| avx256-store |14111.73296
---------------------------------
|avx256-stream-store |30181.45710
---------------------------------
| x86-rep-stos1 |27610.85059
---------------------------------
| x86-rep-stos2 |27607.54917
---------------------------------
| x86-rep-stos4 |27601.76585
---------------------------------
v : test_name
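The gist contains only results, not the benchmark source. To show roughly what a row such as simple-long-copy measures, here is a minimal sketch. It assumes the test copies a large buffer one `long` at a time and divides the bytes moved by the elapsed time from `clock_gettime` (the header lists clock_gettime as the OS timer). The function names `simple_long_copy`, `mib_per_sec` and `measure_copy_mib_s`, the warm-up pass, and counting each copied byte once rather than twice are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical "simple-long-copy" kernel: copy one long per iteration. */
void simple_long_copy(long *restrict dst, const long *restrict src, size_t n_longs)
{
    for (size_t i = 0; i < n_longs; i++)
        dst[i] = src[i];
}

/* Bytes over seconds expressed in MiB/sec (1 MiB = 1048576 bytes). */
double mib_per_sec(double bytes, double seconds)
{
    return bytes / (1024.0 * 1024.0) / seconds;
}

/* Wall-clock seconds between two timestamps. */
static double elapsed_sec(struct timespec t0, struct timespec t1)
{
    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

/* Copy a `bytes`-sized buffer `iters` times and return MiB/sec, counting
   each copied byte once (an assumption; a tool may count read + write). */
double measure_copy_mib_s(size_t bytes, int iters)
{
    size_t n = bytes / sizeof(long);
    if (n == 0 || iters < 1) return -1.0;
    long *src = malloc(n * sizeof *src);
    long *dst = malloc(n * sizeof *dst);
    if (!src || !dst) { free(src); free(dst); return -1.0; }

    for (size_t i = 0; i < n; i++) src[i] = (long)i;  /* fault pages in */
    simple_long_copy(dst, src, n);                     /* warm-up pass */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int it = 0; it < iters; it++)
        simple_long_copy(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile long sink = dst[n - 1];   /* keep the copies observable */
    (void)sink;
    free(src);
    free(dst);
    return mib_per_sec((double)(n * sizeof(long)) * iters, elapsed_sec(t0, t1));
}
```

For a number comparable in spirit (not necessarily in value) to the table, the buffer should be much larger than the last-level cache, e.g. `measure_copy_mib_s(256u << 20, 10)`.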
==== membw_mt ====
| |MiB/sec
=================================
| simple-long-copy |15750.04690
---------------------------------
| libc-memcpy |23371.62053
---------------------------------
| gccvec128-copy |14871.82921
---------------------------------
| sse-stream-copy |22242.66879
---------------------------------
| avx256-copy |13115.82508
---------------------------------
| x86-rep-movs1 |19739.23530
---------------------------------
| x86-rep-movs2 |19450.23398
---------------------------------
| x86-rep-movs4 |19546.75306
---------------------------------
| simple-long-sum |20136.98552
---------------------------------
| gccvec128-load |14518.07575
---------------------------------
| avx256-load |10165.90417
---------------------------------
| x86-rep-scas1 | 6334.69327
---------------------------------
| x86-rep-scas2 |12796.49295
---------------------------------
| x86-rep-scas4 |23252.14068
---------------------------------
| libc-memset |29441.25665
---------------------------------
| simple-long-store |29414.12507
---------------------------------
| gccvec128-store |29596.90307
---------------------------------
| sse-stream-store |29712.89103
---------------------------------
| avx256-store |15340.52021
---------------------------------
|avx256-stream-store |29826.97798
---------------------------------
| x86-rep-stos1 |29497.92604
---------------------------------
| x86-rep-stos2 |29508.28452
---------------------------------
| x86-rep-stos4 |29616.49163
---------------------------------
v : test_name
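membw_mt presumably runs the same kernels on several threads at once and reports the combined bandwidth. On the i5-8250U the store rows rise only modestly over membw_1t (libc-memset goes from about 27500 to 29400 MiB/s), while x86-rep-scas4 jumps from about 6300 to 23300 MiB/s. That pattern is consistent with stores already being near the memory limit on one core and scas being limited by per-core instruction throughput. The sketch below shows one plausible way to run a store kernel on disjoint slices with POSIX threads; the slice layout, `MAX_THREADS`, the function names, and including thread start-up in the timed region are assumptions, not the tool's actual design.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_THREADS 64   /* hypothetical upper bound */

/* One thread's share of the buffer (hypothetical layout). */
struct slice {
    unsigned char *base;
    size_t len;
    int iters;
};

/* memset-style store kernel run by each worker on its own slice. */
static void *store_worker(void *arg)
{
    struct slice *s = arg;
    for (int it = 0; it < s->iters; it++)
        memset(s->base, it & 0xff, s->len);
    return NULL;
}

/* Store over `bytes` with `nthreads` workers on disjoint slices and return
   aggregate MiB/sec (total bytes written / wall time), or -1.0 on error. */
double measure_mt_store_mib_s(size_t bytes, int nthreads, int iters)
{
    if (nthreads < 1 || nthreads > MAX_THREADS || iters < 1) return -1.0;
    size_t per = bytes / (size_t)nthreads;
    if (per == 0) return -1.0;
    unsigned char *buf = malloc(per * (size_t)nthreads);
    if (!buf) return -1.0;
    memset(buf, 0, per * (size_t)nthreads);   /* fault pages in before timing */

    pthread_t tid[MAX_THREADS];
    struct slice sl[MAX_THREADS];
    int started = 0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < nthreads; started++) {
        sl[started] = (struct slice){ buf + (size_t)started * per, per, iters };
        if (pthread_create(&tid[started], NULL, store_worker, &sl[started]) != 0)
            break;   /* join whatever did start, then report failure */
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);
    if (started != nthreads) return -1.0;

    double sec = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return (double)per * nthreads * iters / (1024.0 * 1024.0) / sec;
}
```

On older toolchains this needs `-pthread` when compiling and linking.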
==== random-access-seq ====
| |nsec/access
=======================
| 512 | 1.91621
-----------------------
| 1024 | 1.70279
-----------------------
| 2048 | 1.59140
-----------------------
| 4096 | 1.53711
-----------------------
| 8192 | 1.50926
-----------------------
| 16384 | 1.49487
-----------------------
| 32768 | 1.50249
-----------------------
| 65536 | 2.56979
-----------------------
| 131072 | 3.06180
-----------------------
| 262144 | 3.45743
-----------------------
| 524288 | 8.89210
-----------------------
| 1048576 | 11.59729
-----------------------
| 2097152 | 12.96214
-----------------------
| 4194304 | 14.57185
-----------------------
| 8388608 | 44.90422
-----------------------
| 16777216 | 88.19209
-----------------------
| 33554432 | 110.41225
-----------------------
| 67108864 | 121.63653
-----------------------
|134217728 | 132.90501
-----------------------
v : range[Byte]
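The random-access-seq column climbs from about 1.5 ns to about 133 ns per access as the range grows. That is the usual shape of a dependent-load (pointer-chasing) latency test. Assuming that is what this test does, a minimal sketch follows. Sattolo's algorithm turns an index array into a single random cycle, so every load's address comes from the previous load and the prefetcher cannot predict it. The function names, the xorshift generator, and the use of `size_t` slots (a real tool might pad each node to a cache line) are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* xorshift64 PRNG; the state must be non-zero. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Sattolo's algorithm: turn next[] into one random cycle over all n slots,
   so following next[] from any slot visits every slot before repeating. */
void build_cycle(size_t *next, size_t n, uint64_t seed)
{
    uint64_t s = seed ? seed : 1;
    if (n == 0) return;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&s) % i);   /* j in [0, i) */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Each load's address is the previous load's result, so misses cannot overlap. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t i = start;
    while (steps--)
        i = next[i];
    return i;
}

/* Average ns per dependent access over a `bytes`-sized range. */
double chase_latency_ns(size_t bytes, size_t steps)
{
    size_t n = bytes / sizeof(size_t);
    if (n < 2 || steps == 0) return -1.0;
    size_t *next = malloc(n * sizeof *next);
    if (!next) return -1.0;
    build_cycle(next, n, 0x9e3779b97f4a7c15ULL);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    volatile size_t sink = chase(next, 0, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    free(next);
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec)) / (double)steps;
}
```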
==== random-access-para ====
| |nsec/access
=======================
| 512 | 8.84251
-----------------------
| 1024 | 8.49259
-----------------------
| 2048 | 8.49045
-----------------------
| 4096 | 8.49004
-----------------------
| 8192 | 8.49011
-----------------------
| 16384 | 8.48990
-----------------------
| 32768 | 8.49113
-----------------------
| 65536 | 8.48333
-----------------------
| 131072 | 8.47339
-----------------------
| 262144 | 8.46539
-----------------------
| 524288 | 8.49370
-----------------------
| 1048576 | 8.51487
-----------------------
| 2097152 | 8.52732
-----------------------
| 4194304 | 8.55595
-----------------------
| 8388608 | 13.68239
-----------------------
| 16777216 | 16.91060
-----------------------
| 33554432 | 17.64928
-----------------------
| 67108864 | 17.81128
-----------------------
|134217728 | 18.97304
-----------------------
v : range[Byte]
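By contrast, random-access-para stays near 8.5 ns per access across the smaller ranges and reaches only about 19 ns at the largest, far below the dependent-load figures. That is consistent with independent random loads whose cache misses overlap. In the minimal sketch below, which is an assumption about the test and not its actual code, each index comes from a PRNG rather than from loaded data, so the core can keep several misses in flight. Note that in a sketch like this the index computation is part of every access, which puts a floor under the per-access time even when the data fits in L1. The function names and the xorshift generator are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Independent random loads: each index comes from the PRNG, not from the
   previous load, so the core can keep several cache misses in flight. */
uint64_t random_sum(const uint32_t *buf, size_t n, size_t accesses, uint64_t seed)
{
    uint64_t s = seed ? seed : 1, sum = 0;
    if (n == 0) return 0;
    for (size_t k = 0; k < accesses; k++) {
        s ^= s << 13;   /* xorshift64 step */
        s ^= s >> 7;
        s ^= s << 17;
        sum += buf[s % n];
    }
    return sum;
}

/* Average ns per access over a `bytes`-sized range (PRNG cost included). */
double random_para_ns(size_t bytes, size_t accesses)
{
    size_t n = bytes / sizeof(uint32_t);
    if (n == 0 || accesses == 0) return -1.0;
    uint32_t *buf = malloc(n * sizeof *buf);
    if (!buf) return -1.0;
    for (size_t i = 0; i < n; i++) buf[i] = (uint32_t)i;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    volatile uint64_t sink = random_sum(buf, n, accesses, 12345);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    free(buf);
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec)) / (double)accesses;
}
```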
ostimer: clock_gettime
userland_timer: rdtscp
perf_counter: no
11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
==== membw_1t ====
| |MiB/sec
=================================
| simple-long-copy |22826.86634
---------------------------------
| libc-memcpy |29619.26417
---------------------------------
| gccvec128-copy |22743.62214
---------------------------------
| sse-stream-copy |27815.95567
---------------------------------
| avx256-copy |19470.66848
---------------------------------
| avx512-copy |21029.98822
---------------------------------
| x86-rep-movs1 |24525.84410
---------------------------------
| x86-rep-movs2 |24435.90124
---------------------------------
| x86-rep-movs4 |24593.24322
---------------------------------
| simple-long-sum |18061.35707
---------------------------------
| gccvec128-load |17896.65894
---------------------------------
| avx256-load |21373.63575
---------------------------------
| avx512-load |19417.70884
---------------------------------
| x86-rep-scas1 | 1992.77580
---------------------------------
| x86-rep-scas2 | 3967.41683
---------------------------------
| x86-rep-scas4 | 7591.44950
---------------------------------
| libc-memset |32223.45970
---------------------------------
| simple-long-store |32166.02909
---------------------------------
| gccvec128-store |32235.88589
---------------------------------
| sse-stream-store |37677.28627
---------------------------------
| avx256-store |16149.75701
---------------------------------
|avx256-stream-store |37860.09463
---------------------------------
| avx512-store |14739.50620
---------------------------------
|avx512-stream-store |37854.56600
---------------------------------
| x86-rep-stos1 |36872.94301
---------------------------------
| x86-rep-stos2 |36890.29596
---------------------------------
| x86-rep-stos4 |36923.74337
---------------------------------
v : test_name
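The x86-rep-* rows exercise the x86 string instructions. The 1/2/4 suffix presumably selects byte, word or doubleword elements. On this i5-1135G7, rep-scas roughly doubles with each doubling of element width (about 1990, 3970 and 7590 MiB/s). That suggests it processes a roughly fixed number of elements per cycle and is not bandwidth-bound, consistent with scas lacking the fast-string microcode that movs and stos use. A minimal sketch of the byte forms with GCC/Clang inline assembly follows. The function names are hypothetical, the `repe scasb` usage (scan while bytes equal AL) is an assumption about how the benchmark drives scas, and non-x86 builds fall back to plain C.

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes with `rep movsb` (RDI = dst, RSI = src, RCX = count). */
void rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(n) : : "memory");
#else
    memcpy(dst, src, n);   /* portable fallback */
#endif
}

/* Fill n bytes with c using `rep stosb` (AL = fill value). */
void rep_stosb(void *dst, unsigned char c, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("rep stosb" : "+D"(dst), "+c"(n) : "a"(c) : "memory");
#else
    memset(dst, c, n);
#endif
}

/* Count leading bytes equal to c with `repe scasb`, which compares bytes
   against AL until one differs or RCX reaches zero. */
size_t repe_scasb(const void *p, unsigned char c, size_t n)
{
    const unsigned char *q = p;
    size_t left = n;
    if (n == 0) return 0;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("repe scasb" : "+D"(q), "+c"(left) : "a"(c) : "cc", "memory");
    /* scas also consumes the mismatching byte, so step back over it. */
    size_t done = n - left;
    return q[-1] == c ? done : done - 1;
#else
    size_t i = 0;
    while (i < n && q[i] == c) i++;
    return i;
#endif
}
```

The word and doubleword variants would use movsw/movsd, stosw/stosd and scasw/scasd with RCX counting elements instead of bytes.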
==== membw_mt ====
| |MiB/sec
=================================
| simple-long-copy |37820.64220
---------------------------------
| libc-memcpy |32019.66919
---------------------------------
| gccvec128-copy |38086.76503
---------------------------------
| sse-stream-copy |32749.68232
---------------------------------
| avx256-copy |34657.25151
---------------------------------
| avx512-copy |29416.33241
---------------------------------
| x86-rep-movs1 |32077.07960
---------------------------------
| x86-rep-movs2 |32108.41587
---------------------------------
| x86-rep-movs4 |32090.76304
---------------------------------
| simple-long-sum |42047.67148
---------------------------------
| gccvec128-load |41918.90650
---------------------------------
| avx256-load |42010.77295
---------------------------------
| avx512-load |42096.92745
---------------------------------
| x86-rep-scas1 | 7203.32616
---------------------------------
| x86-rep-scas2 |14020.87465
---------------------------------
| x86-rep-scas4 |26777.81139
---------------------------------
| libc-memset |36155.27683
---------------------------------
| simple-long-store |36305.28894
---------------------------------
| gccvec128-store |36372.21055
---------------------------------
| sse-stream-store |36855.76200
---------------------------------
| avx256-store |35550.15594
---------------------------------
|avx256-stream-store |36894.57491
---------------------------------
| avx512-store |24878.41088
---------------------------------
|avx512-stream-store |37088.34406
---------------------------------
| x86-rep-stos1 |38528.83467
---------------------------------
| x86-rep-stos2 |38507.36854
---------------------------------
| x86-rep-stos4 |38478.47936
---------------------------------
v : test_name
==== random-access-seq ====
| |nsec/access
=======================
| 512 | 1.49033
-----------------------
| 1024 | 1.34486
-----------------------
| 2048 | 1.27037
-----------------------
| 4096 | 1.23455
-----------------------
| 8192 | 1.21670
-----------------------
| 16384 | 1.20866
-----------------------
| 32768 | 1.20283
-----------------------
| 65536 | 1.77204
-----------------------
| 131072 | 2.57737
-----------------------
| 262144 | 2.97563
-----------------------
| 524288 | 3.99668
-----------------------
| 1048576 | 4.51202
-----------------------
| 2097152 | 7.96752
-----------------------
| 4194304 | 10.96388
-----------------------
| 8388608 | 22.61538
-----------------------
| 16777216 | 61.32126
-----------------------
| 33554432 | 87.66226
-----------------------
| 67108864 | 100.58076
-----------------------
|134217728 | 109.50422
-----------------------
v : range[Byte]
==== random-access-para ====
| |nsec/access
=======================
| 512 | 4.37776
-----------------------
| 1024 | 4.37771
-----------------------
| 2048 | 4.37739
-----------------------
| 4096 | 4.37596
-----------------------
| 8192 | 4.37777
-----------------------
| 16384 | 4.37372
-----------------------
| 32768 | 4.37429
-----------------------
| 65536 | 4.40113
-----------------------
| 131072 | 4.42841
-----------------------
| 262144 | 4.44304
-----------------------
| 524288 | 4.45105
-----------------------
| 1048576 | 4.45728
-----------------------
| 2097152 | 4.45256
-----------------------
| 4194304 | 4.55780
-----------------------
| 8388608 | 6.03434
-----------------------
| 16777216 | 7.42892
-----------------------
| 33554432 | 8.22690
-----------------------
| 67108864 | 8.88918
-----------------------
|134217728 | 9.61945
-----------------------
v : range[Byte]
ostimer: clock_gettime
userland_timer: rdtscp
perf_counter: yes
AMD Ryzen 7 3700X 8-Core Processor
==== membw_1t ====
| |MiB/sec
=================================
| simple-long-copy |16780.75016
---------------------------------
| libc-memcpy |33097.22930
---------------------------------
| gccvec128-copy |17012.89380
---------------------------------
| sse-stream-copy |32068.56851
---------------------------------
| avx256-copy |16766.83822
---------------------------------
| x86-rep-movs1 |15107.02432
---------------------------------
| x86-rep-movs2 |15125.04623
---------------------------------
| x86-rep-movs4 |14937.36617
---------------------------------
| simple-long-sum |20304.78795
---------------------------------
| gccvec128-load |23562.33709
---------------------------------
| avx256-load |23401.18705
---------------------------------
| x86-rep-scas1 | 1993.99048
---------------------------------
| x86-rep-scas2 | 3794.26677
---------------------------------
| x86-rep-scas4 | 6762.53800
---------------------------------
| libc-memset |11080.92973
---------------------------------
| simple-long-store |11603.65496
---------------------------------
| gccvec128-store |12183.82629
---------------------------------
| sse-stream-store |24106.08732
---------------------------------
| avx256-store |12390.28532
---------------------------------
|avx256-stream-store |24138.08751
---------------------------------
| x86-rep-stos1 |11944.17782
---------------------------------
| x86-rep-stos2 |11842.65443
---------------------------------
| x86-rep-stos4 |11850.58597
---------------------------------
v : test_name
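On this Ryzen the two stream-store rows run at roughly twice the rate of ordinary stores (about 24100 MiB/s versus 11600 MiB/s for simple-long-store). A common explanation is read-for-ownership. An ordinary store to a line that is not cached first reads that line from memory, whereas non-temporal stores write whole lines without the read. The sketch below shows what an "sse-stream-store" kernel might look like with SSE2 intrinsics. The function name and the alignment and size preconditions are assumptions, not taken from the benchmark.

```c
#include <emmintrin.h>   /* SSE2: _mm_set1_epi8, _mm_stream_si128 (also brings in _mm_sfence) */
#include <stddef.h>

/* Fill n bytes at dst with c using non-temporal (streaming) 16-byte stores.
   Assumes dst is 16-byte aligned and n is a multiple of 16. */
void sse_stream_fill(void *dst, unsigned char c, size_t n)
{
    __m128i v = _mm_set1_epi8((char)c);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < n / 16; i++)
        _mm_stream_si128(p + i, v);
    _mm_sfence();   /* streaming stores are weakly ordered; fence before later stores */
}
```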
==== membw_mt ====
| |MiB/sec
=================================
| simple-long-copy |17053.27794
---------------------------------
| libc-memcpy |16744.91778
---------------------------------
| gccvec128-copy |16454.10284
---------------------------------
| sse-stream-copy |33480.83913
---------------------------------
| avx256-copy |17036.10464
---------------------------------
| x86-rep-movs1 |18570.79790
---------------------------------
| x86-rep-movs2 |17893.12833
---------------------------------
| x86-rep-movs4 |18043.44282
---------------------------------
| simple-long-sum |36958.56599
---------------------------------
| gccvec128-load |36194.05506
---------------------------------
| avx256-load |35754.97141
---------------------------------
| x86-rep-scas1 |14994.77277
---------------------------------
| x86-rep-scas2 |27659.46344
---------------------------------
| x86-rep-scas4 |37120.44066
---------------------------------
| libc-memset |12440.60649
---------------------------------
| simple-long-store |12600.08198
---------------------------------
| gccvec128-store |12247.10738
---------------------------------
| sse-stream-store |24031.31687
---------------------------------
| avx256-store |12621.19628
---------------------------------
|avx256-stream-store |24009.40828
---------------------------------
| x86-rep-stos1 |13018.15839
---------------------------------
| x86-rep-stos2 |14270.02853
---------------------------------
| x86-rep-stos4 |14265.82982
---------------------------------
v : test_name
==== random-access-seq ====
| |nsec/access
=======================
| 512 | 1.70442
-----------------------
| 1024 | 1.55866
-----------------------
| 2048 | 1.47893
-----------------------
| 4096 | 1.43987
-----------------------
| 8192 | 1.42611
-----------------------
| 16384 | 1.40998
-----------------------
| 32768 | 1.41987
-----------------------
| 65536 | 2.42428
-----------------------
| 131072 | 2.89050
-----------------------
| 262144 | 3.16151
-----------------------
| 524288 | 4.87482
-----------------------
| 1048576 | 7.57721
-----------------------
| 2097152 | 9.27238
-----------------------
| 4194304 | 10.08325
-----------------------
| 8388608 | 10.57104
-----------------------
| 16777216 | 21.57613
-----------------------
| 33554432 | 56.19952
-----------------------
| 67108864 | 74.00839
-----------------------
|134217728 | 86.93786
-----------------------
v : range[Byte]
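The range values appear to be bytes rather than KiB. The largest row would otherwise be 128 GiB, and read as bytes the latency steps line up with the cache sizes of each CPU. The small helper below flags every row whose latency is at least a given ratio above the previous row, using the Ryzen 7 3700X random-access-seq values copied from the table above. The function name and the 1.5 threshold are arbitrary choices for illustration.

```c
#include <stddef.h>

/* random-access-seq rows for the AMD Ryzen 7 3700X run (range, nsec/access). */
static const unsigned long ryzen_range[] = {
    512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144,
    524288, 1048576, 2097152, 4194304, 8388608, 16777216, 33554432,
    67108864, 134217728 };
static const double ryzen_ns[] = {
    1.70442, 1.55866, 1.47893, 1.43987, 1.42611, 1.40998, 1.41987, 2.42428,
    2.89050, 3.16151, 4.87482, 7.57721, 9.27238, 10.08325, 10.57104,
    21.57613, 56.19952, 74.00839, 86.93786 };

/* Write into out[] every range whose latency is at least `ratio` times the
   previous row's latency; return how many were found. */
size_t find_latency_steps(const unsigned long *range, const double *ns,
                          size_t rows, double ratio, unsigned long *out)
{
    size_t k = 0;
    for (size_t i = 1; i < rows; i++)
        if (ns[i] >= ratio * ns[i - 1])
            out[k++] = range[i];
    return k;
}
```

With a ratio of 1.5 this reports steps at 65536, 524288, 1048576, 16777216 and 33554432. Read as bytes, these sit just past the 3700X's 32 KiB L1D, around its 512 KiB L2, and around its 16 MiB per-CCX L3.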
==== random-access-para ====
| |nsec/access
=======================
| 512 | 3.23662
-----------------------
| 1024 | 3.22158
-----------------------
| 2048 | 3.22392
-----------------------
| 4096 | 3.22410
-----------------------
| 8192 | 3.23093
-----------------------
| 16384 | 3.22511
-----------------------
| 32768 | 3.23194
-----------------------
| 65536 | 3.23488
-----------------------
| 131072 | 3.27599
-----------------------
| 262144 | 3.26046
-----------------------
| 524288 | 3.27706
-----------------------
| 1048576 | 3.27472
-----------------------
| 2097152 | 3.27815
-----------------------
| 4194304 | 3.31390
-----------------------
| 8388608 | 3.61629
-----------------------
| 16777216 | 6.22318
-----------------------
| 33554432 | 7.89540
-----------------------
| 67108864 | 9.22039
-----------------------
|134217728 | 8.92101
-----------------------
v : range[Byte]