@simcop2387
Last active December 11, 2016 03:57
Running some FFT code; will edit with updates, since each run takes a while.
cpu info: (HT enabled)
model name : Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz
-> % ./compile.sh
gcc -O1 -o fft-test-fft-portable.c-O1 fft-test.c fft-portable.c -lm -march=native
gcc -O1 -o fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O1 fft-test.c fft-x8664-avx-aux.c fft-x8664-avx.s -lm -march=native
gcc -O1 -o fft-test-fft-fast-reverse.c-O1 fft-test.c fft-fast-reverse.c -lm -march=native
gcc -O2 -o fft-test-fft-portable.c-O2 fft-test.c fft-portable.c -lm -march=native
gcc -O2 -o fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O2 fft-test.c fft-x8664-avx-aux.c fft-x8664-avx.s -lm -march=native
gcc -O2 -o fft-test-fft-fast-reverse.c-O2 fft-test.c fft-fast-reverse.c -lm -march=native
gcc -O3 -o fft-test-fft-portable.c-O3 fft-test.c fft-portable.c -lm -march=native
gcc -O3 -o fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O3 fft-test.c fft-x8664-avx-aux.c fft-x8664-avx.s -lm -march=native
gcc -O3 -o fft-test-fft-fast-reverse.c-O3 fft-test.c fft-fast-reverse.c -lm -march=native
gcc -O1 -o fft-test-fft-portable.c-O1 fft-test.c fft-portable.c -lm -ffast-math -march=native
gcc -O1 -o fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O1 fft-test.c fft-x8664-avx-aux.c fft-x8664-avx.s -lm -ffast-math -march=native
gcc -O1 -o fft-test-fft-fast-reverse.c-O1 fft-test.c fft-fast-reverse.c -lm -ffast-math -march=native
gcc -O2 -o fft-test-fft-portable.c-O2 fft-test.c fft-portable.c -lm -ffast-math -march=native
gcc -O2 -o fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O2 fft-test.c fft-x8664-avx-aux.c fft-x8664-avx.s -lm -ffast-math -march=native
gcc -O2 -o fft-test-fft-fast-reverse.c-O2 fft-test.c fft-fast-reverse.c -lm -ffast-math -march=native
gcc -O3 -o fft-test-fft-portable.c-O3 fft-test.c fft-portable.c -lm -ffast-math -march=native
gcc -O3 -o fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O3 fft-test.c fft-x8664-avx-aux.c fft-x8664-avx.s -lm -ffast-math -march=native
gcc -O3 -o fft-test-fft-fast-reverse.c-O3 fft-test.c fft-fast-reverse.c -lm -ffast-math -march=native
ryan@pegasus [05:25:39] [~/tmp/ffttest] [master *] ∞
-> % ls
compile.sh fft-test-fft-fast-reverse.c-O1 fft-test-fft-portable.c-O1 fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O1 fft-x8664-avx-aux.c
fft-fast-reverse.c fft-test-fft-fast-reverse.c-O1-ffast-math fft-test-fft-portable.c-O1-ffast-math fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O1-ffast-math fft-x8664-avx.s
fft.h fft-test-fft-fast-reverse.c-O2 fft-test-fft-portable.c-O2 fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O2
fft-model-of-x8664-avx.c fft-test-fft-fast-reverse.c-O2-ffast-math fft-test-fft-portable.c-O2-ffast-math fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O2-ffast-math
fft-portable.c fft-test-fft-fast-reverse.c-O3 fft-test-fft-portable.c-O3 fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O3
fft-test.c fft-test-fft-fast-reverse.c-O3-ffast-math fft-test-fft-portable.c-O3-ffast-math fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O3-ffast-math
ryan@pegasus [05:25:41] [~/tmp/ffttest] [master *] ∞
-> % for i in fft-test-*; do echo $i; ./"$i"; done
fft-test-fft-fast-reverse.c-O1
Self-test passed
Size Time per FFT (ns)
4 min=33 mean=33 sd=0.51%
8 min=63 mean=64 sd=0.40%
16 min=124 mean=127 sd=4.07%
32 min=255 mean=260 sd=1.55%
64 min=547 mean=548 sd=0.40%
128 min=1175 mean=1184 sd=0.40%
256 min=2688 mean=2760 sd=4.41%
512 min=6439 mean=6560 sd=1.23%
1024 min=14559 mean=14824 sd=1.52%
2048 min=33516 mean=33967 sd=0.89%
4096 min=84175 mean=85336 sd=1.45%
8192 min=203415 mean=204443 sd=0.45%
16384 min=450769 mean=452903 sd=0.30%
32768 min=1085327 mean=1090649 sd=0.37%
65536 min=2521460 mean=2544502 sd=1.42%
131072 min=5449538 mean=5481806 sd=0.31%
262144 min=12632371 mean=12734313 sd=0.60%
524288 min=48876464 mean=49182393 sd=0.45%
1048576 min=137156168 mean=143718204 sd=4.84%
2097152 min=329195512 mean=341430551 sd=3.09%
4194304 min=770286820 mean=779868243 sd=1.20%
8388608 min=1714753804 mean=1744676667 sd=1.19%
16777216 min=3698690960 mean=3726221257 sd=0.46%
33554432 min=8686518102 mean=8948786357 sd=2.78%
67108864 min=19508434801 mean=20031424657 sd=1.41%
fft-test-fft-fast-reverse.c-O1-ffast-math
Self-test passed
Size Time per FFT (ns)
4 min=32 mean=32 sd=3.10%
8 min=63 mean=64 sd=0.96%
16 min=125 mean=126 sd=0.83%
32 min=255 mean=257 sd=0.55%
64 min=549 mean=558 sd=0.73%
128 min=1185 mean=1237 sd=4.49%
256 min=2659 mean=2746 sd=2.06%
512 min=6536 mean=6755 sd=3.10%
1024 min=14685 mean=14820 sd=0.68%
2048 min=33361 mean=33579 sd=0.36%
4096 min=83192 mean=84005 sd=0.52%
8192 min=203026 mean=205558 sd=0.87%
16384 min=461842 mean=464257 sd=0.51%
32768 min=1061349 mean=1067589 sd=0.37%
65536 min=2527941 mean=2539377 sd=0.44%
131072 min=5486153 mean=5579665 sd=1.97%
262144 min=12520282 mean=12721374 sd=1.08%
524288 min=47926482 mean=48992224 sd=1.34%
1048576 min=125446996 mean=130424607 sd=1.87%
2097152 min=326062006 mean=331144270 sd=0.94%
4194304 min=770928941 mean=783470377 sd=1.20%
8388608 min=1720961150 mean=1738438403 sd=0.81%
16777216 min=3619071155 mean=3716338192 sd=1.37%
33554432 min=9351507217 mean=9739494844 sd=1.63%
67108864 min=20027338992 mean=20999033842 sd=2.71%
fft-test-fft-fast-reverse.c-O2
Self-test passed
Size Time per FFT (ns)
4 min=33 mean=33 sd=0.78%
8 min=64 mean=65 sd=1.04%
16 min=127 mean=131 sd=3.45%
32 min=263 mean=265 sd=0.73%
64 min=554 mean=559 sd=0.63%
128 min=1205 mean=1215 sd=0.76%
256 min=2743 mean=2776 sd=1.24%
512 min=6355 mean=6984 sd=17.21%
1024 min=14149 mean=14343 sd=1.60%
2048 min=32398 mean=32643 sd=0.37%
4096 min=80891 mean=83002 sd=1.38%
8192 min=200328 mean=204820 sd=0.91%
16384 min=471322 mean=478540 sd=1.61%
32768 min=1076839 mean=1102180 sd=1.35%
65536 min=2495814 mean=2554673 sd=1.83%
131072 min=5375042 mean=5433872 sd=0.71%
262144 min=11979172 mean=12162848 sd=1.06%
524288 min=46814770 mean=47623088 sd=1.71%
1048576 min=127624421 mean=131521130 sd=1.73%
2097152 min=330593540 mean=332444874 sd=0.39%
4194304 min=779137558 mean=783693574 sd=0.63%
8388608 min=1672096117 mean=1689973187 sd=1.53%
16777216 min=3783191272 mean=3817596426 sd=0.84%
33554432 min=9002983278 mean=9038537507 sd=0.63%
67108864 min=19766120945 mean=20011697860 sd=1.41%
fft-test-fft-fast-reverse.c-O2-ffast-math
Self-test passed
Size Time per FFT (ns)
4 min=32 mean=33 sd=0.33%
8 min=64 mean=65 sd=1.11%
16 min=126 mean=128 sd=3.28%
32 min=263 mean=264 sd=0.47%
64 min=555 mean=560 sd=1.29%
128 min=1199 mean=1202 sd=0.24%
256 min=2703 mean=2715 sd=0.29%
512 min=6284 mean=6377 sd=2.09%
1024 min=13935 mean=14033 sd=0.41%
2048 min=32048 mean=32221 sd=0.32%
4096 min=80721 mean=81661 sd=1.45%
8192 min=196526 mean=197526 sd=0.48%
16384 min=443216 mean=448448 sd=0.56%
32768 min=1063794 mean=1071064 sd=0.31%
65536 min=2456222 mean=2477332 sd=0.50%
131072 min=5275760 mean=5315918 sd=0.65%
262144 min=12107042 mean=12237415 sd=0.65%
524288 min=47373218 mean=48458316 sd=2.29%
1048576 min=135656856 mean=138771699 sd=1.85%
2097152 min=316536908 mean=324781964 sd=1.61%
4194304 min=783505615 mean=790194482 sd=0.61%
8388608 min=1709911655 mean=1734177075 sd=1.43%
16777216 min=3849116216 mean=3903447563 sd=0.88%
33554432 min=9551807506 mean=9688232278 sd=0.89%
67108864 min=21115700012 mean=21447073175 sd=0.94%
fft-test-fft-fast-reverse.c-O3
Self-test passed
Size Time per FFT (ns)
4 min=32 mean=33 sd=1.61%
8 min=64 mean=67 sd=4.38%
16 min=128 mean=129 sd=0.43%
32 min=261 mean=264 sd=0.80%
64 min=553 mean=560 sd=0.62%
128 min=1209 mean=1219 sd=0.77%
256 min=2723 mean=2758 sd=1.26%
512 min=6301 mean=6914 sd=10.62%
1024 min=14147 mean=14273 sd=0.69%
2048 min=31950 mean=32078 sd=0.34%
4096 min=80297 mean=80966 sd=0.36%
8192 min=192731 mean=194548 sd=0.77%
16384 min=435562 mean=438140 sd=0.23%
32768 min=1054986 mean=1058726 sd=0.29%
65536 min=2465264 mean=2493980 sd=0.42%
131072 min=5349606 mean=5419895 sd=2.72%
262144 min=11888259 mean=12181527 sd=1.88%
524288 min=46664307 mean=47205191 sd=0.54%
1048576 min=129513085 mean=131481428 sd=1.11%
2097152 min=330716476 mean=333448843 sd=0.87%
4194304 min=776329846 mean=793749046 sd=2.03%
8388608 min=1788287558 mean=1801021542 sd=0.65%
16777216 min=3712038163 mean=3773374890 sd=1.94%
33554432 min=9499434283 mean=9855332309 sd=2.04%
67108864 min=20449026789 mean=21139111412 sd=2.16%
fft-test-fft-fast-reverse.c-O3-ffast-math
Self-test passed
Size Time per FFT (ns)
4 min=34 mean=35 sd=5.33%
8 min=64 mean=64 sd=0.31%
16 min=125 mean=126 sd=0.33%
32 min=262 mean=263 sd=0.20%
64 min=551 mean=554 sd=0.27%
128 min=1200 mean=1254 sd=7.89%
256 min=2728 mean=2755 sd=0.66%
512 min=6314 mean=6361 sd=0.51%
1024 min=14069 mean=14194 sd=0.75%
2048 min=31992 mean=32107 sd=0.21%
4096 min=79786 mean=80281 sd=0.29%
8192 min=193619 mean=197042 sd=0.92%
16384 min=445652 mean=449063 sd=0.48%
32768 min=1042601 mean=1095536 sd=5.55%
65536 min=2456270 mean=2497578 sd=1.04%
131072 min=5377907 mean=5455902 sd=1.34%
262144 min=11911059 mean=12210526 sd=2.30%
524288 min=47385903 mean=48479036 sd=1.81%
1048576 min=130848773 mean=132817797 sd=1.17%
2097152 min=320440721 mean=322104962 sd=0.68%
4194304 min=767777558 mean=771998215 sd=0.69%
8388608 min=1703993385 mean=1724462865 sd=1.63%
16777216 min=3575134162 mean=3612203704 sd=1.04%
33554432 min=9293039293 mean=9458641813 sd=1.63%
67108864 min=20093664062 mean=20779339844 sd=2.24%
fft-test-fft-portable.c-O1
Self-test passed
Size Time per FFT (ns)
4 min=33 mean=33 sd=0.88%
8 min=63 mean=64 sd=1.46%
16 min=126 mean=134 sd=7.85%
32 min=259 mean=260 sd=0.32%
64 min=555 mean=560 sd=0.37%
128 min=1203 mean=1218 sd=2.28%
256 min=2721 mean=2728 sd=0.15%
512 min=6459 mean=6879 sd=7.33%
1024 min=15223 mean=18242 sd=8.82%
2048 min=33390 mean=35529 sd=5.77%
4096 min=85012 mean=87692 sd=2.60%
8192 min=208326 mean=213258 sd=2.11%
16384 min=456313 mean=462664 sd=1.82%
32768 min=1106090 mean=1110366 sd=0.32%
65536 min=2537227 mean=2556189 sd=0.28%
131072 min=5531075 mean=5598869 sd=0.52%
262144 min=12739647 mean=12954491 sd=0.79%
524288 min=49626074 mean=49771418 sd=0.21%
1048576 min=126092066 mean=129133033 sd=0.98%
2097152 min=329515588 mean=332574020 sd=0.61%
4194304 min=803590319 mean=809120005 sd=0.61%
8388608 min=1732371490 mean=1745769877 sd=0.35%
16777216 min=3706188958 mean=3754443075 sd=0.98%
33554432 min=9226044323 mean=9606186306 sd=2.93%
67108864 min=19981840657 mean=20894473367 sd=2.76%
fft-test-fft-portable.c-O1-ffast-math
Self-test passed
Size Time per FFT (ns)
4 min=33 mean=33 sd=0.61%
8 min=64 mean=67 sd=6.93%
16 min=126 mean=127 sd=0.30%
32 min=260 mean=261 sd=0.27%
64 min=562 mean=564 sd=0.23%
128 min=1226 mean=1234 sd=0.61%
256 min=2724 mean=2745 sd=0.65%
512 min=6483 mean=6519 sd=0.39%
1024 min=14485 mean=14626 sd=0.78%
2048 min=33637 mean=34157 sd=2.20%
4096 min=86887 mean=87922 sd=0.97%
8192 min=205585 mean=207690 sd=1.03%
16384 min=450643 mean=453683 sd=0.51%
32768 min=1087247 mean=1095044 sd=0.73%
65536 min=2528114 mean=2562585 sd=0.91%
131072 min=5607945 mean=5694001 sd=1.29%
262144 min=12733404 mean=13558646 sd=6.44%
524288 min=49519156 mean=50270619 sd=2.05%
1048576 min=140985020 mean=143076889 sd=0.65%
2097152 min=329845496 mean=332574982 sd=0.56%
4194304 min=786666091 mean=793451604 sd=1.11%
8388608 min=1704163841 mean=1751911575 sd=1.83%
16777216 min=3730602579 mean=3756672846 sd=0.43%
33554432 min=9812799097 mean=9983337800 sd=0.77%
67108864 min=21480264180 mean=21642786494 sd=0.54%
fft-test-fft-portable.c-O2
Self-test passed
Size Time per FFT (ns)
4 min=32 mean=32 sd=0.68%
8 min=63 mean=64 sd=0.35%
16 min=125 mean=126 sd=0.56%
32 min=260 mean=263 sd=1.73%
64 min=546 mean=550 sd=0.48%
128 min=1183 mean=1189 sd=0.49%
256 min=2685 mean=2703 sd=0.51%
512 min=6211 mean=6235 sd=0.31%
1024 min=13796 mean=13886 sd=0.69%
2048 min=38069 mean=38280 sd=0.47%
4096 min=79696 mean=80204 sd=0.55%
8192 min=193119 mean=195284 sd=1.99%
16384 min=440551 mean=455068 sd=6.24%
32768 min=1064527 mean=1070281 sd=0.41%
65536 min=2470544 mean=2483433 sd=0.38%
131072 min=5316888 mean=5344487 sd=0.35%
262144 min=11925276 mean=12060757 sd=0.59%
524288 min=46074476 mean=46396722 sd=0.63%
1048576 min=124396636 mean=128481673 sd=2.04%
2097152 min=312700279 mean=324409812 sd=1.94%
4194304 min=764645453 mean=784309770 sd=1.44%
8388608 min=1648134838 mean=1655667357 sd=0.45%
16777216 min=3627722457 mean=3683716120 sd=1.14%
33554432 min=9309375153 mean=9574886508 sd=3.04%
67108864 min=19781142428 mean=19991910873 sd=0.63%
fft-test-fft-portable.c-O2-ffast-math
Self-test passed
Size Time per FFT (ns)
4 min=33 mean=34 sd=2.35%
8 min=65 mean=65 sd=2.34%
16 min=128 mean=128 sd=0.15%
32 min=265 mean=266 sd=0.12%
64 min=558 mean=560 sd=0.28%
128 min=1207 mean=1220 sd=0.82%
256 min=2719 mean=2744 sd=0.47%
512 min=6363 mean=6408 sd=0.97%
1024 min=14204 mean=14363 sd=1.60%
2048 min=32382 mean=32479 sd=0.21%
4096 min=81050 mean=81287 sd=0.18%
8192 min=196286 mean=196660 sd=0.11%
16384 min=447396 mean=447822 sd=0.10%
32768 min=1070570 mean=1077543 sd=0.52%
65536 min=2482243 mean=2487836 sd=0.17%
131072 min=5404039 mean=5422047 sd=0.18%
262144 min=12962883 mean=13193971 sd=1.31%
524288 min=47942510 mean=48236116 sd=0.41%
1048576 min=122287532 mean=122990488 sd=0.48%
2097152 min=319009829 mean=320311138 sd=0.26%
4194304 min=774144789 mean=787387485 sd=1.96%
8388608 min=1619068874 mean=1626296720 sd=0.27%
16777216 min=3654319919 mean=3669755180 sd=0.43%
33554432 min=9274670681 mean=9396478072 sd=1.24%
67108864 min=19826230212 mean=20560723772 sd=2.47%
fft-test-fft-portable.c-O3
Self-test passed
Size Time per FFT (ns)
4 min=33 mean=34 sd=1.52%
8 min=65 mean=65 sd=0.51%
16 min=128 mean=131 sd=3.33%
32 min=265 mean=266 sd=0.24%
64 min=559 mean=567 sd=2.58%
128 min=1218 mean=1229 sd=1.47%
256 min=2753 mean=2761 sd=0.21%
512 min=6375 mean=6387 sd=0.16%
1024 min=14181 mean=14245 sd=0.26%
2048 min=32440 mean=32513 sd=0.15%
4096 min=80605 mean=81881 sd=1.20%
8192 min=195834 mean=199943 sd=4.14%
16384 min=443070 mean=447856 sd=1.38%
32768 min=1081364 mean=1093407 sd=0.67%
65536 min=2441118 mean=2468687 sd=0.87%
131072 min=5309112 mean=5335444 sd=0.35%
262144 min=12069155 mean=12563809 sd=2.25%
524288 min=46432918 mean=47466513 sd=1.19%
1048576 min=126822411 mean=129339033 sd=1.66%
2097152 min=319883807 mean=325519758 sd=1.77%
4194304 min=773366467 mean=787231627 sd=1.70%
8388608 min=1710091508 mean=1752380782 sd=1.75%
16777216 min=3723426064 mean=3770146226 sd=1.56%
33554432 min=8740544068 mean=8831110774 sd=1.02%
67108864 min=19886327183 mean=20314730926 sd=1.50%
fft-test-fft-portable.c-O3-ffast-math
Self-test passed
Size Time per FFT (ns)
4 min=33 mean=33 sd=1.13%
8 min=65 mean=65 sd=0.08%
16 min=128 mean=129 sd=0.36%
32 min=267 mean=267 sd=0.22%
64 min=559 mean=568 sd=2.86%
128 min=1216 mean=1220 sd=0.19%
256 min=2743 mean=2779 sd=2.28%
512 min=6323 mean=6366 sd=0.40%
1024 min=14153 mean=14282 sd=1.01%
2048 min=32189 mean=32315 sd=0.24%
4096 min=81896 mean=82190 sd=0.47%
8192 min=196323 mean=197921 sd=0.73%
16384 min=451136 mean=461474 sd=4.23%
32768 min=1063786 mean=1071010 sd=0.68%
65536 min=2485735 mean=2504352 sd=0.42%
131072 min=5430607 mean=5478230 sd=0.82%
262144 min=12213142 mean=12492096 sd=1.51%
524288 min=48459677 mean=49558897 sd=1.85%
1048576 min=125758242 mean=129102183 sd=2.69%
2097152 min=327173785 mean=340811631 sd=2.73%
4194304 min=781606514 mean=795803339 sd=2.35%
8388608 min=1697543193 mean=1723655378 sd=1.10%
16777216 min=3613735354 mean=3707026661 sd=2.12%
33554432 min=9487643321 mean=9755146803 sd=1.90%
67108864 min=20500538843 mean=21433058687 sd=1.55%
[ED: These all segfaulted, not sure why. I'll investigate later, but I suspect there's something weird between the asm and GCC-6.]
fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O1
fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O1-ffast-math
fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O2
fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O2-ffast-math
fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O3
fft-test-fft-x8664-avx-aux.c fft-x8664-avx.s-O3-ffast-math
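The timing tables above are easier to compare once parsed. Below is a minimal sketch, assuming the output format shown in this log (a binary name line starting with `fft-test-`, then rows of the form `size min=… mean=… sd=…%`). The names `parse_runs` and `ratio_of_mins` are hypothetical helpers and not part of the FFT code being tested; the sample rows are copied from the portable -O1 and -O2 runs at size 1024.

```python
import re

# Matches result rows such as "1024 min=14559 mean=14824 sd=1.52%".
ROW = re.compile(r"^(\d+) min=(\d+) mean=(\d+) sd=([\d.]+)%$")

def parse_runs(text):
    """Return {binary_name: {fft_size: (min_ns, mean_ns, sd_percent)}}."""
    runs, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("fft-test-"):
            # A new binary's section begins; segfaulted runs stay empty.
            current = runs.setdefault(line, {})
            continue
        m = ROW.match(line)
        if m and current is not None:
            size, lo, mean, sd = m.groups()
            current[int(size)] = (int(lo), int(mean), float(sd))
    return runs

def ratio_of_mins(runs, a, b, size):
    """Minimum time of build `a` divided by that of build `b` at one size."""
    return runs[a][size][0] / runs[b][size][0]

# Sample rows taken from the log above (portable build, size 1024).
sample = """\
fft-test-fft-portable.c-O1
Self-test passed
Size Time per FFT (ns)
1024 min=15223 mean=18242 sd=8.82%
fft-test-fft-portable.c-O2
Self-test passed
Size Time per FFT (ns)
1024 min=13796 mean=13886 sd=0.69%
"""

runs = parse_runs(sample)
ratio = ratio_of_mins(runs, "fft-test-fft-portable.c-O1",
                      "fft-test-fft-portable.c-O2", 1024)
print(f"O1/O2 min-time ratio at size 1024: {ratio:.3f}")
```

Comparing `min` rather than `mean` is a design choice here: several rows show a large `sd` (for example 17.21% at size 512 in the fast-reverse -O2 run), so the mean is more sensitive to whatever else the machine was doing.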