A matrix multiplication sample project is available with the VTune Profiler application.
- Open VTune Profiler and create a project
- Initial measurements: Performance Snapshot
- Hotspots: inspect the Bottom-up view, the source view, and the flame graph
- Memory Access: look for memory access issues
- Update the algorithm (multiplication2), check the optimization level (-O1), and run make
- Check Performance Snapshot
- Increase the optimization level to -O3 (which enables auto-vectorization in g++) and run make
- Check Performance Snapshot and HPC Characterization
- Enable AVX-512 vectorization: in my case, for the Skylake architecture, add -march=skylake-avx512 to CXXFLAGS (choose the value appropriate for your CPU) and run make
- Check Performance Snapshot and Microarchitecture analysis
- Compare the results
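For the Memory Access step, the sketch below shows the textbook i-j-k loop order that typically produces such issues. It is a hypothetical C++ kernel (multiply_naive is an illustrative name, not code taken from the VTune sample), assuming n x n matrices of double stored row-major in a flat std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical baseline kernel: c = a * b for n x n matrices stored
// row-major in flat vectors (element (i, j) lives at index i * n + j).
void multiply_naive(const std::vector<double>& a,
                    const std::vector<double>& b,
                    std::vector<double>& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                // a is read along row i (stride 1), but b is read down
                // column j (stride n doubles), so for large n almost every
                // access to b lands on a different cache line.
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

With 8-byte doubles, consecutive reads of b are n * 8 bytes apart; once a row is wider than a cache line and the matrix no longer fits in cache, the inner loop pays a cache miss on most iterations, which is the kind of pattern a memory access analysis tends to highlight.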
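For the algorithm-update and -O3 steps, one common fix for column-wise access is loop interchange: reorder the loops to i-k-j so the innermost loop reads b and writes c along rows. This is a hedged sketch of that technique, not necessarily what the project's multiplication2 does; multiply_interchanged is a hypothetical name, and the same row-major flat-vector layout is assumed. Compiling with g++ -O3 -fopt-info-vec-optimized makes GCC report which loops it vectorized, which is one way to confirm the auto-vectorization step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical loop-interchanged kernel: c = a * b for n x n row-major
// matrices in flat vectors, with the loops ordered i-k-j instead of i-j-k.
void multiply_interchanged(const std::vector<double>& a,
                           const std::vector<double>& b,
                           std::vector<double>& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    std::fill(c.begin(), c.end(), 0.0);  // c is accumulated into below
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];  // loaded once per (i, k) pair
            // Innermost loop reads row k of b and updates row i of c, both
            // with stride 1: cache-friendly and a candidate for
            // auto-vectorization at -O3.
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

The arithmetic is unchanged; only the traversal order differs, so each element of c still receives the same n products, just interleaved with updates to its neighbours in the same row.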
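For the optimization-level and AVX-512 steps, it can help to confirm what a build was actually compiled for before profiling it. This sketch assumes GCC or Clang: __OPTIMIZE__ is predefined for any -O level above -O0 (it does not distinguish -O1 from -O3), __AVX512F__ is predefined when the target flags (for example -march=skylake-avx512) allow AVX-512F code, and __builtin_cpu_supports queries the x86 CPU the program is running on. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// True when the compiler optimized this translation unit: GCC and Clang
// predefine __OPTIMIZE__ for every -O level above -O0.
constexpr bool built_with_optimization() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// True when the target flags allow AVX-512F code generation,
// e.g. -march=skylake-avx512 predefines __AVX512F__.
constexpr bool built_for_avx512f() {
#ifdef __AVX512F__
    return true;
#else
    return false;
#endif
}

// Runtime query of the CPU executing the program (x86 with GCC/Clang only).
bool cpu_supports_avx512f() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;  // assumption: treat other targets/compilers as unsupported
#endif
}

// Debug-build sanity check (compiled out with -DNDEBUG). An AVX-512 binary
// may already fault with "Illegal instruction" before reaching this point.
void check_build_matches_cpu() {
    assert(!built_for_avx512f() || cpu_supports_avx512f());
}

// Prints a short summary of how the binary was built and what the CPU offers.
void print_build_info() {
    std::printf("optimized build:  %s\n", built_with_optimization() ? "yes" : "no");
    std::printf("AVX-512F code:    %s\n", built_for_avx512f() ? "yes" : "no");
    std::printf("CPU has AVX-512F: %s\n", cpu_supports_avx512f() ? "yes" : "no");
}
```

When the program is built and run on the same machine, g++ -march=native is an alternative to naming the architecture explicitly, since it selects the instruction set of the host CPU.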
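For the final comparison, the profiler results can be backed up with a quick in-program check that an optimized variant still computes the same product, and how long it takes. This is a minimal sketch with hypothetical helper names; it assumes results are held in flat std::vector<double> buffers and that each variant can be wrapped in a callable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Runs kernel() `repeats` times and returns the best wall-clock time in
// milliseconds; the minimum is less sensitive to OS noise than the mean.
template <typename Kernel>
double best_time_ms(Kernel&& kernel, int repeats = 3) {
    assert(repeats > 0);
    double best = 0.0;
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        kernel();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        best = (r == 0) ? ms : std::min(best, ms);
    }
    return best;
}

// Largest element-wise difference between two result matrices. Different
// flags can round differently (e.g. FMA contraction with -march), so compare
// against a tolerance rather than with ==.
double max_abs_diff(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        worst = std::max(worst, std::abs(x[i] - y[i]));
    }
    return worst;
}
```

A typical call is best_time_ms([&] { kernel(a, b, c, n); }), where kernel stands for whichever multiplication variant is being measured; max_abs_diff between its output and a reference result should stay small relative to the magnitude of the entries.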