@TomasDrozdik
Last active March 18, 2022 10:22

Intel VTune Profiler Demo for NSWI126

Installation

Matrix multiplication analysis tutorial

The matrix multiplication sample project ships with the VTune Profiler application.

Agenda

  1. Open VTune Profiler and create a project

  2. Initial measurements - Performance Snapshot

  3. Hotspots - inspect the bottom-up view, the source view and the flame graph

  4. Memory Access issues

  5. Update the algorithm to multiplication2, check the optimization level (-O1) and run make

  6. Check Performance Snapshot

  7. Increase the optimization level to -O3, which enables auto-vectorization in g++, and run make

  8. Check Performance Snapshot and HPC Performance Characterization

  9. Enable AVX-512 vectorization - add the CXXFLAG -march=skylake-avx512 (this is for my Skylake CPU; choose the value appropriate for yours) and run make

  10. Check Performance Snapshot and Microarchitecture Exploration

  11. Compare results
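To make steps 4 and 5 concrete, here is a minimal sketch of the kind of change that usually resolves a memory access bottleneck in matrix multiplication. It assumes the updated algorithm is a loop interchange; the sample's own code is not reproduced here, and the names `Matrix`, `multiply_naive` and `multiply_interchanged` are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical type for this sketch: an n x n matrix stored row-major
// in one contiguous buffer, so element (i, j) lives at index i * n + j.
using Matrix = std::vector<double>;

// Naive i-j-k loop order. The inner loop reads b[k * n + j] with a stride
// of n elements, so nearly every access to b touches a new cache line.
// This is the pattern a Memory Access analysis is likely to flag.
void multiply_naive(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Interchanged i-k-j loop order (assumed here to be the spirit of the
// updated algorithm). The inner loop walks b and c with unit stride,
// which is cache friendly and easier for the compiler to vectorize.
void multiply_interchanged(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Both functions compute the same product; only the order in which memory is traversed differs. Profiling each variant at -O1, -O3 and with -march set for your CPU should reproduce the kind of comparison the agenda ends with.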
