Gist: csullivan/cb4766d7f09441f7375f8bce0bb5fc92
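Performance comparison: about a 5% gain from running wgmma with the LHS (A) operand in registers instead of shared memory. Both kernels perform an f16 m64n256k16 wgmma. Over 101 launches, the register-layout variant lowers the average kernel time from 24703.9 ns to 23378.3 ns, roughly a 5.4% reduction.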
Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  --------  --------  --------  --------  -----------  --------------------------------------------------------------------------
    ----          2495089        101   24703.9   24736.0     24544     27520        302.9  wgmma_f16_m64n256k16_kernel_shared_layout(__half *, __half *, __half *)
    ----          2361204        101   23378.3   23423.0     23231     25600        245.6  wgmma_f16_m64n256k16_register_layout_kernel(__half *, __half *, __half *)
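The headline gain can be recomputed from the Avg and Med columns of the table above. The following is a minimal Python sketch, not part of the original measurement. The values are copied by hand from the nsys output, and `relative_gain` is a hypothetical helper, not an nsys feature.

```python
# Values copied by hand from the nsys kernel summary: (Avg (ns), Med (ns)).
rows = {
    "shared_layout": (24703.9, 24736.0),
    "register_layout": (23378.3, 23423.0),
}


def relative_gain(baseline_ns: float, candidate_ns: float):
    """Hypothetical helper: return (percent reduction in time, speedup factor)."""
    reduction = (baseline_ns - candidate_ns) / baseline_ns * 100.0
    speedup = baseline_ns / candidate_ns
    return reduction, speedup


if __name__ == "__main__":
    # Compare the register-layout kernel against the shared-layout baseline.
    for label, idx in (("avg", 0), ("median", 1)):
        red, spd = relative_gain(rows["shared_layout"][idx], rows["register_layout"][idx])
        print(f"{label}: {red:.1f}% less time, {spd:.3f}x speedup")
```

Both statistics give roughly a 5.3 to 5.4% reduction in kernel time, which is about a 1.056x speedup. The median is reported alongside the average because the Max column shows occasional slow launches that can skew the mean.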