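- Matmul benchmark of grouped ordering vs. row-major ordering on A100 => no significant improvement over row-major ordering: the two columns are identical at every size except M = 384, where row-major is slightly higher (12.288 vs. 11.059).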
matmul-performance:
       M  group_ordering  row_major_ordering
0  256.0        3.640889            3.640889
1  384.0       11.059200           12.288000
2  512.0       23.831273           23.831273
3  640.0       39.384616           39.384616
4  768.0       58.982401           58.982401
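
For context, the two orderings differ only in how a linear program ID is mapped to an output tile. The following is a minimal pure-Python sketch of that mapping, assuming the grouped scheme follows the usual Triton-style "grouped along M" layout. The function names and the `group_size_m` parameter are illustrative and are not taken from the benchmarked kernel.

```python
def row_major_tile(pid, num_pid_n):
    """Row-major ordering: walk across a full row of tiles before moving down."""
    return pid // num_pid_n, pid % num_pid_n


def grouped_tile(pid, num_pid_m, num_pid_n, group_size_m):
    """Grouped ordering: walk down `group_size_m` rows of tiles, then step right.

    Consecutive program IDs stay within a narrow band of rows, so they
    share more blocks of A and B than under row-major ordering.
    """
    num_pid_in_group = group_size_m * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * group_size_m
    # The last group may contain fewer rows if num_pid_m is not a multiple
    # of group_size_m.
    cur_group_size_m = min(num_pid_m - first_pid_m, group_size_m)
    pid_in_group = pid % num_pid_in_group
    pid_m = first_pid_m + pid_in_group % cur_group_size_m
    pid_n = pid_in_group // cur_group_size_m
    return pid_m, pid_n


if __name__ == "__main__":
    # Hypothetical 4 x 4 grid of output tiles with groups of 2 rows.
    for pid in range(4):
        print(pid, row_major_tile(pid, 4), grouped_tile(pid, 4, 4, 2))
```

On this 4 x 4 example, the first four programs cover tiles (0,0), (0,1), (0,2), (0,3) under row-major ordering, so they touch 1 row block of A and 4 column blocks of B. Under grouped ordering they cover (0,0), (1,0), (0,1), (1,1), touching 2 row blocks and 2 column blocks. Whether that extra reuse shows up in measured throughput depends on problem size and cache capacity, which may be why the small sizes benchmarked here show no gain.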