PrefilPerfComparision
Benchmark Configuration:
  Batch sizes: [1]
  Sequence lengths: [32, 64, 128, 256, 512, 1024, 1536, 2048, 4096, 8192, 16384]
  Number of heads: [16, 32, 64, 128]
  Head dimensions: [64, 128]
  Causal: True
  Data type: torch.bfloat16
  Warmup iterations: 10
  Benchmark iterations: 100
================================================================================
Running benchmarks on NVIDIA B200
================================================================================
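For context, here is a minimal sketch of how one configuration from the sweep above might be timed for the "PyTorch SDPA w/ CUDNN attention" baseline. This is an illustration under stated assumptions, not the actual harness (which is not included in this gist); the function name and the CUDA-event timing loop are my own, and `SDPBackend.CUDNN_ATTENTION` requires a recent PyTorch.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel  # PyTorch >= 2.3

def bench_sdpa_cudnn(seq_len, num_heads, head_dim, batch=1, warmup=10, iters=100):
    """Time causal SDPA in bf16, mirroring the configuration block above."""
    q, k, v = (torch.randn(batch, num_heads, seq_len, head_dim,
                           device="cuda", dtype=torch.bfloat16) for _ in range(3))
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    times = []
    with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):  # pin SDPA to the cuDNN backend
        for _ in range(warmup):                    # warmup iterations: 10
            F.scaled_dot_product_attention(q, k, v, is_causal=True)
        torch.cuda.synchronize()
        for _ in range(iters):                     # benchmark iterations: 100
            start.record()
            F.scaled_dot_product_attention(q, k, v, is_causal=True)
            end.record()
            torch.cuda.synchronize()
            times.append(start.elapsed_time(end))  # CUDA event time in ms
    t = torch.tensor(times)
    return t.mean().item(), t.std().item()         # reported below as "mean ± std ms"
```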
[1/88] Testing: seq_len=32, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.013 ± 0.002 ms
    flash_attn   : 0.025 ± 0.003 ms
    max          : 0.032 ± 0.002 ms
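The recurring "Validation failed ... max_diff=0.015625" lines in the blocks above and below report the largest elementwise difference against the pytorch_sdpa reference. Note that 0.015625 is exactly 2^-6, i.e. one or two bfloat16 ULPs for values near unit magnitude, so these failures look like kernel rounding differences under a tight tolerance rather than algorithmic mismatches. A minimal sketch of such a check follows (assumed; the harness and its exact tolerance are not shown, and `atol` here is a placeholder):

```python
import torch

def validate(reference: torch.Tensor, candidate: torch.Tensor, atol: float = 1e-2):
    """Compare a candidate kernel's output against the reference in fp32."""
    max_diff = (reference.float() - candidate.float()).abs().max().item()
    if max_diff > atol:  # placeholder tolerance; the harness value is unknown
        print(f"Validation failed: max_diff={max_diff}")
    else:
        print("All outputs match within tolerance")
```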
[2/88] Testing: seq_len=32, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.002 ms
    flashinfer   : 0.015 ± 0.001 ms
    flash_attn   : 0.024 ± 0.002 ms
    max          : 0.051 ± 0.001 ms
[3/88] Testing: seq_len=32, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  All outputs match within tolerance
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.013 ± 0.001 ms
    flash_attn   : 0.024 ± 0.002 ms
    max          : 0.052 ± 0.002 ms
[4/88] Testing: seq_len=32, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.014 ± 0.000 ms
    flash_attn   : 0.024 ± 0.001 ms
    max          : 0.092 ± 0.001 ms
[5/88] Testing: seq_len=32, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.013 ± 0.001 ms
    flash_attn   : 0.024 ± 0.002 ms
    max          : 0.092 ± 0.001 ms
[6/88] Testing: seq_len=32, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.017 ± 0.026 ms
    flash_attn   : 0.024 ± 0.001 ms
    max          : 0.174 ± 0.001 ms
[7/88] Testing: seq_len=32, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.019 ± 0.002 ms
    flashinfer   : 0.018 ± 0.002 ms
    flash_attn   : 0.035 ± 0.012 ms
    max          : 0.177 ± 0.002 ms
[8/88] Testing: seq_len=32, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.027 ms
    flashinfer   : 0.014 ± 0.002 ms
    flash_attn   : 0.027 ± 0.008 ms
    max          : 0.340 ± 0.001 ms
[9/88] Testing: seq_len=64, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  All outputs match within tolerance
  Results:
    pytorch_sdpa : 0.013 ± 0.001 ms
    flashinfer   : 0.013 ± 0.001 ms
    flash_attn   : 0.024 ± 0.002 ms
    max          : 0.050 ± 0.001 ms
[10/88] Testing: seq_len=64, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.014 ± 0.001 ms
    flash_attn   : 0.024 ± 0.002 ms
    max          : 0.088 ± 0.001 ms
[11/88] Testing: seq_len=64, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.013 ± 0.001 ms
    flash_attn   : 0.024 ± 0.002 ms
    max          : 0.090 ± 0.001 ms
[12/88] Testing: seq_len=64, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.014 ± 0.001 ms
    flash_attn   : 0.023 ± 0.001 ms
    max          : 0.169 ± 0.001 ms
[13/88] Testing: seq_len=64, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.013 ± 0.002 ms
    flash_attn   : 0.027 ± 0.005 ms
    max          : 0.172 ± 0.002 ms
[14/88] Testing: seq_len=64, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.015 ± 0.002 ms
    flashinfer   : 0.016 ± 0.002 ms
    flash_attn   : 0.029 ± 0.005 ms
    max          : 0.332 ± 0.001 ms
[15/88] Testing: seq_len=64, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.018 ± 0.003 ms
    flashinfer   : 0.017 ± 0.003 ms
    flash_attn   : 0.033 ± 0.003 ms
    max          : 0.340 ± 0.001 ms
[16/88] Testing: seq_len=64, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.014 ± 0.001 ms
    flash_attn   : 0.024 ± 0.001 ms
    max          : 0.665 ± 0.002 ms
[17/88] Testing: seq_len=128, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  All outputs match within tolerance
  Results:
    pytorch_sdpa : 0.018 ± 0.001 ms
    flashinfer   : 0.017 ± 0.001 ms
    flash_attn   : 0.032 ± 0.002 ms
    max          : 0.088 ± 0.001 ms
[18/88] Testing: seq_len=128, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.015 ± 0.002 ms
    flashinfer   : 0.018 ± 0.003 ms
    flash_attn   : 0.032 ± 0.002 ms
    max          : 0.160 ± 0.001 ms
[19/88] Testing: seq_len=128, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.017 ± 0.001 ms
    flashinfer   : 0.017 ± 0.001 ms
    flash_attn   : 0.032 ± 0.002 ms
    max          : 0.170 ± 0.001 ms
[20/88] Testing: seq_len=128, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.015 ± 0.001 ms
    flash_attn   : 0.024 ± 0.002 ms
    max          : 0.323 ± 0.002 ms
[22/88] Testing: seq_len=128, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.014 ± 0.001 ms
    flashinfer   : 0.015 ± 0.001 ms
    flash_attn   : 0.024 ± 0.003 ms
    max          : 0.647 ± 0.001 ms
[23/88] Testing: seq_len=128, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.015 ± 0.001 ms
    flashinfer   : 0.017 ± 0.029 ms
    flash_attn   : 0.024 ± 0.002 ms
    max          : 0.669 ± 0.001 ms
[24/88] Testing: seq_len=128, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.001 ms
    flashinfer   : 0.017 ± 0.028 ms
    flash_attn   : 0.024 ± 0.001 ms
    max          : 1.314 ± 0.001 ms
[25/88] Testing: seq_len=256, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.001 ms
    flashinfer   : 0.015 ± 0.001 ms
    flash_attn   : 0.024 ± 0.001 ms
    max          : 0.172 ± 0.001 ms
[26/88] Testing: seq_len=256, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  All outputs match within tolerance
  Results:
    pytorch_sdpa : 0.016 ± 0.001 ms
    flashinfer   : 0.015 ± 0.001 ms
    flash_attn   : 0.029 ± 0.001 ms
    max          : 0.324 ± 0.001 ms
[27/88] Testing: seq_len=256, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.001 ms
    flashinfer   : 0.015 ± 0.001 ms
    flash_attn   : 0.024 ± 0.002 ms
    max          : 0.336 ± 0.001 ms
[28/88] Testing: seq_len=256, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.001 ms
    flashinfer   : 0.015 ± 0.001 ms
    flash_attn   : 0.030 ± 0.002 ms
    max          : 0.649 ± 0.001 ms
[29/88] Testing: seq_len=256, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.002 ms
    flashinfer   : 0.015 ± 0.001 ms
    flash_attn   : 0.024 ± 0.001 ms
    max          : 0.671 ± 0.001 ms
[30/88] Testing: seq_len=256, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.002 ms
    flashinfer   : 0.016 ± 0.001 ms
    flash_attn   : 0.026 ± 0.001 ms
    max          : 1.315 ± 0.002 ms
[31/88] Testing: seq_len=256, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.018 ± 0.003 ms
    flashinfer   : 0.015 ± 0.001 ms
    flash_attn   : 0.026 ± 0.003 ms
    max          : 1.337 ± 0.026 ms
[32/88] Testing: seq_len=256, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.019 ± 0.032 ms
    flashinfer   : 0.023 ± 0.001 ms
    flash_attn   : 0.026 ± 0.002 ms
    max          : 2.622 ± 0.035 ms
[33/88] Testing: seq_len=512, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.001 ms
    flashinfer   : 0.017 ± 0.001 ms
    flash_attn   : 0.030 ± 0.002 ms
    max          : 0.338 ± 0.001 ms
[34/88] Testing: seq_len=512, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  All outputs match within tolerance
  Results:
    pytorch_sdpa : 0.017 ± 0.001 ms
    flashinfer   : 0.020 ± 0.001 ms
    flash_attn   : 0.030 ± 0.002 ms
    max          : 0.651 ± 0.002 ms
[35/88] Testing: seq_len=512, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.001 ms
    flashinfer   : 0.019 ± 0.001 ms
    flash_attn   : 0.026 ± 0.002 ms
    max          : 0.673 ± 0.002 ms
[36/88] Testing: seq_len=512, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.017 ± 0.001 ms
    flashinfer   : 0.029 ± 0.001 ms
    flash_attn   : 0.026 ± 0.002 ms
    max          : 1.318 ± 0.002 ms
[37/88] Testing: seq_len=512, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.016 ± 0.001 ms
    flashinfer   : 0.021 ± 0.001 ms
    flash_attn   : 0.026 ± 0.002 ms
    max          : 1.341 ± 0.006 ms
[38/88] Testing: seq_len=512, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.018 ± 0.001 ms
    flashinfer   : 0.036 ± 0.001 ms
    flash_attn   : 0.034 ± 0.002 ms
    max          : 2.642 ± 0.005 ms
[39/88] Testing: seq_len=512, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.021 ± 0.001 ms
    flashinfer   : 0.035 ± 0.028 ms
    flash_attn   : 0.033 ± 0.002 ms
    max          : 2.672 ± 0.008 ms
[40/88] Testing: seq_len=512, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.030 ± 0.019 ms
    flashinfer   : 0.053 ± 0.005 ms
    flash_attn   : 0.051 ± 0.002 ms
    max          : 5.274 ± 0.011 ms
[41/88] Testing: seq_len=1024, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  All outputs match within tolerance
  Results:
    pytorch_sdpa : 0.021 ± 0.001 ms
    flashinfer   : 0.026 ± 0.001 ms
    flash_attn   : 0.026 ± 0.003 ms
    max          : 0.679 ± 0.002 ms
[42/88] Testing: seq_len=1024, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  All outputs match within tolerance
  Results:
    pytorch_sdpa : 0.023 ± 0.001 ms
    flashinfer   : 0.041 ± 0.001 ms
    flash_attn   : 0.036 ± 0.002 ms
    max          : 1.324 ± 0.002 ms
[43/88] Testing: seq_len=1024, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.021 ± 0.001 ms
    flashinfer   : 0.034 ± 0.001 ms
    flash_attn   : 0.035 ± 0.002 ms
    max          : 1.355 ± 0.004 ms
[44/88] Testing: seq_len=1024, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.024 ± 0.001 ms
    flashinfer   : 0.060 ± 0.001 ms
    flash_attn   : 0.058 ± 0.002 ms
    max          : 2.656 ± 0.013 ms
[45/88] Testing: seq_len=1024, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.030 ± 0.001 ms
    flashinfer   : 0.050 ± 0.002 ms
    flash_attn   : 0.050 ± 0.002 ms
    max          : 2.689 ± 0.006 ms
[46/88] Testing: seq_len=1024, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.040 ± 0.001 ms
    flashinfer   : 0.088 ± 0.003 ms
    flash_attn   : 0.084 ± 0.003 ms
    max          : 5.302 ± 0.008 ms
[47/88] Testing: seq_len=1024, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.045 ± 0.001 ms
    flashinfer   : 0.078 ± 0.002 ms
    flash_attn   : 0.078 ± 0.002 ms
    max          : 5.408 ± 0.020 ms
[48/88] Testing: seq_len=1024, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.063 ± 0.018 ms
    flashinfer   : 0.137 ± 0.003 ms
    flash_attn   : 0.135 ± 0.005 ms
    max          : 13.151 ± 0.026 ms
[49/88] Testing: seq_len=1536, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.027 ± 0.002 ms
    flashinfer   : 0.048 ± 0.001 ms
    flash_attn   : 0.046 ± 0.002 ms
    max          : 1.030 ± 0.004 ms
[50/88] Testing: seq_len=1536, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.029 ± 0.001 ms
    flashinfer   : 0.084 ± 0.001 ms
    flash_attn   : 0.081 ± 0.002 ms
    max          : 2.001 ± 0.007 ms
[51/88] Testing: seq_len=1536, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.035 ± 0.002 ms
    flashinfer   : 0.054 ± 0.002 ms
    flash_attn   : 0.054 ± 0.003 ms
    max          : 2.037 ± 0.004 ms
[52/88] Testing: seq_len=1536, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.039 ± 0.002 ms
    flashinfer   : 0.096 ± 0.005 ms
    flash_attn   : 0.092 ± 0.004 ms
    max          : 3.987 ± 0.010 ms
[53/88] Testing: seq_len=1536, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.046 ± 0.002 ms
    flashinfer   : 0.086 ± 0.002 ms
    flash_attn   : 0.085 ± 0.002 ms
    max          : 4.060 ± 0.011 ms
[54/88] Testing: seq_len=1536, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.065 ± 0.002 ms
    flashinfer   : 0.155 ± 0.004 ms
    flash_attn   : 0.150 ± 0.004 ms
    max          : 8.797 ± 0.021 ms
[55/88] Testing: seq_len=1536, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.075 ± 0.002 ms
    flashinfer   : 0.150 ± 0.021 ms
    flash_attn   : 0.145 ± 0.003 ms
    max          : 9.515 ± 0.018 ms
[56/88] Testing: seq_len=1536, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.101 ± 0.011 ms
    flashinfer   : 0.263 ± 0.004 ms
    flash_attn   : 0.253 ± 0.004 ms
    max          : 21.508 ± 0.010 ms
[57/88] Testing: seq_len=2048, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.032 ± 0.001 ms
    flashinfer   : 0.060 ± 0.001 ms
    flash_attn   : 0.059 ± 0.002 ms
    max          : 1.373 ± 0.004 ms
[58/88] Testing: seq_len=2048, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.036 ± 0.001 ms
    flashinfer   : 0.109 ± 0.001 ms
    flash_attn   : 0.103 ± 0.002 ms
    max          : 2.665 ± 0.005 ms
[59/88] Testing: seq_len=2048, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.046 ± 0.001 ms
    flashinfer   : 0.085 ± 0.003 ms
    flash_attn   : 0.082 ± 0.003 ms
    max          : 2.724 ± 0.006 ms
[60/88] Testing: seq_len=2048, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.051 ± 0.002 ms
    flashinfer   : 0.148 ± 0.005 ms
    flash_attn   : 0.144 ± 0.006 ms
    max          : 5.335 ± 0.015 ms
[61/88] Testing: seq_len=2048, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.064 ± 0.002 ms
    flashinfer   : 0.136 ± 0.003 ms
    flash_attn   : 0.132 ± 0.003 ms
    max          : 5.482 ± 0.014 ms
[62/88] Testing: seq_len=2048, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.093 ± 0.003 ms
    flashinfer   : 0.245 ± 0.006 ms
    flash_attn   : 0.235 ± 0.006 ms
    max          : 13.294 ± 0.017 ms
[63/88] Testing: seq_len=2048, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.114 ± 0.002 ms
    flashinfer   : 0.238 ± 0.003 ms
    flash_attn   : 0.232 ± 0.003 ms
    max          : 14.078 ± 0.026 ms
[64/88] Testing: seq_len=2048, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.152 ± 0.007 ms
    flashinfer   : 0.427 ± 0.005 ms
    flash_attn   : 0.411 ± 0.005 ms
    max          : 29.816 ± 0.133 ms
[65/88] Testing: seq_len=4096, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.077 ± 0.001 ms
    flashinfer   : 0.154 ± 0.009 ms
    flash_attn   : 0.151 ± 0.006 ms
    max          : 2.798 ± 0.006 ms
[66/88] Testing: seq_len=4096, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.084 ± 0.002 ms
    flashinfer   : 0.286 ± 0.010 ms
    flash_attn   : 0.266 ± 0.010 ms
    max          : 5.403 ± 0.010 ms
[67/88] Testing: seq_len=4096, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  All outputs match within tolerance
  Results:
    pytorch_sdpa : 0.104 ± 0.002 ms
    flashinfer   : 0.251 ± 0.006 ms
    flash_attn   : 0.241 ± 0.005 ms
    max          : 5.612 ± 0.009 ms
[68/88] Testing: seq_len=4096, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.120 ± 0.002 ms
    flashinfer   : 0.458 ± 0.010 ms
    flash_attn   : 0.436 ± 0.009 ms
    max          : 13.537 ± 0.013 ms
[69/88] Testing: seq_len=4096, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.192 ± 0.002 ms
    flashinfer   : 0.440 ± 0.005 ms
    flash_attn   : 0.423 ± 0.006 ms
    max          : 14.300 ± 0.024 ms
[70/88] Testing: seq_len=4096, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.261 ± 0.005 ms
    flashinfer   : 0.806 ± 0.009 ms
    flash_attn   : 0.767 ± 0.011 ms
    max          : 30.013 ± 0.016 ms
[71/88] Testing: seq_len=4096, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.371 ± 0.001 ms
    flashinfer   : 0.836 ± 0.020 ms
    flash_attn   : 0.789 ± 0.005 ms
    max          : 30.959 ± 0.015 ms
[72/88] Testing: seq_len=4096, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.479 ± 0.004 ms
    flashinfer   : 1.590 ± 0.075 ms
    flash_attn   : 1.428 ± 0.015 ms
    max          : 60.491 ± 0.017 ms
[73/88] Testing: seq_len=8192, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.185 ± 0.002 ms
    flashinfer   : 0.482 ± 0.011 ms
    flash_attn   : 0.461 ± 0.011 ms
    max          : 5.840 ± 0.028 ms
[74/88] Testing: seq_len=8192, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.215 ± 0.001 ms
    flashinfer   : 0.919 ± 0.036 ms
    flash_attn   : 0.845 ± 0.021 ms
    max          : 13.872 ± 0.011 ms
[75/88] Testing: seq_len=8192, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.347 ± 0.002 ms
    flashinfer   : 0.850 ± 0.011 ms
    flash_attn   : 0.808 ± 0.012 ms
    max          : 14.761 ± 0.013 ms
[76/88] Testing: seq_len=8192, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.432 ± 0.002 ms
    flashinfer   : 1.655 ± 0.114 ms
    flash_attn   : 1.478 ± 0.019 ms
    max          : 30.551 ± 0.250 ms
[77/88] Testing: seq_len=8192, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.677 ± 0.002 ms
    flashinfer   : 1.622 ± 0.049 ms
    flash_attn   : 1.500 ± 0.011 ms
    max          : 31.808 ± 0.009 ms
[78/88] Testing: seq_len=8192, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.923 ± 0.049 ms
    flashinfer   : 3.053 ± 0.260 ms
    flash_attn   : 2.747 ± 0.019 ms
    max          : 61.206 ± 0.016 ms
[79/88] Testing: seq_len=8192, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 1.436 ± 0.099 ms
    flashinfer   : 3.156 ± 0.185 ms
    flash_attn   : 2.886 ± 0.010 ms
    max          : 63.581 ± 0.013 ms
[80/88] Testing: seq_len=8192, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 1.805 ± 0.113 ms
    flashinfer   : 5.778 ± 0.427 ms
    flash_attn   : 5.291 ± 0.017 ms
    max          : 122.288 ± 0.024 ms
[81/88] Testing: seq_len=16384, num_heads=16, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  All outputs match within tolerance
  Results:
    pytorch_sdpa : 0.660 ± 0.002 ms
    flashinfer   : 1.673 ± 0.024 ms
    flash_attn   : 1.582 ± 0.018 ms
    max          : 15.658 ± 0.025 ms
[82/88] Testing: seq_len=16384, num_heads=16, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 0.820 ± 0.023 ms
    flashinfer   : 3.388 ± 0.445 ms
    flash_attn   : 2.903 ± 0.036 ms
    max          : 31.268 ± 0.008 ms
[83/88] Testing: seq_len=16384, num_heads=32, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 1.321 ± 0.036 ms
    flashinfer   : 3.140 ± 0.076 ms
    flash_attn   : 2.940 ± 0.021 ms
    max          : 33.547 ± 0.019 ms
[84/88] Testing: seq_len=16384, num_heads=32, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 1.769 ± 0.093 ms
    flashinfer   : 5.928 ± 0.457 ms
    flash_attn   : 5.400 ± 0.030 ms
    max          : 62.747 ± 0.012 ms
[85/88] Testing: seq_len=16384, num_heads=64, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 2.787 ± 0.102 ms
    flashinfer   : 6.016 ± 0.086 ms
    flash_attn   : 5.658 ± 0.020 ms
    max          : 66.786 ± 0.024 ms
[86/88] Testing: seq_len=16384, num_heads=64, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 3.593 ± 0.112 ms
    flashinfer   : 11.100 ± 0.046 ms
    flash_attn   : 10.406 ± 0.038 ms
    max          : 124.996 ± 0.025 ms
[87/88] Testing: seq_len=16384, num_heads=128, head_dim=64
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 5.960 ± 0.187 ms
    flashinfer   : 11.750 ± 0.096 ms
    flash_attn   : 11.091 ± 0.021 ms
    max          : 133.292 ± 0.240 ms
[88/88] Testing: seq_len=16384, num_heads=128, head_dim=128
  - Running PyTorch SDPA w/ CUDNN attention...
  - Running FlashInfer...
  - Running Flash Attention...
  - Running Max...
  Reference: pytorch_sdpa
  Validation failed: pytorch_sdpa vs flashinfer, max_diff=0.015625
  Results:
    pytorch_sdpa : 7.280 ± 0.272 ms
    flashinfer   : 21.756 ± 0.031 ms
    flash_attn   : 20.401 ± 0.032 ms
    max          : 249.674 ± 0.035 ms
Detailed results saved to: ./flash_benchmark_results_20250905_020930.csv
Comparison table saved to: ./flash_benchmark_comparison_20250905_020930.csv
================================================================================
BENCHMARK SUMMARY
================================================================================
Average Latencies by Implementation (mean over all configurations, ms):
  flash_attn   :  0.964
  flashinfer   :  1.031
  max          : 16.071
  pytorch_sdpa :  0.384
================================================================================
SPEEDUP SUMMARY (relative to PyTorch SDPA)
================================================================================
flashinfer:
  Average speedup: 0.660x
  Median speedup:  0.562x
  Max speedup:     1.197x
  Min speedup:     0.234x
flash_attn:
  Average speedup: 0.509x
  Median speedup:  0.537x
  Max speedup:     0.810x
  Min speedup:     0.254x
max:
  Average speedup: 0.047x
  Median speedup:  0.022x
  Max speedup:     0.431x
  Min speedup:     0.005x
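The summary statistics above could be reproduced from the saved CSVs along these lines. This is a sketch: the column names ("implementation", "mean_latency_ms", "seq_len", "num_heads", "head_dim") follow the printout above, but the actual file schema is an assumption, not confirmed by the gist.

```python
import pandas as pd

df = pd.read_csv("flash_benchmark_results_20250905_020930.csv")

# Average latency per implementation across all configurations
# (the "Average Latencies by Implementation" block above).
print(df.groupby("implementation")["mean_latency_ms"].mean())

# Speedup of each implementation relative to the PyTorch SDPA baseline,
# computed per configuration and then summarized.
wide = df.pivot_table(index=["seq_len", "num_heads", "head_dim"],
                      columns="implementation", values="mean_latency_ms")
for impl in ["flashinfer", "flash_attn", "max"]:
    speedup = wide["pytorch_sdpa"] / wide[impl]  # > 1.0 means faster than SDPA
    print(f"{impl}: avg {speedup.mean():.3f}x, median {speedup.median():.3f}x, "
          f"max {speedup.max():.3f}x, min {speedup.min():.3f}x")
```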