FP8 Linear in TorchAO
%%{init: {'themeVariables': {'fontSize': '24px', 'fontFamily': 'Arial'}}}%%
graph TD
E{Scaling Recipes}
E -->|Delayed| F[Record History and Register Scale Buffer]
E -->|Static| G[Register Scale Buffer]
E -->|Dynamic| A
F --> A
G --> A
A[Check HW Support] --> NV
subgraph NV[NVIDIA Path]
B{Hopper or newer?}
B -->|Yes| C[Native FP8 GEMM]
B -->|No| D[Dequantize + GEMM]
end
NV --> H[Triton kernels for pre- and post-Linear ops]
A -->|CPU| OneDNN[oneDNN Implementation]
OneDNN --> H
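
The three scaling recipes at the top of the diagram differ mainly in where the FP8 scale comes from: dynamic derives it from the current tensor, delayed derives it from a recorded amax history, and static reads a pre-calibrated registered buffer. The sketch below is a minimal illustration of that difference, assuming a hypothetical `FP8Linear` module; the class, method, and buffer names (`input_scale`, `amax_history`, `history_len`, `scale`) are illustrative, not TorchAO's actual API.

```python
import torch
import torch.nn as nn

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3

class FP8Linear(nn.Linear):
    # Hypothetical module: only the scale bookkeeping for each recipe is shown.
    def __init__(self, in_features, out_features, recipe="dynamic", history_len=16):
        super().__init__(in_features, out_features)
        self.recipe = recipe
        if recipe == "delayed":
            # Delayed: record an amax history and keep the scale as a buffer.
            self.register_buffer("amax_history", torch.zeros(history_len))
            self.register_buffer("scale", torch.ones(()))
        elif recipe == "static":
            # Static: a fixed, pre-calibrated scale registered as a buffer.
            self.register_buffer("scale", torch.ones(()))

    def input_scale(self, x: torch.Tensor) -> torch.Tensor:
        if self.recipe == "dynamic":
            # Dynamic: compute the scale from the current tensor on every call.
            return FP8_MAX / x.abs().amax().to(torch.float32).clamp(min=1e-12)
        if self.recipe == "delayed":
            # Delayed: push the newest amax, then rescale from the history max.
            self.amax_history = torch.roll(self.amax_history, 1)
            self.amax_history[0] = x.abs().amax()
            self.scale = FP8_MAX / self.amax_history.max().clamp(min=1e-12)
        # Static (and delayed, after the update) read the registered buffer.
        return self.scale

# Usage: delayed scaling keeps state across calls, dynamic does not.
m = FP8Linear(16, 8, recipe="delayed")
print(m.input_scale(torch.randn(4, 16)))
```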
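
Once a scale is available, the hardware check decides between a native FP8 GEMM (Hopper+ on the NVIDIA path, or the oneDNN implementation on CPU) and the dequantize-and-GEMM fallback. Below is a minimal sketch of the fallback branch with dynamic per-tensor scaling; the function names are illustrative, and on hardware with native support the same scales would instead be passed to an FP8 GEMM kernel (e.g. `torch._scaled_mm`) rather than dequantizing.

```python
import torch

FP8_DTYPE = torch.float8_e4m3fn
FP8_MAX = torch.finfo(FP8_DTYPE).max  # 448.0 for e4m3

def dynamic_quantize_fp8(t: torch.Tensor):
    # Dynamic recipe: derive a per-tensor scale from the current amax.
    amax = t.abs().amax().to(torch.float32).clamp(min=1e-12)
    scale = FP8_MAX / amax
    t_fp8 = (t.to(torch.float32) * scale).clamp(-FP8_MAX, FP8_MAX).to(FP8_DTYPE)
    return t_fp8, scale

def fp8_linear_fallback(x: torch.Tensor, w: torch.Tensor, bias=None):
    # "Dequantize + GEMM" branch: upcast the FP8 operands and run a normal
    # matmul. On Hopper+ (or via oneDNN on CPU) the same scales would be fed
    # to a native FP8 GEMM instead of dequantizing here.
    x_fp8, x_scale = dynamic_quantize_fp8(x)
    w_fp8, w_scale = dynamic_quantize_fp8(w)
    x_deq = x_fp8.to(torch.float32) / x_scale
    w_deq = w_fp8.to(torch.float32) / w_scale
    out = x_deq @ w_deq.t()
    if bias is not None:
        out = out + bias.to(torch.float32)
    return out.to(x.dtype)

# Usage: shapes follow nn.Linear, weight is (out_features, in_features).
x = torch.randn(4, 16, dtype=torch.bfloat16)
w = torch.randn(8, 16, dtype=torch.bfloat16)
print(fp8_linear_fallback(x, w).shape)  # torch.Size([4, 8])
```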