airMeng/FP8 workflow.md

Last active January 23, 2025 06:07

FP8 Linear in TorchAO

%%{init: {'themeVariables': {'fontSize': '24px', 'fontFamily': 'Arial'}}}%%
graph TD
    E{Scaling Recipes}
    E -->|Delayed| F[Record History and Register Scale Buffer]
    E -->|Static| G[Register Scale Buffer]
    E -->|Dynamic| A
    F --> A
    G --> A

    A[Check HW Support] --> NV

    subgraph NV[NVIDIA Path]
        B{NV Hopper+?}
        B -->|Yes| D[Native FP8 GEMM]
        B -->|No| C[Dequantize + GEMM]
    end

    NV --> H[Triton for pre and post Linear]

    A -->|Intel| OneDNN[oneDNN Implementation]
    OneDNN --> H
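
The three scaling recipes at the top of the flowchart differ only in where the scale comes from. The following is a minimal pure-Python sketch of that difference, not TorchAO code: `ScalePicker`, `scale_from_amax`, and `history_len` are hypothetical names, and the convention `scale = fp8_max / amax` with the e4m3 maximum of 448 is an assumption made for illustration.

```python
from collections import deque

# Largest finite value of the float8 e4m3fn format.
E4M3_MAX = 448.0


def amax(values):
    """Largest absolute value in a flat list of floats."""
    return max(abs(v) for v in values)


def scale_from_amax(a, eps=1e-12):
    """Assumed convention: multiply by this scale before casting to FP8."""
    return E4M3_MAX / max(a, eps)


class ScalePicker:
    """Hypothetical helper that picks a scale per recipe.

    dynamic: scale from the amax of the tensor being quantized now.
    delayed: scale from the recorded amax history of earlier tensors.
    static:  a fixed scale registered ahead of time.
    """

    def __init__(self, recipe, static_scale=None, history_len=16):
        if recipe not in ("dynamic", "delayed", "static"):
            raise ValueError(f"unknown recipe: {recipe}")
        if recipe == "static" and static_scale is None:
            raise ValueError("static recipe needs a pre-registered scale")
        self.recipe = recipe
        self.static_scale = static_scale
        self.history = deque(maxlen=history_len)  # used by 'delayed' only

    def scale_for(self, values):
        if self.recipe == "static":
            return self.static_scale
        current = amax(values)
        if self.recipe == "dynamic":
            return scale_from_amax(current)
        # Delayed: use the history recorded so far, then record the current amax.
        # On the very first call there is no history, so fall back to the current amax.
        past = max(self.history) if self.history else current
        self.history.append(current)
        return scale_from_amax(past)


dynamic = ScalePicker("dynamic")
delayed = ScalePicker("delayed")
static = ScalePicker("static", static_scale=2.0)
```

For example, `delayed.scale_for([4.0])` followed by `delayed.scale_for([1.0])` returns the same scale both times, because the second call still sees the earlier amax of 4.0 in its history, whereas `dynamic` would recompute the scale from 1.0.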