@airMeng
Last active January 23, 2025 06:07

FP8 Linear in TorchAO

```mermaid
%%{init: {'themeVariables': {'fontSize': '24px', 'fontFamily': 'Arial'}}}%%
graph TD
    E{Scaling Recipes}
    E -->|Delayed| F[Record Amax History and Register Scale Buffer]
    E -->|Static| G[Register Scale Buffer]
    E -->|Dynamic| A
    F --> A
    G --> A

    A[Check HW Support] --> NV

    subgraph NV[NVIDIA Path]
        B{NV Hopper+?}
        B -->|Yes| C[Native FP8 GEMM]
        B -->|No| D[Dequantize + GEMM]
    end

    NV --> H[Triton for pre- and post-Linear ops]

    A -->|Intel| OneDNN[oneDNN Implementation]
    OneDNN --> H
```
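The diagram only names the three scaling recipes and the two GEMM branches. The pure-Python sketch below illustrates them under stated assumptions. It is not TorchAO's code or API. It assumes per-tensor scaling with the convention `scale = 448 / amax`, where 448 is the largest finite `float8_e4m3fn` value. FP8 is emulated by rounding to the e4m3 grid, and all names (`DynamicScale`, `DelayedScale`, `StaticScale`, `dequant_then_gemm`, `scaled_fp8_gemm`) are hypothetical. The two GEMM functions compute the same product in different ways. One dequantizes the operands before an ordinary GEMM. The other multiplies the FP8 values and applies `1 / (sa * sb)` once at the end.

```python
import math
from collections import deque

# Hypothetical sketch, not TorchAO's API: FP8 is emulated with Python floats.
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest float8_e4m3fn value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    _, exp = math.frexp(abs(x))      # abs(x) = m * 2**exp, m in [0.5, 1)
    e = max(exp - 1, -6)             # binade exponent; below 2**-6 values are subnormal
    step = 2.0 ** (e - 3)            # 3 mantissa bits -> 8 steps per binade
    q = round(abs(x) / step) * step  # Python's round() is round-half-to-even
    return sign * min(q, FP8_E4M3_MAX)


def amax(t):
    return max(abs(v) for row in t for v in row)


def scale_from_amax(a: float) -> float:
    # Assumed convention: x_fp8 = round(x * scale), x ~= x_fp8 / scale
    return FP8_E4M3_MAX / max(a, 1e-12)  # guard against all-zero tensors


class DynamicScale:
    """Scale from the current tensor's amax, recomputed on every call."""
    def __call__(self, t):
        return scale_from_amax(amax(t))


class DelayedScale:
    """Scale from the amax history of previous calls, then record the current amax."""
    def __init__(self, history_len: int = 16):
        self.history = deque(maxlen=history_len)

    def __call__(self, t):
        a = max(self.history) if self.history else amax(t)  # first call: no history yet
        self.history.append(amax(t))
        return scale_from_amax(a)


class StaticScale:
    """Fixed scale, e.g. obtained from offline calibration."""
    def __init__(self, scale: float):
        self.scale = scale

    def __call__(self, t):
        return self.scale


def quantize(t, scale):
    return [[round_to_e4m3(v * scale) for v in row] for row in t]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dequant_then_gemm(a_q, sa, b_q, sb):
    # Fallback branch: upcast both operands, then run an ordinary high-precision GEMM.
    a = [[v / sa for v in row] for row in a_q]
    b = [[v / sb for v in row] for row in b_q]
    return matmul(a, b)


def scaled_fp8_gemm(a_q, sa, b_q, sb):
    # Native branch: multiply the FP8 operands directly, then apply 1/(sa*sb) once.
    out = matmul(a_q, b_q)
    inv = 1.0 / (sa * sb)
    return [[v * inv for v in row] for row in out]


x = [[0.5, -1.25], [3.0, 0.1]]
w = [[2.0, 0.0], [-0.75, 1.5]]
sx = DynamicScale()(x)
sw = StaticScale(FP8_E4M3_MAX / 2.0)(w)
x_q, w_q = quantize(x, sx), quantize(w, sw)
ref = dequant_then_gemm(x_q, sx, w_q, sw)
out = scaled_fp8_gemm(x_q, sx, w_q, sw)
```

In this sketch, `DelayedScale` reuses amax values from earlier calls. A sudden outlier in the current tensor is therefore saturated to ±448 rather than triggering a rescale. `DynamicScale` avoids that at the cost of an extra amax reduction per call.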
