Meng, Hengyu (airMeng)
airMeng / part_3_vectorization_techniques.md
Created December 26, 2023 07:16 — forked from mingfeima/part_3_vectorization_techniques.md
PyTorch CPU Performance Optimization Tutorial - Section III
airMeng / LLM Int4 Inference on Arc.md
Last active January 31, 2024 05:56
LLM Int4 Inference on Arc

IPEX

Intel® Extension for PyTorch* (IPEX) extends PyTorch* with up-to-date features and optimizations for an extra performance boost on Intel hardware. The optimizations take advantage of the Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.
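
As a minimal sketch of the xpu device in use (assuming a recent IPEX XPU build; the `nn.Linear` below is just a placeholder for any model):

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex  # registers the "xpu" device

# Placeholder module; any nn.Module is moved to the GPU the same way.
model = nn.Linear(1024, 1024).eval().to("xpu")
model = ipex.optimize(model)  # apply IPEX operator/kernel optimizations

with torch.no_grad():
    x = torch.randn(1, 1024, device="xpu")
    y = model(x)
```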

XeTLA

Intel® Xe Templates for Linear Algebra (Intel® XeTLA) is a collection of SYCL/ESIMD templates that enable high-performance General Matrix Multiply (GEMM), Convolution (CONV), and related computations on the Intel Xe GPU architecture. Intel® XeTLA offers reusable C++ templates at the kernel, group, and subgroup levels, allowing developers to optimize and specialize kernels based on data types, tiling policies, algorithms, fusion policies, and more.

Thanks to XeTLA's template design, users can easily define a new compression/decompression prologue and insert it right before the BRGEMM compute to fully accelerate weight-only-quantized (WOQ) GEMM.
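
For reference, the math such a prologue fuses is just per-group int4 dequantization ahead of the GEMM. A plain-PyTorch sketch of the unfused semantics (layouts and names like `group_size` are illustrative, not the XeTLA API):

```python
import torch

def woq_gemm_reference(x, qweight, scales, zeros, group_size=128):
    """Unfused reference for weight-only-quantized GEMM.

    x:       (M, K) activation, fp16/fp32
    qweight: (K, N) int4 weight codes stored as int8 in [0, 15]
    scales:  (K // group_size, N) per-group scales
    zeros:   (K // group_size, N) per-group zero points
    """
    # De-compression "prologue": map int4 codes back to floating point.
    w = (qweight.float() - zeros.repeat_interleave(group_size, dim=0)) \
        * scales.repeat_interleave(group_size, dim=0)
    # The GEMM itself; XeTLA fuses the step above into the BRGEMM tiles.
    return x.float() @ w
```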

airMeng / XeTLA.md
Last active June 29, 2024 02:25
XeTLA

HW Target

| Device   | PVC    | MTL    | DG2    | LNL/BMG (TODO) | ARL (TODO) |
|----------|--------|--------|--------|----------------|------------|
| ISA      | Xe     | Xe-lpg | Xe-hpg | Xe2            | Xe-lpg+    |
| DPAS     | 8,8,16 | NA     | 8,8,8  | 8,4,16         | 8,8,8      |
| 2D Block | 32, 64 | NA     | NA     | 32, 64         | NA         |
| 1D Block | 64     | 32     | 32     | 64             | 32         |

How to Add a new HW

FP8 Linear in TorchAO

```mermaid
%%{init: {'themeVariables': {'fontSize': '24px', 'fontFamily': 'Arial'}}}%%
graph TD
    E{Scaling Recipes}
    E -->|Delayed| F[Record History and Register Scale Buffer]
    E -->|Static| G[Register Scale Buffer]
    E -->|Dynamic| A
    F --> A
```
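
The dynamic branch needs no persistent buffer because the scale is recomputed from each tensor's current amax. A minimal sketch of that recipe in plain PyTorch (assuming a PyTorch build with the `float8_e4m3fn` dtype; this is an illustration, not the TorchAO API itself):

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3

def dynamic_scale(t: torch.Tensor) -> torch.Tensor:
    # Dynamic recipe: derive the scale from the current tensor's amax,
    # so no amax history or registered scale buffer is needed.
    amax = t.abs().max().float().clamp(min=1e-12)
    return FP8_MAX / amax

def to_fp8(t: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Scale into the representable range, then cast down to fp8.
    return (t.float() * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)

x = torch.randn(16, 32)
x_fp8 = to_fp8(x, dynamic_scale(x))
```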
airMeng / static.md
Last active April 17, 2025 03:09
Static quantization in AO