Previously, we added support for loading and running inference with AutoRound-quantized models in SGLang (see the sketch after the list below). In 2026 Q1, we plan to contribute further to SGLang, enriching its model quantization support powered by Intel® Neural Compressor and AutoRound:
- Consolidate Intel quantization support and add more quantization methods
- Enable various formats (FP8/WNA16/MXFP4/MXFP8/NVFP4, plus advanced mixed-precision recipes) with good accuracy, leveraging Neural Compressor's advanced AutoRound algorithm
- Provide users with seamless access to state-of-the-art quantization that balances speed, accuracy, and ease of use
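
To illustrate the already-supported path, here is a minimal sketch of loading an AutoRound-quantized checkpoint with SGLang's offline `Engine` API. The model id below is a placeholder, not a specific checkpoint from this work; substitute any AutoRound-quantized model from the Hugging Face Hub. SGLang typically picks up the quantization configuration embedded in the exported checkpoint, so no extra quantization flag is usually needed.

```python
# Minimal sketch: offline inference with an AutoRound-quantized model in SGLang.
# The model id is a placeholder; replace it with a real AutoRound-quantized checkpoint.
import sglang as sgl

if __name__ == "__main__":
    # SGLang reads the quantization config stored in the checkpoint,
    # so AutoRound-exported models load like any other model path.
    llm = sgl.Engine(model_path="your-org/your-autoround-quantized-model")  # placeholder

    prompts = ["What is AutoRound quantization?"]
    sampling_params = {"temperature": 0.0, "max_new_tokens": 64}

    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print(prompt, "->", output["text"])

    llm.shutdown()
```

The same checkpoint can also be served with `python -m sglang.launch_server --model-path <model>` for online inference; the offline engine is shown here only to keep the example self-contained.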