To offload a model fully to the GPU, the model must fit in VRAM.
The table below lists the largest models that fit at each VRAM size, assuming Q4_K_M quantization. A model sized for a smaller VRAM tier will always fit in a larger one; a rough way to estimate these sizes is sketched after the table.
| VRAM | Models |
|---|---|
| 384GB | DeepSeek V3 671B |
| 128GB | Mistral Large 2411 123B |
| 64GB | Qwen2.5 72B |
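
As a rough rule of thumb, Q4_K_M averages somewhere around 4.8 bits per weight, so the weights alone take roughly 0.6 bytes per parameter, plus runtime overhead for the KV cache and compute buffers. The minimal Python sketch below illustrates this arithmetic; the 4.8 bits/weight figure and the flat 2 GB overhead allowance are assumptions for illustration, and real usage varies with architecture, context length, and batch size.

```python
# Assumed average bits per weight for Q4_K_M; the exact figure
# varies by model architecture and quantization tooling.
Q4_K_M_BITS_PER_WEIGHT = 4.8

def estimate_vram_gb(params_billions: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate (GB) to fully offload a quantized model.

    overhead_gb is a hypothetical flat allowance for the KV cache and
    compute buffers; actual overhead grows with context length.
    """
    weights_gb = params_billions * Q4_K_M_BITS_PER_WEIGHT / 8
    return weights_gb + overhead_gb

# Example: a 72B model needs roughly 45 GB, which fits in the 64GB tier.
print(f"Qwen2.5 72B: ~{estimate_vram_gb(72):.0f} GB")
```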
