To offload a model fully to the GPU, it must fit in VRAM.
The following table lists example models, assuming Q4_K_M quantization. A model listed under a smaller VRAM tier will also fit in any larger one.
| VRAM | Models |
|---|---|
| 384GB | - DeepSeek V3 671b |
| 128GB | - Mistral Large 2411 123b |
| 64GB | - Qwen2.5 72b |
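
As a rough rule of thumb, the weights of a quantized model take about `parameters × bits_per_weight / 8` bytes, plus runtime overhead for the KV cache and compute buffers. The sketch below estimates whether a model fits in a given VRAM budget; the constants are assumptions (Q4_K_M averaging roughly 4.8 bits per weight, a flat 20% overhead) and the function name is illustrative, not part of any tool's API.

```python
def fits_in_vram(params_billions: float, vram_gb: float,
                 bits_per_weight: float = 4.8, overhead: float = 1.2) -> bool:
    """Rough check: do the quantized weights plus estimated overhead fit?

    Assumptions (illustrative, not exact): Q4_K_M averages ~4.8 bits per
    weight, and KV cache plus compute buffers add ~20% on top.
    """
    weights_gb = params_billions * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb * overhead <= vram_gb

# Qwen2.5 72b: 72 * 4.8 / 8 ≈ 43 GB of weights, ~52 GB with overhead,
# so it fits the 64GB tier; Mistral Large 2411 123b (~89 GB) does not.
print(fits_in_vram(72, 64))   # True
print(fits_in_vram(123, 64))  # False
```

Real usage varies with context length and runtime, so treat the estimate as a starting point and verify against actual file sizes for the quantization you download.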