To offload a model to the GPU, the model must fit in VRAM.
The following table lists representative models for each VRAM size, assuming Q4_K_M quantization. You can always run a model listed under a smaller VRAM size.
| VRAM  | Models                  |
|-------|-------------------------|
| 384GB | DeepSeek V3 671b        |
| 128GB | Mistral Large 2411 123b |
| 64GB  | Qwen2.5 72b             |
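
As a rough back-of-the-envelope check, you can estimate the VRAM a quantized model needs from its parameter count and the quantization's average bits per weight. The sketch below is a simplified illustration, not an exact sizing tool: the ~4.85 bits/weight figure for Q4_K_M and the 10% runtime overhead factor are assumptions, and actual GGUF file sizes plus KV-cache usage vary with model architecture and context length.

```python
# Back-of-the-envelope VRAM estimate for a quantized model.
# Assumptions (not from this guide): Q4_K_M averages ~4.85 bits per
# weight, and ~10% extra is reserved for KV cache and compute buffers.

def estimate_vram_gb(params_billions: float,
                     bits_per_weight: float = 4.85,
                     overhead: float = 1.10) -> float:
    """Return an approximate VRAM requirement in GB."""
    weights_gb = params_billions * bits_per_weight / 8  # weight storage
    return weights_gb * overhead                        # runtime headroom


if __name__ == "__main__":
    for name, params in [("Qwen2.5 72b", 72),
                         ("Mistral Large 2411 123b", 123)]:
        print(f"{name}: ~{estimate_vram_gb(params):.0f} GB")
```

Under these assumptions, Qwen2.5 72b comes out to roughly 48GB and Mistral Large 2411 123b to roughly 82GB, consistent with the 64GB and 128GB tiers above.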