AMD Ryzen AI MAX+ 395 — LLM Benchmark Results
Hardware : Ryzen AI MAX+ 395 / Radeon 8060S (gfx1151) / 96 GiB VRAM (LPDDR5X-8000, 256 GB/s)
Host : Proxmox 9.x
Last updated: 2026-04-13
Recommended Model Assignments
| Purpose | Model | Server | Notes |
|---|---|---|---|
| Embeddings | qwen3-embedding:8b (4096 dims) | Lemonade | MTEB 70.6, code-aware, 40K context |
| Extraction LLM | Qwen3.5 35B (MoE) | Lemonade | 55.4 tok/s, fast + accurate structured output |
| Code generation | Qwen3-Coder 30B (MoE) | Lemonade | 77.3 tok/s, 256K context |
| Deep code review | Qwen2.5-Coder 32B | Lemonade | 11.1 tok/s, dense reasoning |
| Business/strategy | Qwen3.5 35B (MoE) | Lemonade | 55.4 tok/s, thinking mode |
| Deep reasoning | Qwen3.5 122B (MoE) | Lemonade | 21.7 tok/s, strongest model on hardware |
| Quick tasks | Gemma 4 E2B | Lemonade | 102.9 tok/s |
| Memory retrieval | nomic-embed-text | Ollama (104) | Existing mem0 setup |
Lemonade Server (v10.2.0) — LXC 114 (Ubuntu 24.04, ROCm gfx1151-specific binary)
| Model | Type | Size | Speed | Notes |
|---|---|---|---|---|
| Gemma 4 E2B | Dense 2B | 3.3 GB | 102.9 tok/s | Fastest model, multimodal |
| Qwen3-Coder 30B | MoE 3B active | 20 GB | 77.3 tok/s | Best fast coder |
| Gemma 4 E4B | Dense 4B | 5.4 GB | 55.5 tok/s | Quick tasks + vision |
| Qwen3.5 35B | MoE 3B active | 21 GB | 55.4 tok/s | Best general MoE |
| Gemma 4 26B MoE | MoE 4B active | 18 GB | 49.5 tok/s | Multimodal MoE |
| DeepSeek-R1-0528-Qwen3-8B | Dense 8B | 5.3 GB | 41.7 tok/s | Fast reasoning (R1 distill) |
| Qwen3 8B | Dense 8B | 5.6 GB | 40.3 tok/s | Quick reasoning |
| Qwen3.5 122B | MoE 10B active | 73 GB | 21.7 tok/s | Strongest reasoning |
| Llama 4 Scout | MoE 109B | 66 GB | 19.3 tok/s | Multimodal, large context |
| Devstral Small 2 | Dense 24B | 15 GB | 15.1 tok/s | Agentic SWE |
| Qwen2.5-Coder 32B | Dense 32B | 21 GB | 11.1 tok/s | Deep code review |
| Gemma 4 31B | Dense 31B | 20 GB | 10.9 tok/s | General (crashes on Ollama) |
Embedding models
| Model | Dims | MTEB | Size | Latency (warm) |
|---|---|---|---|---|
| qwen3-embedding 0.6b (Q8_0) | 1024 | ~60 | 610 MB | ~75 ms |
| qwen3-embedding 8b (Q4_K_M) | 4096 | 70.6 | 4.7 GB | ~300 ms |
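The warm-latency figures above come from wall-clock timing with the first (cold-load) run dropped, per the methodology notes at the end of this document. A minimal sketch of that measurement loop, using a stand-in workload in place of a real embedding request (the helper name and run count are illustrative, not part of the benchmark harness):

```python
import time
import statistics

def measure_warm_latency(call, runs=5):
    """Time call() over several runs and average all but the first.

    The first run is excluded because it includes cold model load.
    Returns mean latency in milliseconds.
    """
    times_ms = []
    for _ in range(runs):
        t0 = time.perf_counter()
        call()
        times_ms.append((time.perf_counter() - t0) * 1000)
    return statistics.mean(times_ms[1:])

# Stand-in workload: ~10 ms of sleep instead of an embedding request.
latency = measure_warm_latency(lambda: time.sleep(0.01))
print(f"{latency:.0f} ms warm latency")
```

In the real benchmark the callable would issue the embedding request (e.g. via curl or an HTTP client) against the Lemonade server.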
Ollama (v0.x) — LXC 104 (Ubuntu 24.04, ROCm via system install)
| Model | Type | Size | Speed | Notes |
|---|---|---|---|---|
| Gemma 4 E2B | Dense 2B | 7.2 GB | 80.7 tok/s | |
| Qwen3.5 35B | MoE 3B active | 24 GB | 41.7 tok/s | |
| Qwen3 8B | Dense 8B | 5.2 GB | 38.5 tok/s | |
| Qwen3.5 9B | Dense 9B | 6.6 GB | 31.6 tok/s | |
| phi4-reasoning:plus | Dense 14B | 11 GB | 19.0 tok/s | |
| Qwen2.5-Coder 32B | Dense 32B | 20 GB | 11.0 tok/s | |
| Qwen3 32B | Dense 32B | 20 GB | 10.2 tok/s | |
| Gemma 4 E2B (Vulkan) | Dense 2B | 7.2 GB | 45.1 tok/s | Vulkan backend on LXC 105 |
Ollama — Crashes (INT_MAX tensor bug in llama.cpp ROCm)
| Model | Size | Error |
|---|---|---|
| Gemma 4 31B (Q4_K_M) | 20 GB | GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) |
| Gemma 4 31B (Q8_0) | 34 GB | Same |
| qwen3-coder:30b | 19 GB | Same |
| devstral-small-2:24b | 15 GB | Same |
Lemonade vs Ollama — Same Model Comparison
| Model | Lemonade | Ollama | Delta |
|---|---|---|---|
| Gemma 4 E2B | 102.9 tok/s | 80.7 tok/s | +27% Lemonade |
| Qwen3 8B | 37.6 tok/s | 38.5 tok/s | ~same |
| Qwen2.5-Coder 32B | 11.1 tok/s | 11.0 tok/s | ~same |
| Qwen3.5 35B | 55.4 tok/s | 41.7 tok/s | +33% Lemonade |
Lemonade's gfx1151-specific ROCm binary is significantly faster for some models and also avoids the INT_MAX crash that affects Ollama.
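One way the Delta column above might be derived. The exact rounding and the threshold for calling a result "~same" are the author's; this helper and its 5% cutoff are illustrative:

```python
def pct_delta(lemonade_tps: float, ollama_tps: float) -> str:
    """Percent speed advantage of Lemonade over Ollama on the same model."""
    delta = (lemonade_tps / ollama_tps - 1) * 100
    if abs(delta) < 5:  # within run-to-run noise: call it a tie
        return "~same"
    return f"{delta:+.0f}% Lemonade"

print(pct_delta(11.1, 11.0))  # Qwen2.5-Coder 32B row
print(pct_delta(55.4, 41.7))  # Qwen3.5 35B row
```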
Methodology
- Prompt: "Explain quantum entanglement in two sentences. /no_think"
- Non-streaming, single request, exclusive GPU access (other instances stopped)
- Speed = eval_count / (eval_duration / 1e9), from the Ollama-compatible API
- Load times excluded (models pre-warmed where load shows 0.0 s)
- Embedding latency measured via curl wall-clock time, 5 runs, first run excluded (cold load)
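The throughput formula in the notes above, as a small helper. The field names follow the Ollama generate-API response, where eval_duration is reported in nanoseconds:

```python
def tokens_per_second(response: dict) -> float:
    """Speed = eval_count / (eval_duration / 1e9), per the methodology above."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)

# Hypothetical response (not a benchmark result): 1000 tokens in 12.96 s
resp = {"eval_count": 1000, "eval_duration": 12_960_000_000}
print(round(tokens_per_second(resp), 1))  # 77.2
```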