@rweald
Last active June 24, 2026 21:42
Basic setup for running Krea 2 Turbo on DGX Spark via Docker

Krea 2 Turbo on DGX Spark

Local text-to-image: Krea 2 Turbo behind a minimal OpenAI-compatible endpoint (diffusers on NVIDIA's NGC PyTorch base), with a built-in single-page web UI.

What's here

  • Dockerfile: nvcr.io/nvidia/pytorch:26.05-py3 (arm64/Blackwell) + diffusers.
  • server.py — FastAPI server: /v1/images/generations, /v1/models, and the UI at /.
  • index.html — self-contained prompt-and-preview UI.
  • docker-compose.yml — the single krea service, host network.

Setup

  1. (Recommended) Smoke-test the build before wiring up the UI. This confirms diffusers actually has the Krea 2 pipeline, that the GPU is reachable, and pre-downloads the weights:

    docker compose build krea
    docker compose run --rm krea python -c \
      "import diffusers; print(diffusers.__version__); \
       import torch; from diffusers import DiffusionPipeline; \
       p=DiffusionPipeline.from_pretrained('krea/Krea-2-Turbo', torch_dtype=torch.bfloat16).to('cuda'); \
       print(p.__class__.__name__)"

    Expect a diffusers version, then Krea2Pipeline, with no CUDA error on .to('cuda'). If the load fails because Krea2Pipeline is missing, the installed diffusers predates the model (common for a day or two after a release). The Dockerfile installs diffusers from git main for that reason, so rebuild with docker compose build --no-cache krea to pick up a newer commit.

  2. Bring it up:

    docker compose up -d

    First boot downloads ~24 GB of weights (600s healthcheck start_period covers it). Then open http://<spark-ip>:30000, enter a prompt, and generate.
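Once the service is up, the endpoint can also be driven without the UI. The sketch below uses only the Python standard library: it builds the same JSON body the bundled UI posts and decodes the b64_json field of the reply. The helper names (build_payload, decode_first_image, generate_png) and the localhost base URL are illustrative assumptions, not part of this gist.

```python
import base64
import json
import urllib.request

# Assumed base URL; replace localhost with the Spark's IP when calling remotely.
BASE_URL = "http://localhost:30000"


def build_payload(prompt: str, size: str = "1024x1024") -> bytes:
    """Encode the JSON body that the bundled UI posts to the server."""
    body = {"prompt": prompt, "size": size, "response_format": "b64_json"}
    return json.dumps(body).encode("utf-8")


def decode_first_image(response_json: dict) -> bytes:
    """Return the raw PNG bytes of the first image in a generations response."""
    items = response_json.get("data") or []
    if not items or "b64_json" not in items[0]:
        raise ValueError("no image in response")
    return base64.b64decode(items[0]["b64_json"])


def generate_png(prompt: str, size: str = "1024x1024", timeout: float = 300.0) -> bytes:
    """POST a prompt to /v1/images/generations and return the PNG bytes."""
    req = urllib.request.Request(
        BASE_URL + "/v1/images/generations",
        data=build_payload(prompt, size),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return decode_first_image(json.load(resp))
```

Calling generate_png("a lighthouse at dusk") and writing the returned bytes to a .png file should give the same image the UI would show. The 300-second timeout is a generous guess, not a measured figure.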

Notes

  • Base image: 26.05-py3 has a verified arm64 manifest. Bump the tag when a newer CUDA-13 arm64 build ships on catalog.ngc.nvidia.com.
  • Memory: ~24 GB bf16 (peak ~37 GB). The Spark's 128 GB unified memory keeps everything resident — no offload flags.
  • GPU access: NVIDIA Container Toolkit via deploy.resources, not privileged.
  • UI: a thin client — POSTs to /v1/images/generations and renders the result. No build step, no auth.
  • Turbo vs Raw: 8 steps is hardcoded for Turbo. Raw needs ~52 steps + guidance; adjust num_inference_steps if you switch models.
  • Scaling up: server.py is single-image with no batching. For real throughput, SGLang wraps the same pipeline behind the same endpoint (needs a 0.5.13+ arm64 image) — the UI wouldn't change.
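
To make the Turbo vs Raw note concrete, here is a minimal sketch of a per-model preset table that could replace the hardcoded values in server.py. The step counts and guidance values are the ones quoted in this gist (the Raw figures are approximate there). The "krea/Krea-2-Raw" repo id, the SamplingPreset class and the sampling_kwargs helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    num_inference_steps: int
    guidance_scale: float


# Values quoted in this gist. The Raw repo id is an assumption; check the model card.
PRESETS = {
    "krea/Krea-2-Turbo": SamplingPreset(num_inference_steps=8, guidance_scale=0.0),
    "krea/Krea-2-Raw": SamplingPreset(num_inference_steps=52, guidance_scale=3.5),
}


def sampling_kwargs(model_id: str) -> dict:
    """Return keyword arguments to pass to the pipeline call for model_id."""
    try:
        preset = PRESETS[model_id]
    except KeyError:
        raise ValueError(f"no sampling preset for {model_id!r}") from None
    return {
        "num_inference_steps": preset.num_inference_steps,
        "guidance_scale": preset.guidance_scale,
    }
```

With this in place the pipeline call would become pipe(prompt, height=height, width=width, **sampling_kwargs(model_id)), so switching checkpoints cannot silently reuse the other model's settings.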

Free up disk space

The ~35 GB of weights live in the hf-cache volume (<project>_hf-cache — see docker volume ls), not the container. Deleting them is safe; the next start re-pulls them. Stop first with docker compose down, then:

# Just the Krea 2 model (substitute your own project prefix, as shown by docker volume ls):
docker run --rm -v krea2-on-dgx-spark_hf-cache:/cache alpine \
  rm -rf /cache/hub/models--krea--Krea-2-Turbo

# Or the whole cache:
docker volume rm krea2-on-dgx-spark_hf-cache

docker compose down -v also wipes hf-cache — avoid it unless you mean to.
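
Before deleting anything, it can help to see what the cache actually holds. The following is a small sketch, assuming the usual Hugging Face hub layout in which snapshot files are symlinks into a blobs directory; symlinks are skipped so each blob is counted once. The function names are illustrative.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def cache_report(hub_dir: str) -> dict[str, int]:
    """Map each top-level directory under hub_dir to its size in bytes."""
    report = {}
    for entry in sorted(os.listdir(hub_dir)):
        path = os.path.join(hub_dir, entry)
        if os.path.isdir(path):
            report[entry] = dir_size_bytes(path)
    return report
```

Inside the container the hub directory would be /root/.cache/huggingface/hub, given the volume mount in docker-compose.yml, so cache_report on that path should list models--krea--Krea-2-Turbo alongside anything else that was pulled.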

docker-compose.yml

services:
  krea:
    build: .
    network_mode: host
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - hf-cache:/root/.cache/huggingface
    environment:
      - HF_HOME=/root/.cache/huggingface
    healthcheck:
      # First run downloads ~24 GB of weights before answering, hence the long start_period.
      test: ["CMD", "curl", "-fsS", "http://localhost:30000/v1/models"]
      interval: 30s
      timeout: 10s
      retries: 10
      start_period: 600s
    restart: unless-stopped

volumes:
  hf-cache:
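
A script that depends on the service can wait for it the same way the healthcheck does, by polling /v1/models. This is a minimal sketch: the 900-second default is a rough reading of the settings above (600 s start_period plus 10 retries at 30 s), and the function names and the injectable sleep and clock parameters are illustrative choices, not part of the gist.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    deadline_s: float = 900.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll probe until it succeeds or the next wait would pass deadline_s."""
    start = clock()
    while True:
        if probe():
            return True
        if clock() - start + interval_s > deadline_s:
            return False
        sleep(interval_s)
```

A typical call would be wait_until_ready(lambda: http_ok("http://localhost:30000/v1/models")). Passing sleep and clock as parameters keeps the loop testable without real waiting.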

Dockerfile

# NVIDIA NGC PyTorch base: arm64/Blackwell (Grace GB10) ready, CUDA 13.
# 26.05-py3 ships an arm64 manifest (verified). Bump to the current tag if a
# newer one is published on catalog.ngc.nvidia.com.
FROM nvcr.io/nvidia/pytorch:26.05-py3

# NOTE: released diffusers 0.38.0 (shipped in this NGC image) does NOT yet have
# Krea2Pipeline (confirmed at build time). Install diffusers from git main, which
# has it. Revert to the released package (`diffusers`) once a PyPI release > 0.38.0
# ships the Krea 2 pipeline.
RUN pip install --no-cache-dir -U \
    "git+https://github.com/huggingface/diffusers.git" \
    transformers accelerate \
    fastapi "uvicorn[standard]"

WORKDIR /app
COPY server.py /app/server.py
COPY index.html /app/index.html
EXPOSE 30000
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "30000"]
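
The smoke test in Setup loads the full model, which is slow. A cheaper first check is whether the installed diffusers exposes the pipeline class at all. The sketch below is a generic helper for that, with an illustrative name (has_symbol). It only reports whether the attribute resolves, so it does not prove the weights load or that the GPU is reachable.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if module_name imports cleanly and exposes symbol."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, symbol)
```

Inside the image, has_symbol("diffusers", "Krea2Pipeline") returning False would indicate the diffusers build predates the model, which is the situation the NOTE in the Dockerfile describes.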

index.html

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Krea 2 Turbo</title>
<style>
:root {
--bg: #0a0a0c;
--panel: #141418;
--panel-2: #1c1c22;
--border: #2a2a32;
--text: #ececf1;
--muted: #8b8b96;
--accent: #7c5cff;
--accent-2: #5cc8ff;
--danger: #ff6b6b;
--radius: 16px;
}
* { box-sizing: border-box; }
html, body { height: 100%; }
body {
margin: 0;
font-family: "Inter", -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
color: var(--text);
background: radial-gradient(1200px 700px at 50% -10%, #1a1530 0%, var(--bg) 55%) fixed;
-webkit-font-smoothing: antialiased;
display: flex;
flex-direction: column;
height: 100%;
}
header {
display: flex;
align-items: center;
gap: 12px;
padding: 16px 28px;
border-bottom: 1px solid var(--border);
flex: none;
}
.logo {
width: 30px; height: 30px; border-radius: 9px;
background: linear-gradient(135deg, var(--accent), var(--accent-2));
box-shadow: 0 0 24px -4px var(--accent);
}
.brand { font-weight: 650; font-size: 16px; letter-spacing: -0.01em; }
.header-actions { margin-left: auto; display: flex; align-items: center; gap: 12px; }
.badge {
font-size: 12px; color: var(--muted);
border: 1px solid var(--border);
padding: 5px 10px; border-radius: 999px;
background: var(--panel);
}
.badge .dot {
display: inline-block; width: 7px; height: 7px; border-radius: 50%;
background: #3ddc84; margin-right: 6px; vertical-align: middle;
box-shadow: 0 0 8px #3ddc84;
}
.clear-btn {
font: inherit; font-size: 12px; color: var(--muted);
border: 1px solid var(--border); border-radius: 999px;
background: var(--panel); padding: 5px 12px; cursor: pointer;
transition: color .15s, border-color .15s;
}
.clear-btn:hover { color: var(--danger); border-color: var(--danger); }
.clear-btn:disabled { opacity: .4; cursor: not-allowed; }
/* Feed */
.feed-wrap { flex: 1; overflow-y: auto; }
.feed {
width: 100%;
max-width: 880px;
margin: 0 auto;
padding: 24px 24px 8px;
display: flex;
flex-direction: column;
gap: 20px;
}
.empty {
text-align: center; color: var(--muted);
margin: auto; padding: 64px 24px;
}
.empty svg { opacity: .5; margin-bottom: 12px; }
.empty p { margin: 0; font-size: 14px; }
.entry { animation: fade .4s ease; }
@keyframes fade { from { opacity: 0; transform: translateY(6px); } to { opacity: 1; transform: none; } }
.entry .prompt {
font-size: 15px; line-height: 1.5; color: var(--text);
margin-bottom: 8px; white-space: pre-wrap; word-break: break-word;
}
.entry .meta {
font-size: 12px; color: var(--muted);
display: flex; align-items: center; gap: 10px; flex-wrap: wrap; margin-bottom: 10px;
}
.entry .meta .timing {
display: inline-flex; align-items: center; gap: 5px;
color: var(--accent-2);
background: rgba(92, 200, 255, .10);
border: 1px solid rgba(92, 200, 255, .28);
padding: 2px 9px; border-radius: 999px;
font-weight: 600; font-variant-numeric: tabular-nums;
}
.entry .meta .timing svg { width: 12px; height: 12px; }
.frame {
position: relative;
border-radius: var(--radius);
border: 1px solid var(--border);
background: var(--panel);
overflow: hidden;
aspect-ratio: var(--ar, 1 / 1);
max-height: 70vh;
display: flex; align-items: center; justify-content: center;
}
.frame img { width: 100%; height: 100%; object-fit: contain; display: block; }
.shimmer { position: absolute; inset: 0; }
.shimmer::after {
content: ""; position: absolute; inset: 0;
background: linear-gradient(110deg, transparent 30%, rgba(255,255,255,.05) 50%, transparent 70%);
background-size: 200% 100%;
animation: slide 1.4s infinite;
}
@keyframes slide { from { background-position: 200% 0; } to { background-position: -200% 0; } }
.frame .status {
position: absolute; bottom: 12px; left: 50%; transform: translateX(-50%);
font-size: 12px; color: var(--text);
background: rgba(0,0,0,.55); backdrop-filter: blur(8px);
border: 1px solid var(--border);
padding: 6px 12px; border-radius: 999px;
}
.frame .err {
color: var(--danger); font-size: 13px; padding: 24px; text-align: center;
}
.frame .download {
position: absolute; top: 12px; right: 12px;
width: 36px; height: 36px; border-radius: 10px;
border: 1px solid var(--border);
background: rgba(0,0,0,.5); backdrop-filter: blur(8px);
color: var(--text); cursor: pointer;
display: flex; align-items: center; justify-content: center;
opacity: 0; transition: opacity .15s, background .15s, border-color .15s;
}
.frame:hover .download { opacity: 1; }
.frame .download:hover { background: var(--panel-2); border-color: var(--accent); }
/* Composer */
.composer-wrap { flex: none; padding: 12px 24px 20px; }
.composer {
width: 100%; max-width: 880px; margin: 0 auto;
background: var(--panel);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 14px;
box-shadow: 0 16px 48px -24px rgba(0,0,0,.8);
}
textarea {
width: 100%;
border: none; outline: none; resize: none;
background: transparent; color: var(--text);
font: inherit; font-size: 15px; line-height: 1.5;
min-height: 48px;
}
textarea::placeholder { color: var(--muted); }
.controls {
display: flex; align-items: center; gap: 10px;
margin-top: 8px; padding-top: 12px;
border-top: 1px solid var(--border);
}
.select-wrap { position: relative; }
select {
appearance: none; -webkit-appearance: none;
background: var(--panel-2); color: var(--text);
border: 1px solid var(--border); border-radius: 10px;
padding: 9px 30px 9px 12px; font: inherit; font-size: 13px;
cursor: pointer; outline: none;
}
select:hover { border-color: var(--accent); }
.select-wrap::after {
content: "▾"; position: absolute; right: 11px; top: 50%;
transform: translateY(-50%); color: var(--muted); pointer-events: none; font-size: 11px;
}
.hint { font-size: 12px; color: var(--muted); margin-left: 2px; }
.generate {
margin-left: auto;
border: none; border-radius: 10px;
padding: 10px 20px; font: inherit; font-weight: 600; font-size: 14px;
color: white; cursor: pointer;
background: linear-gradient(135deg, var(--accent), #9a7bff);
box-shadow: 0 6px 20px -6px var(--accent);
transition: transform .08s, filter .15s, opacity .15s;
display: inline-flex; align-items: center; gap: 8px;
}
.generate:hover { filter: brightness(1.08); }
.generate:active { transform: translateY(1px); }
.generate:disabled { opacity: .55; cursor: not-allowed; filter: none; }
.spinner {
width: 14px; height: 14px; border-radius: 50%;
border: 2px solid rgba(255,255,255,.35); border-top-color: white;
animation: spin .7s linear infinite; display: none;
}
.busy .spinner { display: inline-block; }
@keyframes spin { to { transform: rotate(360deg); } }
</style>
</head>
<body>
<header>
<div class="logo"></div>
<div class="brand">Krea 2 Turbo</div>
<div class="header-actions">
<button class="clear-btn" id="clearBtn" disabled>Clear history</button>
<div class="badge" id="badge"><span class="dot"></span>checking…</div>
</div>
</header>
<div class="feed-wrap" id="feedWrap">
<div class="feed" id="feed">
<div class="empty" id="empty">
<svg width="40" height="40" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5">
<rect x="3" y="3" width="18" height="18" rx="3"/>
<circle cx="9" cy="9" r="2"/>
<path d="M21 15l-5-5L5 21"/>
</svg>
<p>Your generated images will appear here</p>
</div>
</div>
</div>
<div class="composer-wrap">
<form class="composer" id="form">
<textarea id="prompt" placeholder="Describe the image you want to create… (⌘/Ctrl + Enter to generate)" autofocus></textarea>
<div class="controls">
<div class="select-wrap">
<select id="size">
<option value="1024x1024">Square · 1024×1024</option>
<option value="1280x768">Landscape · 1280×768</option>
<option value="768x1280">Portrait · 768×1280</option>
</select>
</div>
<span class="hint">8-step turbo</span>
<button class="generate" id="generate" type="submit">
<span class="spinner"></span>
<span class="label">Generate</span>
</button>
</div>
</form>
</div>
<script>
const form = document.getElementById("form");
const promptEl = document.getElementById("prompt");
const sizeEl = document.getElementById("size");
const genBtn = document.getElementById("generate");
const label = genBtn.querySelector(".label");
const feedWrap = document.getElementById("feedWrap");
const feed = document.getElementById("feed");
const emptyEl = document.getElementById("empty");
const badge = document.getElementById("badge");
const clearBtn = document.getElementById("clearBtn");
// ---- IndexedDB persistence (full-res PNGs are too large for localStorage) ----
const DB_NAME = "krea-history", STORE = "entries";
function openDB() {
return new Promise((res, rej) => {
const r = indexedDB.open(DB_NAME, 1);
r.onupgradeneeded = () => {
const db = r.result;
if (!db.objectStoreNames.contains(STORE))
db.createObjectStore(STORE, { keyPath: "id", autoIncrement: true });
};
r.onsuccess = () => res(r.result);
r.onerror = () => rej(r.error);
});
}
async function dbAll() {
const db = await openDB();
return new Promise((res, rej) => {
const req = db.transaction(STORE, "readonly").objectStore(STORE).getAll();
req.onsuccess = () => res(req.result);
req.onerror = () => rej(req.error);
});
}
async function dbAdd(entry) {
const db = await openDB();
return new Promise((res, rej) => {
const req = db.transaction(STORE, "readwrite").objectStore(STORE).add(entry);
req.onsuccess = () => res(req.result);
req.onerror = () => rej(req.error);
});
}
async function dbClear() {
const db = await openDB();
return new Promise((res, rej) => {
const tx = db.transaction(STORE, "readwrite");
tx.objectStore(STORE).clear();
tx.oncomplete = () => res();
tx.onerror = () => rej(tx.error);
});
}
function aspectRatio(size) {
const m = /^(\d+)x(\d+)$/.exec(size || "");
return m ? `${m[1]} / ${m[2]}` : "1 / 1";
}
function fmtTime(ts) {
return new Date(ts).toLocaleString([], { month: "short", day: "numeric", hour: "2-digit", minute: "2-digit" });
}
function refreshEmpty() {
const hasEntries = feed.querySelector(".entry");
emptyEl.style.display = hasEntries ? "none" : "";
clearBtn.disabled = !hasEntries;
}
function scrollToBottom() { feedWrap.scrollTop = feedWrap.scrollHeight; }
// Render a finished entry. `src` is a PNG data URL.
function renderEntry({ prompt, size, created, ms, src }) {
const el = document.createElement("div");
el.className = "entry";
el.innerHTML = `
<div class="prompt"></div>
<div class="meta">
<span class="meta-size"></span>
<span class="meta-time"></span>
<span class="timing" hidden>
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M13 2L3 14h7l-1 8 10-12h-7z"/></svg>
<span class="timing-val"></span>
</span>
</div>
<div class="frame" style="--ar:${aspectRatio(size)}">
<img alt="" />
<button class="download" title="Download">
<svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M12 3v12m0 0l-4-4m4 4l4-4M5 21h14"/></svg>
</button>
</div>`;
el.querySelector(".prompt").textContent = prompt;
el.querySelector(".meta-size").textContent = size;
el.querySelector(".meta-time").textContent = fmtTime(created);
if (ms != null) {
const t = el.querySelector(".timing");
t.hidden = false;
t.title = "Generation time";
t.querySelector(".timing-val").textContent = (ms / 1000).toFixed(2) + "s";
}
const img = el.querySelector("img");
img.src = src;
el.querySelector(".download").addEventListener("click", () => {
const a = document.createElement("a");
a.href = src;
a.download = "krea-" + created + ".png";
a.click();
});
return el;
}
// Health badge
fetch("/v1/models").then(r => r.ok ? r.json() : Promise.reject())
.then(d => { badge.innerHTML = '<span class="dot"></span>' + (d.data?.[0]?.id || "ready"); })
.catch(() => { badge.innerHTML = '<span class="dot" style="background:#ff6b6b;box-shadow:0 0 8px #ff6b6b"></span>offline'; });
// Load persisted history (oldest first, newest at the bottom)
dbAll().then(entries => {
entries.sort((a, b) => a.created - b.created);
for (const e of entries)
feed.appendChild(renderEntry({ ...e, src: "data:image/png;base64," + e.b64 }));
refreshEmpty();
scrollToBottom();
}).catch(() => {});
clearBtn.addEventListener("click", async () => {
if (!feed.querySelector(".entry")) return;
if (!confirm("Delete all generated images from this browser?")) return;
await dbClear();
feed.querySelectorAll(".entry").forEach(n => n.remove());
refreshEmpty();
});
async function generate() {
const prompt = promptEl.value.trim();
if (!prompt) { promptEl.focus(); return; }
const size = sizeEl.value;
const created = Date.now();
genBtn.disabled = true;
genBtn.classList.add("busy");
label.textContent = "Generating";
// Pending card with shimmer
const card = document.createElement("div");
card.className = "entry";
card.innerHTML = `
<div class="prompt"></div>
<div class="meta"><span></span><span></span></div>
<div class="frame" style="--ar:${aspectRatio(size)}">
<div class="shimmer"></div>
<div class="status">Generating…</div>
</div>`;
card.querySelector(".prompt").textContent = prompt;
const metaSpans = card.querySelectorAll(".meta span");
metaSpans[0].textContent = size;
metaSpans[1].textContent = fmtTime(created);
const statusEl = card.querySelector(".status");
feed.appendChild(card);
refreshEmpty();
scrollToBottom();
const started = performance.now();
const ticker = setInterval(() => {
statusEl.textContent = "Generating… " + ((performance.now() - started) / 1000).toFixed(1) + "s";
}, 100);
try {
const res = await fetch("/v1/images/generations", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ prompt, size, response_format: "b64_json" }),
});
if (!res.ok) throw new Error("Server returned " + res.status);
const json = await res.json();
const b64 = json.data?.[0]?.b64_json;
if (!b64) throw new Error("No image in response");
clearInterval(ticker);
const ms = performance.now() - started;
const entry = { prompt, size, created, ms, b64 };
const fresh = renderEntry({ ...entry, src: "data:image/png;base64," + b64 });
card.replaceWith(fresh);
scrollToBottom();
promptEl.value = "";
dbAdd(entry).catch(() => {/* history is best-effort; image is already shown */});
} catch (err) {
clearInterval(ticker);
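// Build the error node with textContent so the message is never parsed as HTML.
const errEl = document.createElement("div");
errEl.className = "err";
errEl.textContent = err.message || "Generation failed";
card.querySelector(".frame").replaceChildren(errEl);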
} finally {
genBtn.disabled = false;
genBtn.classList.remove("busy");
label.textContent = "Generate";
}
}
form.addEventListener("submit", (e) => { e.preventDefault(); generate(); });
promptEl.addEventListener("keydown", (e) => {
if ((e.metaKey || e.ctrlKey) && e.key === "Enter") { e.preventDefault(); generate(); }
});
</script>
</body>
</html>

server.py

import base64
import io
import os
import time

import torch
from diffusers import DiffusionPipeline
from fastapi import FastAPI
from fastapi.responses import FileResponse
from pydantic import BaseModel

app = FastAPI()
INDEX_HTML = os.path.join(os.path.dirname(__file__), "index.html")

# Loads krea/Krea-2-Turbo via the auto-class, which resolves to Krea2Pipeline
# from the repo's model_index.json. ~24 GB DiT in bf16 + Qwen3-VL encoder + VAE.
# On the Spark's 128 GB unified memory everything stays resident (no offload).
pipe = DiffusionPipeline.from_pretrained(
    "krea/Krea-2-Turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")


class ImageRequest(BaseModel):
    prompt: str
    n: int = 1
    size: str | None = "1024x1024"
    model: str | None = None
    response_format: str | None = "b64_json"


@app.get("/")
def index():
    return FileResponse(INDEX_HTML, media_type="text/html")


@app.get("/v1/models")
def list_models():
    return {"object": "list", "data": [{"id": "krea/Krea-2-Turbo", "object": "model"}]}


@app.post("/v1/images/generations")
def generate(req: ImageRequest):
    width, height = 1024, 1024
    if req.size and "x" in req.size.lower():
        width, height = (int(v) for v in req.size.lower().split("x"))
    data = []
    for _ in range(max(1, req.n)):
        # Turbo is the 8-step distilled checkpoint: CFG is distilled in, so
        # guidance_scale MUST be 0.0 (Krea's recommended Turbo setting). Leaving
        # it unset inherits the pipeline default of 4.5, which is meant for
        # Krea-2-Raw and over-cooks Turbo output. For Krea-2-Raw use ~52 steps
        # + guidance ~3.5.
        image = pipe(
            req.prompt,
            num_inference_steps=8,
            guidance_scale=0.0,
            height=height,
            width=width,
        ).images[0]
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        data.append({"b64_json": base64.b64encode(buf.getvalue()).decode()})
    # OpenAI-style responses carry a Unix timestamp in `created`.
    return {"created": int(time.time()), "data": data}
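
The size handling in generate() converts whatever surrounds the "x" straight to int, so a malformed value such as "1024x" surfaces as an unhandled exception rather than a client error. A stricter parser could look like the sketch below. The parse_size name and the choice to raise ValueError are illustrative assumptions, and no constraint on allowed dimensions is implied because the gist does not state one.

```python
import re
from typing import Optional

_SIZE_RE = re.compile(r"(\d+)x(\d+)")


def parse_size(
    size: Optional[str], default: tuple[int, int] = (1024, 1024)
) -> tuple[int, int]:
    """Parse a 'WIDTHxHEIGHT' string into (width, height)."""
    if not size:
        return default
    match = _SIZE_RE.fullmatch(size.strip().lower())
    if match is None:
        raise ValueError(f"size must look like '1024x1024', got {size!r}")
    width, height = int(match.group(1)), int(match.group(2))
    if width == 0 or height == 0:
        raise ValueError("width and height must be positive")
    return width, height
```

In the endpoint, the ValueError could be caught and re-raised as fastapi.HTTPException(status_code=400, detail=str(exc)) so that clients see a 400 instead of a 500.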