End-to-end testing of the Helion problem set using kernelbot API + popcorn-cli + GitHub Actions (Nebius B200 runners).
- PostgreSQL running locally (pg_isready should return "accepting connections")
- popcorn-cli installed (which popcorn-cli)
- gh CLI authenticated with workflow scope (gh auth status)
- reference-kernels repo cloned at ~/Dev/reference-kernels
- kernelbot repo cloned at ~/Dev/kernelbot
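The tool checks in the list above can be run in one pass. This is a minimal sketch: missing_tools is a hypothetical helper name, and it only confirms that each command is on PATH. It does not verify that PostgreSQL accepts connections or that gh has the workflow scope, so pg_isready and gh auth status still need to be run by hand.

```shell
# Print the name of every required tool that is not found on PATH.
# Prints nothing when all of the given tools are available.
# (missing_tools is an illustrative helper, not part of any of the tools above.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}
```

For example, missing_tools pg_isready psql createdb uv popcorn-cli gh curl python3 prints one line for each command that still needs to be installed.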
cd ~/Dev/kernelbot
# Create database (skip if it already exists)
createdb kernelbot
# Run migrations
uv run yoyo apply \
--database "postgresql://$(whoami)@localhost:5432/kernelbot" \
src/migrations/ --batch
# Create a test user
psql "postgresql://$(whoami)@localhost:5432/kernelbot" -c \
"INSERT INTO leaderboard.user_info (id, user_name, cli_id, cli_valid)
VALUES ('999999', 'testuser', 'test-cli-id-123', true)
ON CONFLICT (id) DO UPDATE SET cli_id = 'test-cli-id-123', cli_valid = true;"

Create ~/Dev/kernelbot/.env:
# GitHub (get token from: gh auth token)
GITHUB_TOKEN=<your-github-token>
GITHUB_REPO=gpu-mode/kernelbot
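# Local PostgreSQL (replace <your-username> with the output of whoami;
# a .env file is usually not run through a shell, so $(whoami) may not expand)
DATABASE_URL=postgresql://<your-username>@localhost:5432/kernelbot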
DISABLE_SSL=true
# Admin token for local testing
ADMIN_TOKEN=79aa14edc807892439a3fafcae9a262f
# Not needed for API-only mode
DISCORD_TOKEN=placeholder_discord_token

Your GitHub token needs the workflow scope to dispatch GitHub Actions. You can get it from:
gh auth token

cd ~/Dev/kernelbot/src/kernelbot
uv run python main.py --api-only
# Server runs on http://localhost:8000

curl -X POST "http://localhost:8000/admin/update-problems" \
-H "Authorization: Bearer 79aa14edc807892439a3fafcae9a262f" \
-H "Content-Type: application/json" \
-d '{"problem_set": "helion", "force": true}'

Verify they loaded:
curl -s http://localhost:8000/leaderboards | python3 -c "
import json, sys
for lb in json.load(sys.stdin):
    if lb['name'] in ['fp8_quant','causal_conv1d','gated_deltanet_chunk_fwd_h','gated_deltanet_chunk_fwd_o','gated_deltanet_recompute_w_u']:
        print(f\"{lb['name']}: {lb['gpu_types']}\")
"

# Back up your real config
cp ~/.popcorn.yaml ~/.popcorn.yaml.bak
# Use test CLI ID
echo "cli_id: test-cli-id-123" > ~/.popcorn.yaml

Set the API URL:
export POPCORN_API_URL=http://127.0.0.1:8000

Submit each problem (test mode):
PROBLEMS_DIR=~/Dev/reference-kernels/problems/helion
# fp8_quant
cd $PROBLEMS_DIR/fp8_quant_py
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard fp8_quant --mode test --no-tui
# causal_conv1d
cd $PROBLEMS_DIR/causal_conv1d_py
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard causal_conv1d --mode test --no-tui
# gated_deltanet_chunk_fwd_h
cd $PROBLEMS_DIR/gated_deltanet_chunk_fwd_h_py
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard gated_deltanet_chunk_fwd_h --mode test --no-tui
# gated_deltanet_chunk_fwd_o
cd $PROBLEMS_DIR/gated_deltanet_chunk_fwd_o_py
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard gated_deltanet_chunk_fwd_o --mode test --no-tui
# gated_deltanet_recompute_w_u
cd $PROBLEMS_DIR/gated_deltanet_recompute_w_u_py
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard gated_deltanet_recompute_w_u --mode test --no-tui

Or submit all in parallel (background):
for prob in fp8_quant_py causal_conv1d_py gated_deltanet_chunk_fwd_h_py gated_deltanet_chunk_fwd_o_py gated_deltanet_recompute_w_u_py; do
    lb=$(echo "$prob" | sed 's/_py$//')
    (cd "$PROBLEMS_DIR/$prob" && popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard "$lb" --mode test --no-tui) &
done
wait

# Check the run results in the database
psql "postgresql://$(whoami)@localhost:5432/kernelbot" -c "
SELECT l.name, r.mode, r.passed, r.runner, r.start_time
FROM leaderboard.runs r
JOIN leaderboard.submission s ON s.id = r.submission_id
JOIN leaderboard.leaderboard l ON l.id = s.leaderboard_id
WHERE l.name IN ('fp8_quant','causal_conv1d','gated_deltanet_chunk_fwd_h','gated_deltanet_chunk_fwd_o','gated_deltanet_recompute_w_u')
ORDER BY r.start_time DESC;
"

# Restore your real config
cp ~/.popcorn.yaml.bak ~/.popcorn.yaml
rm ~/.popcorn.yaml.bak

Replace --mode test with:
- --mode benchmark: runs benchmarks (no leaderboard score)
- --mode leaderboard: full scored run (correctness + benchmarks, geometric mean score)
- --mode profile: torch profiler output
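When switching modes across several problems, it can help to validate the mode name and build the command line in one place. The sketch below assumes the only accepted modes are the four this guide mentions. is_valid_mode and submit_command are hypothetical helper names, not part of popcorn-cli. The command is printed rather than executed so it can be inspected first.

```shell
# Succeed only for the four modes named in this guide (an assumption, not
# an authoritative list of what popcorn-cli accepts).
is_valid_mode() {
    case "$1" in
        test|benchmark|leaderboard|profile) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the popcorn-cli command for one problem directory and mode.
# The leaderboard name is the directory name without its trailing "_py".
submit_command() {
    prob=$1
    mode=$2
    if ! is_valid_mode "$mode"; then
        printf 'unknown mode: %s\n' "$mode" >&2
        return 1
    fi
    printf 'popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard %s --mode %s --no-tui\n' \
        "${prob%_py}" "$mode"
}
```

For example, submit_command fp8_quant_py benchmark prints the benchmark-mode command for fp8_quant, which can then be run from inside $PROBLEMS_DIR/fp8_quant_py.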
- Job stays queued: Nebius B200 runners may be offline or busy. Check with:
gh run view <RUN_ID> --repo gpu-mode/kernelbot --json status
- "too many values to unpack": Task definition in DB is stale. Re-sync with force:
curl -X POST "http://localhost:8000/admin/update-problems" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"problem_set": "helion", "force": true}'
- 401 from popcorn-cli: CLI ID not in DB. Re-run the test user INSERT from step 1.
- Connection refused: Make sure the API server is running and use 127.0.0.1, not localhost (IPv6 issues). Also run unset HTTP_PROXY HTTPS_PROXY.
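The localhost-to-127.0.0.1 advice in the last item can be applied mechanically to whatever URL is already configured. This is a minimal sketch under that assumption. force_ipv4_loopback is a hypothetical helper name, and it only rewrites http and https URLs whose host is exactly localhost.

```shell
# Rewrite a "localhost" host to 127.0.0.1 so the client does not resolve it
# to the IPv6 loopback address. Any other URL is printed unchanged.
# (force_ipv4_loopback is an illustrative helper, not part of popcorn-cli.)
force_ipv4_loopback() {
    case "$1" in
        http://localhost|http://localhost[:/]*)
            printf 'http://127.0.0.1%s\n' "${1#http://localhost}" ;;
        https://localhost|https://localhost[:/]*)
            printf 'https://127.0.0.1%s\n' "${1#https://localhost}" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}
```

A typical use is export POPCORN_API_URL=$(force_ipv4_loopback "$POPCORN_API_URL"), followed by unset HTTP_PROXY HTTPS_PROXY.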
gated_deltanet_chunk_fwd_h submission fails with "too many values to unpack (expected 4)". The submission.py unpacks 4 values, but the runner passes it an input tuple of a different length. This appears to be a mismatch between the task definition stored in the DB and the current reference code. Force re-syncing the problems may fix this.