Helion E2E Test Guide

End-to-end testing of the Helion problem set. A local kernelbot API server accepts submissions from popcorn-cli and dispatches them as GitHub Actions workflows that run on Nebius B200 runners.

Prerequisites

PostgreSQL running locally (pg_isready should report "accepting connections")
popcorn-cli installed (which popcorn-cli)
uv installed (used to run the migrations and the server; uv --version)
gh CLI authenticated with the workflow scope (gh auth status)
reference-kernels repo cloned at ~/Dev/reference-kernels
kernelbot repo cloned at ~/Dev/kernelbot
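
You can confirm most of these with a quick preflight check before you start. The sketch below is illustrative only. The helper names (`missing_tools`, `missing_dirs`, `postgres_ready`) are hypothetical and the repo paths are the ones listed above. It does not check the gh workflow scope; use gh auth status for that.

```python
import shutil
import subprocess
from pathlib import Path

# Tools invoked by the steps in this guide (hypothetical list; adjust as needed).
REQUIRED_TOOLS = ["pg_isready", "createdb", "psql", "uv", "popcorn-cli", "gh"]
# Repo locations this guide assumes.
REQUIRED_DIRS = [Path("~/Dev/reference-kernels"), Path("~/Dev/kernelbot")]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def missing_dirs(dirs=REQUIRED_DIRS):
    """Return the expected repo directories that do not exist."""
    return [d for d in dirs if not d.expanduser().is_dir()]


def postgres_ready():
    """pg_isready exits 0 only when the server is accepting connections."""
    try:
        return subprocess.run(["pg_isready"], capture_output=True).returncode == 0
    except FileNotFoundError:  # pg_isready itself is not installed
        return False


if __name__ == "__main__":
    problems = [f"missing tool: {t}" for t in missing_tools()]
    problems += [f"missing repo: {d}" for d in missing_dirs()]
    if not postgres_ready():
        problems.append("PostgreSQL is not accepting connections")
    print("\n".join(problems) or "prerequisites look OK")
```

Run it as a script. It prints one line per missing item, or "prerequisites look OK".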

1. Database Setup

```
cd ~/Dev/kernelbot

# Create database (skip if it already exists)
createdb kernelbot

# Run migrations
uv run yoyo apply \
  --database "postgresql://$(whoami)@localhost:5432/kernelbot" \
  src/migrations/ --batch

# Create a test user
psql "postgresql://$(whoami)@localhost:5432/kernelbot" -c \
  "INSERT INTO leaderboard.user_info (id, user_name, cli_id, cli_valid)
   VALUES ('999999', 'testuser', 'test-cli-id-123', true)
   ON CONFLICT (id) DO UPDATE SET cli_id = 'test-cli-id-123', cli_valid = true;"
```
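
You can read the row back to confirm the INSERT took effect. A missing or invalid row is what later shows up as a 401 from popcorn-cli. This is a sketch that shells out to psql. `cli_user_is_valid` and `default_db_url` are hypothetical helpers, and `getpass.getuser()` only approximates `$(whoami)`.

```python
import getpass
import subprocess

TEST_CLI_ID = "test-cli-id-123"  # the cli_id inserted above
QUERY = f"SELECT cli_valid FROM leaderboard.user_info WHERE cli_id = '{TEST_CLI_ID}';"


def default_db_url():
    """Approximates postgresql://$(whoami)@localhost:5432/kernelbot."""
    return f"postgresql://{getpass.getuser()}@localhost:5432/kernelbot"


def cli_user_is_valid(db_url=None, run=subprocess.run):
    """True if the test user exists and cli_valid is true.

    psql -tA prints the bare value without headers; booleans print as t/f.
    """
    result = run(
        ["psql", db_url or default_db_url(), "-tA", "-c", QUERY],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "t"
```

Usage: `print(cli_user_is_valid())`. The `run` parameter exists only so the psql call can be swapped out in tests.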

2. Configure .env

Create ~/Dev/kernelbot/.env:

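```
# GitHub (get token from: gh auth token)
GITHUB_TOKEN=<your-github-token>
GITHUB_REPO=gpu-mode/kernelbot

# Local PostgreSQL: write out your username (the output of whoami);
# .env files are usually not shell-expanded, so $(whoami) would stay literal
DATABASE_URL=postgresql://<your-username>@localhost:5432/kernelbot
DISABLE_SSL=true

# Admin token for local testing
ADMIN_TOKEN=79aa14edc807892439a3fafcae9a262f

# Discord is unused in API-only mode; a placeholder value is enough
DISCORD_TOKEN=placeholder_discord_token
```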

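Your GitHub token needs the workflow scope so the server can dispatch GitHub Actions workflows. Print it with:

```
gh auth token
```

If gh auth status does not list the workflow scope, add it with gh auth refresh -s workflow.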
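
A missing or placeholder value in .env tends to surface only later, as an auth or connection error, so a small check can save a round trip. The parser below is a deliberately simplified sketch that handles plain KEY=VALUE lines with no quoting or expansion. It is not the loader kernelbot itself uses. `parse_env` and `env_problems` are hypothetical names, and the key list comes from the example file above.

```python
from pathlib import Path

# Keys from the example .env above.
REQUIRED_KEYS = ["GITHUB_TOKEN", "GITHUB_REPO", "DATABASE_URL", "DISABLE_SSL",
                 "ADMIN_TOKEN", "DISCORD_TOKEN"]


def parse_env(text):
    """Parse plain KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def env_problems(env):
    """List unset keys and values that still look like placeholders."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not env.get(key)]
    for key, value in env.items():
        if "<" in value or "$(" in value:  # e.g. <your-github-token>, $(whoami)
            problems.append(f"{key} still holds a placeholder: {value}")
    return problems


if __name__ == "__main__":
    env_file = Path("~/Dev/kernelbot/.env").expanduser()
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print("\n".join(env_problems(env)) or ".env looks complete")
```

The placeholder DISCORD_TOKEN value is intentionally not flagged, since API-only mode does not use it.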

3. Start the API Server

```
cd ~/Dev/kernelbot/src/kernelbot
uv run python main.py --api-only
# Server runs on http://localhost:8000
```
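
If you script the remaining steps, wait until the server answers before sending requests. This sketch polls the /leaderboards endpoint from step 4 using only the standard library. `wait_for_server` and `fetch_status` are hypothetical helpers. Any HTTP response, including an error status, counts as "server is up".

```python
import time
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for GET url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server is up but returned 4xx/5xx
        return err.code
    except OSError:  # URLError (connection refused, DNS) and timeouts
        return None


def wait_for_server(url="http://127.0.0.1:8000/leaderboards", timeout=60.0,
                    interval=1.0, fetch=fetch_status, sleep=time.sleep):
    """Poll url until it answers; return False if timeout seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if fetch(url) is not None:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The URL uses 127.0.0.1 rather than localhost to sidestep the IPv6 issue noted under Troubleshooting.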

4. Sync Helion Problems (if not already loaded)

```
curl -X POST "http://localhost:8000/admin/update-problems" \
  -H "Authorization: Bearer 79aa14edc807892439a3fafcae9a262f" \
  -H "Content-Type: application/json" \
  -d '{"problem_set": "helion", "force": true}'
```
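
You can send the same sync request from Python, for example inside a test script. This sketch mirrors the curl call above: same URL, headers and JSON body. `build_sync_request` and `sync_problems` are hypothetical names, and the 120-second timeout is an arbitrary assumption.

```python
import json
import urllib.request


def build_sync_request(admin_token, problem_set="helion", force=True,
                       api_url="http://127.0.0.1:8000"):
    """Build the POST /admin/update-problems request that the curl command sends."""
    body = json.dumps({"problem_set": problem_set, "force": force}).encode()
    return urllib.request.Request(
        f"{api_url}/admin/update-problems",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )


def sync_problems(admin_token, **kwargs):
    """Send the request; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(build_sync_request(admin_token, **kwargs), timeout=120) as resp:
        return resp.status, resp.read().decode()
```

Usage with the local token from .env: `sync_problems("79aa14edc807892439a3fafcae9a262f")`.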

Verify they loaded:

```
curl -s http://localhost:8000/leaderboards | python3 -c "
import json, sys
for lb in json.load(sys.stdin):
    if lb['name'] in ['fp8_quant','causal_conv1d','gated_deltanet_chunk_fwd_h','gated_deltanet_chunk_fwd_o','gated_deltanet_recompute_w_u']:
        print(f\"{lb['name']}: {lb['gpu_types']}\")
"
```

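The one-liner above only prints what it finds. A variant that reports what is missing can act as a pass/fail check. Like the one-liner, this sketch assumes /leaderboards returns a JSON list of objects with name and gpu_types fields. It additionally assumes gpu_types is a list of GPU name strings such as B200_Nebius. `check_leaderboards` and `fetch_leaderboards` are hypothetical helpers.

```python
import json
import urllib.request

HELION_LEADERBOARDS = [
    "fp8_quant",
    "causal_conv1d",
    "gated_deltanet_chunk_fwd_h",
    "gated_deltanet_chunk_fwd_o",
    "gated_deltanet_recompute_w_u",
]


def fetch_leaderboards(api_url="http://127.0.0.1:8000"):
    """GET /leaderboards and decode the JSON body."""
    with urllib.request.urlopen(f"{api_url}/leaderboards", timeout=10) as resp:
        return json.load(resp)


def check_leaderboards(leaderboards, gpu="B200_Nebius"):
    """Return a list of problems; an empty list means all five are ready."""
    by_name = {lb["name"]: lb for lb in leaderboards}
    problems = []
    for name in HELION_LEADERBOARDS:
        if name not in by_name:
            problems.append(f"{name}: not loaded")
        elif gpu not in by_name[name].get("gpu_types", []):
            problems.append(f"{name}: {gpu} missing from gpu_types")
    return problems
```

Usage: `print(check_leaderboards(fetch_leaderboards()) or "all Helion leaderboards loaded")`.
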
5. Configure popcorn-cli for Local Testing

```
# Back up your real config
cp ~/.popcorn.yaml ~/.popcorn.yaml.bak

# Use the test CLI ID
echo "cli_id: test-cli-id-123" > ~/.popcorn.yaml
```
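
Steps 5 and 8 swap the popcorn-cli config in and back out by hand. If you drive the test from Python, a context manager can do the same swap and still restore the original file when a submission step raises. This is a sketch. `temporary_popcorn_config` is a hypothetical helper, and writing only cli_id mirrors the echo command above.

```python
import contextlib
import shutil
from pathlib import Path


@contextlib.contextmanager
def temporary_popcorn_config(cli_id="test-cli-id-123", config=Path("~/.popcorn.yaml")):
    """Write a test-only config, then restore (or remove) it on exit."""
    config = config.expanduser()
    backup = config.with_name(config.name + ".bak")  # ~/.popcorn.yaml.bak
    had_config = config.exists()
    if had_config:
        shutil.copy2(config, backup)
    try:
        config.write_text(f"cli_id: {cli_id}\n")
        yield config
    finally:
        if had_config:
            shutil.move(backup, config)  # put the real config back, drop the .bak
        else:
            config.unlink(missing_ok=True)  # there was no config before; leave none
```

Usage: wrap the submissions in `with temporary_popcorn_config(): ...`. Step 8 then happens automatically.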

6. Submit All 5 Helion Problems

Set the API URL:

```
export POPCORN_API_URL=http://127.0.0.1:8000
```

Submit each problem (test mode):

```
PROBLEMS_DIR=~/Dev/reference-kernels/problems/helion

# fp8_quant
cd "$PROBLEMS_DIR/fp8_quant_py"
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard fp8_quant --mode test --no-tui

# causal_conv1d
cd "$PROBLEMS_DIR/causal_conv1d_py"
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard causal_conv1d --mode test --no-tui

# gated_deltanet_chunk_fwd_h
cd "$PROBLEMS_DIR/gated_deltanet_chunk_fwd_h_py"
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard gated_deltanet_chunk_fwd_h --mode test --no-tui

# gated_deltanet_chunk_fwd_o
cd "$PROBLEMS_DIR/gated_deltanet_chunk_fwd_o_py"
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard gated_deltanet_chunk_fwd_o --mode test --no-tui

# gated_deltanet_recompute_w_u
cd "$PROBLEMS_DIR/gated_deltanet_recompute_w_u_py"
popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard gated_deltanet_recompute_w_u --mode test --no-tui
```

Or submit all in parallel (background):

```
for prob in fp8_quant_py causal_conv1d_py gated_deltanet_chunk_fwd_h_py gated_deltanet_chunk_fwd_o_py gated_deltanet_recompute_w_u_py; do
  lb="${prob%_py}"  # strip the _py suffix to get the leaderboard name
  (cd "$PROBLEMS_DIR/$prob" && popcorn-cli submit submission.py --gpu B200_Nebius --leaderboard "$lb" --mode test --no-tui) &
done
wait
```
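
The background loop above interleaves the output of all five submissions. A Python sketch with a thread pool keeps each problem's output and exit code separate. The helper names are hypothetical and the popcorn-cli flags are exactly those used above. The sketch assumes POPCORN_API_URL is already exported, since child processes inherit it. popcorn-cli's exit code may not tell you whether the kernel passed, so still check the database in step 7.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

PROBLEMS_DIR = Path("~/Dev/reference-kernels/problems/helion").expanduser()
PROBLEM_DIRS = [
    "fp8_quant_py",
    "causal_conv1d_py",
    "gated_deltanet_chunk_fwd_h_py",
    "gated_deltanet_chunk_fwd_o_py",
    "gated_deltanet_recompute_w_u_py",
]


def submit_command(leaderboard, mode="test", gpu="B200_Nebius"):
    """The popcorn-cli invocation used in step 6."""
    return ["popcorn-cli", "submit", "submission.py", "--gpu", gpu,
            "--leaderboard", leaderboard, "--mode", mode, "--no-tui"]


def submit(problem_dir, mode="test", run=subprocess.run):
    """Submit one problem from its directory; return (leaderboard, exit code, output)."""
    leaderboard = problem_dir.removesuffix("_py")  # fp8_quant_py -> fp8_quant
    result = run(submit_command(leaderboard, mode), cwd=PROBLEMS_DIR / problem_dir,
                 capture_output=True, text=True)
    return leaderboard, result.returncode, result.stdout + result.stderr


def submit_all(mode="test", run=subprocess.run):
    """Run all submissions concurrently; results come back in PROBLEM_DIRS order."""
    with ThreadPoolExecutor(max_workers=len(PROBLEM_DIRS)) as pool:
        return list(pool.map(lambda d: submit(d, mode, run), PROBLEM_DIRS))
```

Usage: `for lb, code, out in submit_all(): print(lb, code)`. Pass `mode="benchmark"`, `"leaderboard"` or `"profile"` for the other submission modes.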

7. Verify Results in DB

```
psql "postgresql://$(whoami)@localhost:5432/kernelbot" -c "
SELECT l.name, r.mode, r.passed, r.runner, r.start_time
FROM leaderboard.runs r
JOIN leaderboard.submission s ON s.id = r.submission_id
JOIN leaderboard.leaderboard l ON l.id = s.leaderboard_id
WHERE l.name IN ('fp8_quant','causal_conv1d','gated_deltanet_chunk_fwd_h','gated_deltanet_chunk_fwd_o','gated_deltanet_recompute_w_u')
ORDER BY r.start_time DESC;
"
```

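To turn the query above into a pass/fail summary, ask psql for CSV (its --csv output mode, available since PostgreSQL 12) and keep only the newest row per leaderboard and mode. This is a sketch. `query_runs`, `latest_runs` and `failing` are hypothetical helpers. It assumes psql prints booleans as t/f, which is its default, and it counts a run whose passed column is still empty as not passed.

```python
import csv
import io
import subprocess

QUERY = """
SELECT l.name, r.mode, r.passed, r.runner, r.start_time
FROM leaderboard.runs r
JOIN leaderboard.submission s ON s.id = r.submission_id
JOIN leaderboard.leaderboard l ON l.id = s.leaderboard_id
WHERE l.name IN ('fp8_quant','causal_conv1d','gated_deltanet_chunk_fwd_h',
                 'gated_deltanet_chunk_fwd_o','gated_deltanet_recompute_w_u')
ORDER BY r.start_time DESC;
"""


def query_runs(db_url, run=subprocess.run):
    """Run QUERY through psql and return its CSV output (header row included)."""
    return run(["psql", db_url, "--csv", "-c", QUERY],
               capture_output=True, text=True, check=True).stdout


def latest_runs(csv_text):
    """Keep the first (newest) row per (name, mode); rows arrive newest first."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        latest.setdefault((row["name"], row["mode"]), row)
    return latest


def failing(latest):
    """(name, mode) pairs whose newest run did not pass."""
    return [key for key, row in latest.items() if row["passed"] != "t"]
```

Usage: `failing(latest_runs(query_runs("postgresql://<your-username>@localhost:5432/kernelbot")))` returns an empty list when every newest run passed.
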
8. Restore popcorn-cli Config

```
cp ~/.popcorn.yaml.bak ~/.popcorn.yaml
rm ~/.popcorn.yaml.bak
```

Other Submission Modes

Replace --mode test with one of:

--mode benchmark: runs benchmarks (no leaderboard score)
--mode leaderboard: full scored run (correctness checks plus benchmarks; the score is a geometric mean)
--mode profile: PyTorch profiler output

Troubleshooting

Job stays queued: the Nebius B200 runners may be offline or busy. Find the run ID with gh run list --repo gpu-mode/kernelbot, then check its status:
```
gh run view <RUN_ID> --repo gpu-mode/kernelbot --json status
```
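
To watch a queued job from a script instead of re-running the command by hand, you can wrap gh run view in a small poller. This is a sketch. `run_status` and `wait_while_queued` are hypothetical helpers, they request the status and conclusion JSON fields, and the timeout and interval defaults are arbitrary. The poller treats queued, waiting, pending and requested as "not started yet".

```python
import json
import subprocess
import time

NOT_STARTED = {"queued", "waiting", "pending", "requested"}


def run_status(run_id, repo="gpu-mode/kernelbot", run=subprocess.run):
    """Return gh's JSON for the run, e.g. {"status": ..., "conclusion": ...}."""
    out = run(["gh", "run", "view", str(run_id), "--repo", repo,
               "--json", "status,conclusion"],
              capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def wait_while_queued(run_id, timeout=900, interval=30,
                      status=run_status, sleep=time.sleep):
    """Poll until the run leaves the not-started states or the timeout expires."""
    deadline = time.monotonic() + timeout
    info = status(run_id)
    while info["status"] in NOT_STARTED and time.monotonic() < deadline:
        sleep(interval)
        info = status(run_id)
    return info
```

If the returned status is still "queued" after the timeout, the runners are the likely problem rather than the submission.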

"too many values to unpack": the task definition stored in the DB is probably stale. Re-sync with force (the same request as step 4):

```
curl -X POST "http://localhost:8000/admin/update-problems" \
  -H "Authorization: Bearer 79aa14edc807892439a3fafcae9a262f" \
  -H "Content-Type: application/json" \
  -d '{"problem_set": "helion", "force": true}'
```

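401 from popcorn-cli: the CLI ID is not in the DB, or ~/.popcorn.yaml does not contain test-cli-id-123. Re-run the test user INSERT from step 1 and check the config from step 5.
Connection refused: make sure the API server is running, and point POPCORN_API_URL at 127.0.0.1 rather than localhost. localhost can resolve to the IPv6 address ::1, which the server may not be listening on. If a proxy is configured, also run unset HTTP_PROXY HTTPS_PROXY.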

Known Issues (2026-03-11)

gated_deltanet_chunk_fwd_h submissions fail with "too many values to unpack (expected 4)". submission.py unpacks 4 values, but the input tuple the runner passes has a different length. This appears to be a mismatch between the task definition stored in the DB and the current reference code. Force re-syncing the problems (see Troubleshooting) may fix it.

msaroufim/helion-e2e-test.md
