Author: Field notes from a GPU instance provisioning session Date: February 2026 CLI Version: v0.6.316 Context: Provisioning a single H100 PCIe instance to serve Qwen3-Coder-Next-FP8 via vLLM, then connecting Claude Code to it
When someone sits down to provision a GPU, here is what they know:
- They have a workload — a model name, a training job, an inference server
- They have a budget — a rough sense of what they're willing to spend per session or per day
- They have a timeline — they need it now, or in a few hours, or by tomorrow
Here is what they do not know:
- Instance type strings (`hyperstack_H100`, `gpu-h100-sxm.1gpu-16vcpu-200gb`, `a3-megagpu-8g:nvidia-h100-mega-80gb:8`)
- Which cloud providers Brev aggregates and what their naming conventions are
- How much VRAM a given model requires in a given precision
- That the `--gpu` flag routes through deprecated code and fails for half of all GPU types
The CLI's job is to bridge this gap — to take what the user knows and produce a running GPU instance. At the moment, it instead requires the user to already have knowledge that the CLI itself should be supplying. This report documents where that gap causes failures, why each failure happens at the code level, and what the fixes look like.
Before cataloguing failures, it is worth establishing what good looks like. A user provisioning a GPU to run a specific model should be able to do this:
$ brev gpu find --model "Qwen/Qwen3-Coder-Next-FP8"
Qwen3-Coder-Next-FP8 needs ~75 GB VRAM (FP8 weights, single GPU)
SINGLE-GPU OPTIONS (80 GB — fits with headroom for KV cache)
TYPE PRICE BOOT PROVIDER AVAILABLE
hyperstack_H100 $2.28/hr ~3 min Hyperstack ✓ now
latitude_H100 $2.39/hr ~4 min Latitude ✓ now
nebius.h100x1.sxm $3.54/hr ~5 min Nebius ✓ now
scaleway_H100 $3.70/hr ~5 min Scaleway ✓ now
digitalocean_H100_sxm5 $4.01/hr ~5 min DigitalOcean ✓ now
paperspace_H100 $7.19/hr ~8 min Paperspace ✓ now
Cheapest option: hyperstack_H100
To create: brev create my-gpu --type hyperstack_H100
Or, if they just want it to work:
$ brev create my-gpu --model "Qwen/Qwen3-Coder-Next-FP8"
Selecting cheapest available GPU with ≥75 GB VRAM...
→ hyperstack_H100 ($2.28/hr, 80 GB VRAM, ~3 min boot)
Creating instance my-gpu...
✓ Ready in 4m12s
GPU: 1× NVIDIA H100 PCIe (80 GB VRAM)
RAM: 180 GB
Root disk: 100 GB ← not enough for model download
Ephemeral: 750 GB at /ephemeral ← use this for HuggingFace cache
SSH: ssh -F ~/.brev/ssh_config my-gpu-host
Cost: $2.28/hr
Or, if they have a budget constraint:
$ brev gpu find --vram 80 --budget 30 --hours 8
Budget: $30 over 8 hours = $3.75/hr ceiling
TYPE VRAM PRICE 8-HR TOTAL
hyperstack_H100 80 GB $2.28/hr $18.24
latitude_H100 80 GB $2.39/hr $19.12
nebius.h100x1.sxm 80 GB $3.54/hr $28.32
None of these require the user to know type strings. None of them silently fail. All of them show what was actually provisioned, not what was requested. That is the target. What follows is an account of how far the current experience falls short, and exactly what to change.
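The budget arithmetic behind that last example is simple enough to make explicit. A minimal sketch of the filter, using prices from the tables above (the option list here is illustrative, not pulled from the live catalog):

```python
def within_budget(options, budget_usd, hours):
    """Keep options whose session cost fits the budget; annotate each with its total."""
    ceiling = budget_usd / hours  # hourly ceiling, e.g. $30 / 8h = $3.75/hr
    return [
        {**o, "session_total": round(o["price_hr"] * hours, 2)}
        for o in options
        if o["price_hr"] <= ceiling
    ]

catalog = [
    {"type": "hyperstack_H100", "price_hr": 2.28},
    {"type": "latitude_H100", "price_hr": 2.39},
    {"type": "nebius.h100x1.sxm", "price_hr": 3.54},
    {"type": "paperspace_H100", "price_hr": 7.19},  # over the $3.75/hr ceiling
]

for o in within_budget(catalog, budget_usd=30, hours=8):
    print(o["type"], f"${o['session_total']}")
# hyperstack_H100 $18.24
# latitude_H100 $19.12
# nebius.h100x1.sxm $28.32
```

The point of the annotation is the session total: the user budgets $30, not $3.75/hr, so the tool should do the multiplication.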
Understanding user priorities shapes which failures are most damaging:
1. "Will it fit?" VRAM vs. model size is binary. If the model doesn't fit in VRAM, nothing else matters. Users should never have to calculate this themselves. The tool should compute it from a model name or a parameter count.
2. "Can I get it right now?" Not "is_available: true in the catalog" — that's a stale signal. Users want to know: if I run create right now, will it succeed in under 10 minutes? Catalog availability and live provisioning success are different things, and the current tool conflates them.
3. "What will it cost me?"
Not $2.28/hr. $18.24 for an 8-hour session. People budget in sessions, not in hours.
The current output only shows hourly rates.
Everything else — provider name, instance type string, region, boot time — is secondary. The current CLI leads with instance type strings and leaves the user to figure out the rest.
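The "will it fit?" computation the tool should perform is plain arithmetic. A sketch under a common rule of thumb — weight bytes per parameter plus a fixed overhead fraction for KV cache and activations (the 20% overhead and the example parameter counts are assumptions for illustration, not catalog values):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1, "int4": 0.5}

def estimate_vram_gb(params_billion, precision, overhead=0.20):
    """Rough VRAM need: weight bytes plus a fixed fraction for KV cache/activations."""
    weights_gb = params_billion * BYTES_PER_PARAM[precision]
    return weights_gb * (1 + overhead)

# A 70B model in FP16 exceeds any single 80 GB card -> multi-GPU territory.
print(round(estimate_vram_gb(70, "fp16"), 1))  # 168.0
# The same model quantized to int4 fits a single 80 GB card with headroom.
print(round(estimate_vram_gb(70, "int4"), 1))  # 42.0
```

Deriving `params_billion` from a HuggingFace model name (as `--model` would need to) requires a model-card lookup; the arithmetic after that is the above.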
The user reads the guide. The guide says:
brev create my-instance --gpu hyperstack_H100

They run it. After a 10-second spinner:
rpc error: code = NotFound desc = instance type hyperstack_H100 not found
exit status 1
They try variations. All fail:
brev create my-instance --gpu latitude_H100 # not found
brev create my-instance --gpu scaleway_H100 # not found
brev create my-instance --gpu paperspace_H100 # not found
brev create my-instance --gpu nebius.h100x1.sxm # not found
brev create my-instance --gpu gpu-h100-sxm.1gpu-16vcpu-200gb # not found

The only types that succeed are in GCP-native format:
brev create my-instance --gpu a2-highgpu-1g:nvidia-tesla-a100:1 # works — A100, not H100

The user has no way to know why some types work and others don't. There is no indication that the --gpu flag is deprecated or routes through code that is known to be broken.
The installed binary (v0.6.316) wires brev create to the old pkg/cmd/create/create.go code
path. That code takes whatever string is passed to --gpu and sends it directly as InstanceType
to POST /api/organizations/{orgID}/workspaces — with no WorkspaceGroupID field.
The backend can resolve GCP-native instance types because they map implicitly to the GCP
workspace group. It cannot resolve Shadeform, Nebius, Launchpad, or Lambda Labs types without an
explicit WorkspaceGroupID, because those require routing through a different compute backend.
The fix already exists in the codebase. pkg/cmd/gpucreate/gpucreate.go — explicitly marked as
the successor — first calls GetAllInstanceTypesWithWorkspaceGroups(orgID) to resolve the
mapping, then passes WorkspaceGroupID alongside InstanceType:
// From gpucreate.go — the correct code path
if wgID := c.allInstanceTypes.GetWorkspaceGroupID(spec.Type); wgID != "" {
cwOptions.WorkspaceGroupID = wgID
}

The old code path has this comment at the top of the file:
// Deprecated: This package is superseded by pkg/cmd/gpucreate which is the
// registered "brev create" command. This code is not wired into cmd.go and
// should not be modified. Use gpucreate for all new work.

Despite the comment claiming it is "not wired into cmd.go", the installed binary uses the old --gpu behavior. The new code uses --type. The flag rename is the breakage point: the installed binary exposes --gpu, which routes to the old code; the new binary would expose --type (or both, with --gpu as a deprecated alias). The release has not shipped.
To make this more confusing: the authenticated catalog endpoint
GET /api/instances/alltypesavailable/{orgID} returns hyperstack_H100 with
"is_available": true. The user can verify this independently:
TOKEN=$(python3 -c "import json, os; print(json.load(open(os.path.expanduser('~/.brev/credentials.json')))['access_token'])")
ORG_ID="org-XXXX..."
curl -sf -H "Authorization: Bearer $TOKEN" \
"https://brevapi.us-west-2-prod.control-plane.brev.dev/api/instances/alltypesavailable/$ORG_ID" \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
h100 = [i for i in d['allInstanceTypes'] if 'H100' in i.get('type','')]
for item in h100[:4]:
print(item['type'], item['is_available'])
"
# hyperstack_H100 True
# scaleway_H100 True
# latitude_H100 True
# ...

The catalog says the type exists and is available. The CLI says it is not found. The contradiction cannot be resolved from user space without reading server-side Go code.
After two hours of debugging, the working solution was to bypass the CLI entirely:
curl -sf -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
"https://brevapi.us-west-2-prod.control-plane.brev.dev/api/organizations/$ORG_ID/workspaces" \
-d '{
"name": "qwen3-h100",
"workspaceGroupId": "shadeform",
"instanceType": "hyperstack_H100",
"diskStorage": "120Gi",
"workspaceVersion": "v1",
"vmOnlyMode": true,
"portMappings": {},
"workspaceTemplateId": "4nbb4lg2s",
"launchJupyterOnStart": false
}'

This required:
- Locating the bearer token in `~/.brev/credentials.json` (undocumented)
- Finding the API base URL from source (`brevapi.us-west-2-prod.control-plane.brev.dev`)
- Reading `pkg/store/instancetypes.go` to learn that `workspaceGroupId` was the missing field
- Reading `pkg/store/workspace.go` to find the field names for `CreateWorkspacesOptions`
- Reading `gpusearch_test.go` to find that `hyperstack_H100` maps to workspace group `"shadeform"`
- Knowing that `workspaceTemplateId: "4nbb4lg2s"` is the standard user template
This is not a reasonable path to expect a user to follow. It worked, but a workaround that requires reading internal source code is a sign that the public interface has diverged from the implementation.
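For anyone forced down the same path, the request body at least does not need to be hand-edited. A sketch that assembles it programmatically — the field names and template ID are those observed in this session's successful call; the helper name is mine:

```python
import json

def build_create_payload(name, instance_type, workspace_group_id,
                         disk="120Gi", template_id="4nbb4lg2s"):
    """Assemble the create-workspace body, including the workspaceGroupId
    field that CLI v0.6.316 never sends."""
    return {
        "name": name,
        "workspaceGroupId": workspace_group_id,
        "instanceType": instance_type,
        "diskStorage": disk,
        "workspaceVersion": "v1",
        "vmOnlyMode": True,
        "portMappings": {},
        "workspaceTemplateId": template_id,
        "launchJupyterOnStart": False,
    }

body = build_create_payload("qwen3-h100", "hyperstack_H100", "shadeform")
print(json.dumps(body, indent=2))  # pipe to curl -d @- or send with an HTTP client
```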
Immediate: Cut a release from main. The gpucreate code path is correct and already merged.
Defensive (if old path must stay): Pre-flight the instance type through the catalog API before
creating, then populate WorkspaceGroupID:
allTypes, err := createStore.GetAllInstanceTypesWithWorkspaceGroups(orgID)
if err == nil && allTypes != nil {
if wgID := allTypes.GetWorkspaceGroupID(options.InstanceType); wgID != "" {
cwOptions.WorkspaceGroupID = wgID
}
}

User-facing: Surface the deprecated --gpu flag with a warning:
Warning: --gpu routes through a deprecated code path that does not support
Shadeform, Nebius, or Launchpad instance types. Use --type instead:
brev create my-gpu --type hyperstack_H100
Server-side: Return InvalidArgument (not NotFound) when WorkspaceGroupID is absent but
required for the given instance type, with an actionable message:
instance type 'hyperstack_H100' requires workspaceGroupId 'shadeform'.
Your CLI may be outdated. Run: brew upgrade brev
After hyperstack_H100 fails, the user's next move is to find out what types are valid. There is
nowhere to look:
$ brev --help
Instance Commands:
create Create a new instance
delete Delete an instance
ls List instances within active org
No brev gpus. No brev search. No brev list-types. The documentation link in the --gpu
flag help text — See https://brev.dev/docs/reference/gpu — redirects to
https://docs.nvidia.com/brev/ and returns a 404. The user's only option is to guess type
strings or read GitHub source code.
To actually find valid type strings, this session required:
- Reading `pkg/cmd/gpusearch/gpusearch_test.go` — example types appear as test cases
- Calling the public API manually: `GET https://api.brev.dev/v1/instance/types`
- Calling the authenticated org API: `GET /api/instances/alltypesavailable/{orgID}`
The public endpoint returns 529 items in a single payload with no filtering or pagination. The authenticated org endpoint returns 627 items per org. Neither is a reasonable user interface.
pkg/cmd/gpusearch/gpusearch.go is a fully implemented, tested GPU search command with filtering
by GPU name, VRAM, compute capability, price, provider, and boot time. It processes the catalog
API response, filters it, and sorts results. It is not registered as a CLI subcommand in
v0.6.316.
Ship brev search by wiring gpusearch into cmd.go — likely a one-line change.
Add brev gpu find as an alias focused on the user's mental model — filters by VRAM need,
shows session cost, shows live availability, and emits the exact brev create command to run:
$ brev gpu find --vram 80
$ brev gpu find --model "meta-llama/Llama-3-70B" # derives VRAM from HF model card
$ brev gpu find --budget 30 --hours 8

Fix or remove the dead docs URL. A 404 in the help text of the primary user-facing flag is a worse experience than no URL at all. Either maintain a live page or omit the reference.
Add pagination and filtering to the public API:
GET https://api.brev.dev/v1/instance/types?gpu=H100&available=true&per_page=20
Currently the endpoint returns all 529 instances in one response with no filtering, making it unusable as a browsing interface.
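Until server-side filtering ships, a client can approximate it locally against the full payload. A sketch using the response shape observed in this session (`type`, `is_available`, `base_price.amount`; the catalog entries below are illustrative):

```python
def filter_catalog(items, gpu=None, available_only=True):
    """Client-side stand-in for ?gpu=...&available=true on the catalog endpoint."""
    out = []
    for item in items:
        if gpu and gpu not in item.get("type", ""):
            continue  # substring match against the type string, e.g. "H100"
        if available_only and not item.get("is_available"):
            continue
        out.append(item)
    # Cheapest first, matching what a user actually wants to see
    return sorted(out, key=lambda i: float(i["base_price"]["amount"]))

catalog = [
    {"type": "hyperstack_H100", "is_available": True, "base_price": {"amount": "2.280000"}},
    {"type": "paperspace_H100", "is_available": False, "base_price": {"amount": "7.190000"}},
    {"type": "a2-highgpu-1g", "is_available": True, "base_price": {"amount": "1.100000"}},
]
print([i["type"] for i in filter_catalog(catalog, gpu="H100")])
# ['hyperstack_H100']
```

This is exactly the filtering `pkg/cmd/gpusearch` already performs in Go; the sketch just shows how little logic the missing server-side parameters represent.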
The first qwen3-fresh instance was created via the REST API with "diskStorage": "500Gi". The
API returned HTTP 200. The instance spun up. Everything appeared normal. Then vLLM started
downloading the model:
[WARNING] Not enough free disk space to download the file.
The expected file size is: 2001.74 MB.
The target location only has 1953.52 MB free disk space.
...
The target location only has 0.00 MB free disk space.
[ERROR] EngineCore failed to start.
$ df -h /
Filesystem Size Used Avail Use%
/dev/vda1 97G 97G 0 100%
The requested 500 Gi was silently overridden by the Hyperstack provider's fixed root disk of ~100 GB. The disk filled up 62 GB into the 80 GB model download. The vLLM container crashed. The error message pointed to a vLLM engine failure, not a disk problem — the actual disk warnings were buried 200 lines earlier in the container logs.
The instance had to be deleted and re-created. An additional hour was spent because the delete
was not immediate: the API returned "duplicate workspace with name qwen3-fresh" when creating a
replacement while the first instance was still in the process of being torn down.
The user explicitly requested 500 Gi. The API accepted the request (200 OK) and provisioned the instance. There was no error, no warning, no indication that the disk size request had been ignored. The downstream failure — model download running out of space — looked like a vLLM bug, not a provisioning problem.
The Hyperstack provider also has a 750 GB ephemeral disk at /ephemeral. This is the correct
place to store the HuggingFace cache. It is completely undocumented. The user cannot discover it
without inspecting df -h output on the instance.
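A serving script can defend itself against the silent override with a pre-flight check — verify free space at the cache path before starting the download, rather than failing 62 GB in. A sketch (the helper and its 20% headroom factor are illustrative; `shutil.disk_usage` is standard library):

```python
import shutil

def check_space(path, needed_gb, headroom=1.2):
    """Fail fast if the filesystem holding `path` cannot fit the download."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    required = needed_gb * headroom  # temp files during download need slack
    if free_gb < required:
        raise SystemExit(
            f"{path}: {free_gb:.0f} GB free, need ~{required:.0f} GB. "
            "On Hyperstack, point the HuggingFace cache at /ephemeral instead."
        )
    return free_gb

# On the instance, before launching vLLM (80 GB is the approximate FP8 model size):
# check_space("/ephemeral/huggingface", needed_gb=80)
print(f"{check_space('.', needed_gb=0):.0f} GB free on the current filesystem")
```

Had this run against the 100 GB root disk with `needed_gb=80`, it would have failed in under a second instead of 45 minutes in.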
The Brev backend forwards diskStorage to the underlying provider. Hyperstack has a fixed root
disk (~100 GB) and does not resize it. The backend does not validate the request against provider
limits before creating the instance. The response body echoes back the requested disk size rather
than the provisioned disk size, making it impossible to detect the mismatch from the create
response alone.
API: Return actual provisioned disk in the create response, not the requested value:
{
"workspace": {
"id": "...",
"diskStorage": {
"requested": "500Gi",
"root": "100Gi",
"warning": "hyperstack_H100 has a fixed root disk of 100 GiB. The requested 500 GiB was not applied.",
"ephemeral": [
{ "path": "/ephemeral", "size": "750Gi", "persistent": "reboot-safe, not stop/start-safe" }
]
}
}
}

API: Validate disk size against provider limits before creating, and return a 400 with an actionable message if the request exceeds what the provider can honor.
CLI: Print actual provisioned configuration after create, including disk layout and mount
points. The ephemeral disk at /ephemeral is not discoverable any other way.
Catalog API: Include disk constraints per instance type:
{
"type": "hyperstack_H100",
"diskStorage": {
"root": { "min": "50Gi", "max": "100Gi", "default": "100Gi", "resizable": false },
"ephemeral": [
{ "path": "/ephemeral", "size": "750Gi" }
]
}
}

Documentation: The Hyperstack instance type page must document:
- Root disk is fixed at ~100 GB and cannot be increased
- `/ephemeral` (750 GB) exists and survives reboots but may be wiped on re-provision
- For any workload with files >50 GB, always mount caches to `/ephemeral`
For a model-serving use case, the correct vLLM command is:
mkdir -p /ephemeral/huggingface
docker run -v /ephemeral/huggingface:/root/.cache/huggingface ...

This is not in any current Brev documentation.
After brev refresh, the SSH config contains two entries per instance:
Host qwen3-fresh
Hostname 69.19.137.94
User ubuntu
IdentityFile "/Users/dsrinivas/.brev/brev.pem"
...
Host qwen3-fresh-host
Hostname 69.19.137.94
User shadeform
...
The standard Brev documentation says: ssh -F ~/.brev/ssh_config <instance-name>.
Following that instruction:
$ ssh -F ~/.brev/ssh_config qwen3-fresh "nvidia-smi"
ubuntu@69.19.137.94: Permission denied (publickey).
The correct alias is qwen3-fresh-host with user shadeform. The ubuntu user either does not
exist on Shadeform instances or does not have the brev key authorized. The qwen3-fresh alias
is a broken entry that the SSH config includes by default.
This inconsistency is invisible: GCP instances work with the plain <name> alias (user ubuntu).
Shadeform instances require the <name>-host alias (user shadeform). Tab completion shows both
options; the user may try either. Nothing in the CLI output, the SSH config, or the documentation
indicates which alias to use for a given provider.
Option A — Eliminate the broken entry: If ubuntu is not valid on Shadeform instances, do
not write that entry to the SSH config. Only write the working alias.
Option B — Make the primary alias authoritative: Always write the correct username to the
Host <name> entry. Use the provider type to select it:
func SSHUsername(providerType string) string {
switch providerType {
case "shadeform": return "shadeform"
case "nebius": return "user"
default: return "ubuntu"
}
}

Option C — Add a comment to the SSH config:
# For Shadeform instances, use: ssh -F ~/.brev/ssh_config qwen3-fresh-host
# For GCP instances, use: ssh -F ~/.brev/ssh_config qwen3-fresh
CLI — Print the SSH command at create time:
✓ qwen3-fresh is ready
SSH: ssh -F ~/.brev/ssh_config qwen3-fresh-host ← the correct alias
The user should never have to discover this by trial and error.
Every failed brev create attempt produces the same message:
rpc error: code = NotFound desc = instance type hyperstack_H100 not found
This message is identical across four completely different situations:
| Actual cause | What the error says |
|---|---|
| Type string is a typo / genuinely doesn't exist | "not found" |
| Type exists but CLI isn't passing WorkspaceGroupID | "not found" |
| Type exists but org has no live capacity right now | "not found" |
| Type exists but org plan doesn't include it | "not found" |
The user cannot determine which situation applies without server-side access. In this session, the actual cause was the second one — the CLI bug — but every troubleshooting path led to the same dead end because the error gave no direction.
A better error for the actual failure would be:
brev create: instance type 'hyperstack_H100' could not be provisioned.
The instance type exists in the catalog but your CLI (v0.6.316) does not support
this provider type. Please update your brev CLI:
brew upgrade brev # macOS
brev update # (if available)
If the issue persists after upgrading, run 'brev search --gpu-name H100' to
see available alternatives.
Server-side: Return different gRPC status codes for different failures, and include an
actionable details field:
NotFound → type string doesn't exist in the catalog at all
InvalidArgument → type exists but create payload is missing required fields (WorkspaceGroupID)
ResourceExhausted → type exists, payload is valid, but no capacity right now
PermissionDenied → type exists but org plan doesn't include it
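Given distinct codes, a client could map each one mechanically to a next step. A sketch of that dispatch — the code names are standard gRPC statuses; the guidance strings are illustrative, not actual CLI output:

```python
GUIDANCE = {
    "NotFound": "Unknown type string. Run 'brev search' to list valid types.",
    "InvalidArgument": "Type exists but the request is missing fields — "
                       "your CLI is likely outdated. Run: brew upgrade brev",
    "ResourceExhausted": "No live capacity right now. Try an alternative type.",
    "PermissionDenied": "Your org plan does not include this instance type.",
}

def explain(status_code, instance_type):
    """Turn a gRPC status code into a one-line, actionable message."""
    hint = GUIDANCE.get(status_code, "Unexpected error; retry or contact support.")
    return f"cannot provision {instance_type!r}: {hint}"

print(explain("InvalidArgument", "hyperstack_H100"))
```

The table is trivial; the value is entirely in the server emitting distinct codes so the table has something to dispatch on.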
Client-side: The CLI should pre-validate against the catalog before sending the create request, so most failures are caught locally with a useful message before ever reaching the backend:
allTypes, _ := store.GetAllInstanceTypesWithWorkspaceGroups(orgID)
spec := allTypes.Find(instanceType)
if spec == nil {
return fmt.Errorf("unknown instance type %q\nRun 'brev search' to see valid options", instanceType)
}
if !spec.IsAvailable {
alts := allTypes.FindAlternatives(spec.GPU, 3)
return fmt.Errorf("%q has no available capacity right now\nAlternatives: %v", instanceType, alts)
}

Error quality is a force multiplier. The cheapest improvement in this entire report is
writing better error strings. No architecture changes, no new features, no release infrastructure.
The user who sees "your CLI is outdated, run brew upgrade brev" spends 30 seconds. The user who
sees "not found" spends two hours.
The guide says pip install litellm. The install succeeds. Running the proxy server fails:
$ litellm --model openai/qwen3-coder-next --api_base http://localhost:8000/v1 --port 4000
Traceback (most recent call last):
...
ImportError: Missing dependency No module named 'backoff'.
Run `pip install 'litellm[proxy]'`
The error message is actually useful — it tells the user exactly what to run. But it surfaces
after 45 minutes of waiting for an 80 GB model to download, at the point when the user is closest
to being done. The fix is trivial (pip install 'litellm[proxy]'), but the timing makes it feel
like a fresh obstacle.
Additionally, on Ubuntu (the default Brev instance OS), pip install places executables in
~/.local/bin, which is not in the default PATH. Running litellm immediately after install
produces command not found unless the user knows to add ~/.local/bin to their PATH.
pip install 'litellm[proxy]' # NOT pip install litellm
export PATH="$HOME/.local/bin:$PATH" # ~/.local/bin not in PATH by default on Ubuntu
export OPENAI_API_KEY=dummy # litellm requires this even for local endpoints
nohup litellm \
--model openai/qwen3-coder-next \
--api_base http://localhost:8000/v1 \
--drop_params \
--port 4000 > /tmp/litellm.log 2>&1 &

In all Brev model-serving guides: Replace pip install litellm with
pip install 'litellm[proxy]'. A bare install is suitable only for using litellm as a library
(SDK calls), not as a proxy server.
Upstream (litellm): Consider making proxy dependencies non-optional so pip install litellm
produces a fully functional CLI. The current split ([proxy] extras) is reasonable for embedded
SDK use but a friction point for server use, where the proxy is the primary product.
In guides: Add a verification step immediately after install:
litellm --version # should print without ImportError

In guides: Explicitly note the PATH issue on Ubuntu:
export PATH="$HOME/.local/bin:$PATH"

Across all six failures, a common theme emerges: the catalog and the create endpoint speak different languages, and the user is caught in the middle.
When a user calls GET /api/instances/alltypesavailable/{orgID} and sees:
{ "type": "hyperstack_H100", "is_available": true, "base_price": { "amount": "2.280000" } }

...they reasonably conclude that brev create --gpu hyperstack_H100 will work. It doesn't.
When a user requests "diskStorage": "500Gi" and gets back HTTP 200, they reasonably conclude
that 500 GiB was provisioned. It wasn't.
When is_available: true is in the catalog, the user reasonably expects the create call to
succeed. It may not, because live capacity and catalog availability are different things, and the
catalog field is a static snapshot.
Each of these mismatches is individually fixable. Together they produce a product that feels broken even though the underlying compute resources work correctly.
The gpucreate package fixes Failure 1. It is merged, tested, and ready. Every day v0.6.316
remains current is a day where a large fraction of H100 provisioning attempts fail silently.
Wire pkg/cmd/gpusearch as a CLI subcommand. Minimum viable output:
$ brev search --gpu-name H100 --count 1
TYPE VRAM PRICE BOOT PROVIDER AVAILABLE
hyperstack_H100 80 GiB $2.28/hr ~3 min Hyperstack ✓
latitude_H100 80 GiB $2.39/hr ~4 min Latitude ✓
nebius.h100x1.sxm 80 GiB $3.54/hr ~5 min Nebius ✓
...
Run: brev create <name> --type <TYPE>
Resolve WorkspaceGroupID from the catalog before calling CreateWorkspace. Fail fast with
an actionable error if resolution fails.
After a workspace becomes RUNNING, print what was actually provisioned — including actual disk sizes, ephemeral mounts, and the correct SSH alias.
Use provider type to select the correct SSH user when writing the SSH config. Remove or clearly annotate the alias that uses the wrong username.
Add a deprecation notice to --gpu pointing users to --type (and to brev search to find
valid types).
| Change | Why |
|---|---|
| Return actual disk in create response (not requested) | Silent override is a trust violation |
| Validate disk size against provider limits before creating | Fail fast with a clear error |
| Include disk constraints in catalog response | Clients can pre-validate before creating |
| Return InvalidArgument (not NotFound) when WorkspaceGroupID is missing | Enables targeted client-side error messages |
| Return ResourceExhausted when live capacity is absent | Distinguishes capacity from config errors |
| Add filtering + pagination to public instance types API | 529-item unfiltered response is unusable |
| Add last_checked_at or a TTL to the is_available field | Makes the staleness of availability explicit |
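The staleness check that last recommendation enables would be a few lines on the client. A sketch assuming the catalog gains a `last_checked_at` ISO-8601 timestamp as proposed (this field does not exist today):

```python
from datetime import datetime, timedelta, timezone

def availability(item, ttl_minutes=10, now=None):
    """Trust is_available only within its TTL; otherwise report it as unknown."""
    now = now or datetime.now(timezone.utc)
    checked = datetime.fromisoformat(item["last_checked_at"])
    if now - checked > timedelta(minutes=ttl_minutes):
        return "unknown (stale)"
    return "yes" if item["is_available"] else "no"

now = datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc)
fresh = {"is_available": True, "last_checked_at": "2026-02-01T11:55:00+00:00"}
stale = {"is_available": True, "last_checked_at": "2026-02-01T09:00:00+00:00"}
print(availability(fresh, now=now))  # yes
print(availability(stale, now=now))  # unknown (stale)
```

"Unknown" is the honest answer this session never got: the catalog's `is_available: true` was presented as current when it was a snapshot of unknown age.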
Live GPU catalog page: Replace the dead https://brev.dev/docs/reference/gpu link with a
page that dynamically renders available types from the public API, filterable by GPU name, VRAM,
provider, and price. This eliminates the "what string do I use?" question entirely.
Provider-specific disk documentation: For each provider, document root disk size (fixed or
configurable), ephemeral disk mount points and sizes, and persistence behavior. For Hyperstack
specifically: root is ~100 GB fixed, /ephemeral is ~750 GB and reboot-safe.
Provider-specific SSH username table:
| Provider | SSH Username | SSH Config Alias |
|---|---|---|
| GCP | ubuntu | <name> |
| Shadeform / Hyperstack | shadeform | <name>-host |
| Nebius | ubuntu | <name> |
| Lambda Labs | ubuntu | <name> |
litellm install: All guides should specify pip install 'litellm[proxy]', note the PATH
requirement on Ubuntu, and include a litellm --version verification step.
| # | Failure | User impact | Fix complexity |
|---|---|---|---|
| 1 | brev create --gpu fails for non-GCP types | Complete blocker for H100 | Low — ship existing code |
| 2 | No GPU discovery command | Hours of type-string archaeology | Low — wire existing code |
| 3 | Disk size silently ignored | Instance crashes mid-download | Medium — API + provider docs |
| 4 | SSH alias inconsistency | Permission denied with no explanation | Low — username lookup fix |
| 5 | Error messages don't distinguish failure modes | Unresolvable debugging | Medium — server + client |
| 6 | pip install litellm missing proxy extras | Hits after 45 min of model download | Trivial — docs update |
Estimated total engineering effort to close all six: 3–5 focused days. The highest-ROI items are Failures 1 and 2, both of which require shipping code that already exists in the repository rather than writing new code.
The underlying platform — the compute resources, the Shadeform aggregation, the API — is sound. The failures documented here are entirely in the interface layer: the CLI binary, the error messages, and the documentation. That makes them fixable without any infrastructure work.
Based on direct experience provisioning an H100 PCIe instance on Hyperstack via Brev in
February 2026, with supplementary analysis of https://github.com/brevdev/brev-cli (main
branch, circa Feb 2026). All code references are to that repository.