@sdboyer
Created February 10, 2026 14:38
Test Selection Strategy for Configuration-Sensitive API Surface Tests

The Problem

The API surface test suite contains tests that exercise behaviors which may or may not be available on a given instance, depending on how that instance is configured. For example:

  • Model tests require models to be deployed
  • Inference tests require both chat and completion models, plus working inference infrastructure
  • Remote session tests require remote_sandbox execution environments and Temporal
  • Sandbox tests require at least one execution environment of any type

When running tests against an arbitrary target instance, we must decide which tests to run. Running tests for capabilities the instance lacks produces spurious failures; failing to run tests for capabilities the instance does have produces silent gaps in coverage.

The Constraint

The test runner (whether go test, a CI workflow, or a human) does not inherently know what the target instance is capable of. This information must come from somewhere.

Two Approaches

Option A: Capability Self-Discovery

Tests probe the target instance at runtime to discover what's available, then skip themselves if prerequisites aren't met.

func TestAgentRemoteSessions(t *testing.T, api Client) {
    execEnv, ok := FindExecutionEnvironmentByType(t, api, "remote_sandbox")
    if !ok {
        t.Skip("no remote_sandbox execution environment available")
    }
    // ... test proceeds
}

How it works:

  • Each test checks its own prerequisites via API calls
  • Tests self-skip when prerequisites aren't met
  • The workflow runs all tests; the test suite self-adapts

Advantages:

  • Simple to implement
  • No external coordination required
  • Tests are self-documenting about their requirements

Disadvantages:

  • Silent coverage gaps: If the instance is misconfigured (or has a bug in its capability reporting), tests skip silently. We may believe we have coverage when we don't.
  • False confidence: A green test run might mean "all tests passed" or "half the tests skipped and we didn't notice."
  • SUT determines its own test coverage: The System Under Test controls which tests run against it. This inverts the normal testing relationship.

Option B: Configuration-Driven Test Selection

An external system with knowledge of the instance's intended configuration determines which tests should run. Tests that should run but can't (because prerequisites are missing) fail explicitly.

func TestAgentRemoteSessions(t *testing.T, api Client) {
    execEnv := RequireRemoteSandboxExecutionEnvironment(t, api)
    // If this fails, the test fails - not skips
    // ... test proceeds
}

The test selection happens before test execution, based on configuration knowledge:

Instance Config ──► Test Selector ──► "Run these tests" ──► Test Runner
                                                                │
                                                                ▼
                                              Tests fail if prerequisites missing

How it works:

  • A system with knowledge of instance configuration determines which tests apply
  • Selected tests are expected to pass; missing prerequisites are failures, not skips
  • Unselected tests are explicitly excluded with documented rationale

Advantages:

  • Explicit coverage tracking: We know exactly which tests should run against which configurations
  • Misconfiguration detection: If a test is selected but prerequisites are missing, we find out immediately
  • Configuration is the source of truth: Not the instance's self-reporting

Disadvantages:

  • Requires maintaining a mapping between configuration and test applicability
  • Requires infrastructure to convey configuration knowledge to the test selector

Why We're Choosing Option B

The fundamental issue with Option A is that the SUT's self-reporting determines test coverage. This creates a dangerous feedback loop:

  1. Instance is misconfigured (e.g., remote sandbox not enabled)
  2. Tests probe instance, find no remote sandbox capability
  3. Tests skip themselves
  4. CI reports green
  5. We have no idea that remote session functionality is untested

This is particularly insidious because:

  • It fails silently
  • It requires active vigilance to notice (checking skip counts, auditing test output)
  • The failure mode is "we think we're testing something we're not"

With Option B, the failure mode is reversed:

  • If configuration says "remote sandbox is enabled" but it isn't actually working, the test fails
  • Failures are loud; we investigate
  • We maintain a clear contract: "this configuration implies these tests must pass"

A foundational premise of poolio's design is to have precisely this kind of insight into instance configuration by controlling it centrally: individual component.cue definitions are drawn together in the bundle spec. Once that's in place, we can relieve the operator of the need to pick the right set of tests to run.


Current State

Until the poolio-driven test selection is implemented, we use Option A (self-discovery with skip):

  1. Prerequisite helpers use t.Skip: SkipWithoutModel, SkipWithoutRemoteSandbox, etc. skip the test if prerequisites are missing.

  2. All tests run by default: The surface test runner calls all test modules; tests self-skip based on what the instance provides.

  3. The operator sees skip counts: Test output shows which tests were skipped and why, making coverage gaps visible (though not as loud as failures).

This accepts the risk of silent coverage gaps in exchange for the ability to run the test suite against any instance without pre-configuration. Once poolio integration is complete, we'll switch to Option B, where configuration determines test selection and missing prerequisites become failures.
