Conner Swann yourbuddyconner

@yourbuddyconner
yourbuddyconner / 2026-04-09-retry-after-headers-design.md
Last active April 9, 2026 21:02
Retry-After Headers for Transient Errors — Design Spec

Retry-After Headers for Transient Errors

Problem Statement

Today, when requests fail due to transient conditions (downstream timeouts, rate-limit blocks), callers have no machine-readable signal for how long to wait before retrying. Messages like "Resource exhausted; please wait a minute and try again" are human-readable but not actionable by SDKs or CI harnesses. This forces clients to use fixed sleep times or blind retry loops.

Note: the codebase already has error handling patterns that preserve meaningful messages for user-facing error codes (via IsUserError()). Sanitization to opaque "internal server error (UUID)" only applies to internal/non-user errors. Whether error messages are consistently useful across all code paths is a separate tech debt question.

The standard HTTP Retry-After header tells callers "this failed, but try again after N seconds." That's the missing piece.
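
To illustrate what "machine-readable" buys an SDK or CI harness, here is a minimal client-side sketch in Python. It assumes only what the HTTP specification defines: a Retry-After value is either a non-negative integer number of seconds or an HTTP-date. The function name `parse_retry_after` and its `now` parameter are hypothetical and are not taken from the spec or the codebase.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds to wait, or None if the header is absent or malformed.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if value is None:
        return None
    value = value.strip()

    # Form 1: delay-seconds, a non-negative decimal integer.
    if value.isascii() and value.isdigit():
        return float(value)

    # Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)

    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry immediately".
    return max(0.0, (retry_at - now).total_seconds())
```

A caller would sleep for the returned number of seconds when it is not None, and fall back to its existing backoff policy otherwise.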

yourbuddyconner / 2026-04-07-github-app-manifest-flow-design.md
Created April 7, 2026 20:00
GitHub App Manifest Installation Flow — Design Spec

GitHub App Manifest Installation Flow

Date: 2026-04-07
Status: Draft
Scope: Admin GitHub App setup via manifest flow, post-setup management UI, single-installation model

Problem

The current GitHub App setup requires an admin to manually create a GitHub App on github.com, copy the App ID and PEM private key, paste them into the Valet settings form, and click "Verify" to discover installations. This is high-friction and error-prone — especially for read-only repo access, which should be a two-click operation.

Proposal: Retry-After headers for transient errors + e2e flakiness SLO

Context

Investigation into e2e flakiness on main (original writeup) found that a significant portion of test failures are caused by transient errors (services not ready, RPC providers briefly unreachable, etc.) that get sanitized into opaque "internal server error (UUID)" responses. Callers — both e2e tests and production users — can't distinguish transient from permanent failures, so they can't make informed retry decisions.

After discussion with @zane, @Bijan, @Mohammad, @Mohammed, and @omkar, we aligned on two proposals:

  1. A mechanism for services to signal "this is transient, retry" without exposing internal error details
  2. An SLO framework for e2e test reliability that automatically detects flaky tests and routes them to the right team

Error sanitization causes opaque 500s, masking transient failures in e2e tests

The issue

Our error sanitization framework (pkg/errors) replaces internal error messages with "internal server error (UUID)" before they reach callers. This is correct for production security, but it has a side effect: when a transient failure occurs (RPC provider briefly unreachable, service not yet warmed up, etc.), the caller gets the same opaque response as a genuine internal bug. Tests — and users — can't distinguish between the two, and can't make informed retry decisions.
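
One way to picture the gap is a sanitizer that still hides internal details but carries a retry hint for failures known to be transient. The sketch below is in Python for brevity and is not the pkg/errors implementation; `TransientError`, `SanitizedResponse`, `sanitize`, the 503 status, and the "service unavailable" wording are all hypothetical names and choices made for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


class TransientError(Exception):
    """Hypothetical marker for failures expected to clear on their own."""

    def __init__(self, message: str, retry_after_seconds: int):
        super().__init__(message)
        self.retry_after_seconds = retry_after_seconds


@dataclass
class SanitizedResponse:
    status: int
    message: str
    retry_after_seconds: Optional[int] = None


def sanitize(err: Exception) -> SanitizedResponse:
    """Withhold internal details, but keep the retry signal for transient errors."""
    reference = uuid.uuid4()  # opaque ID a caller can quote in a bug report
    if isinstance(err, TransientError):
        # The internal message is still hidden; only the retry hint leaks through.
        return SanitizedResponse(
            503, f"service unavailable ({reference})", err.retry_after_seconds
        )
    # Everything else stays fully opaque, as it is today.
    return SanitizedResponse(500, f"internal server error ({reference})")
```

With this shape, a caller can tell the two cases apart by whether `retry_after_seconds` is set, without ever seeing the underlying error text.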

The problem is amplified in the broadcasting path, where there are two independent error classification layers that both need to agree for the real error message to reach the caller:

  1. The RPC client maps EVM errors to gRPC codes (ToGrpcErrorCode)
  2. The broadcaster independently checks if the error is user-attributable ([`isUserBro
yourbuddyconner / llama-cpp-serve.yaml
Created June 19, 2024 19:06
Skypilot Llamacpp Skypilot Config
# service.yaml
service:
  readiness_probe: /v1/models
  replicas: 1
# Fields below describe each replica.
resources:
  ports: 8000
  cpus: 4+
  accelerators: {A100:1}
// Define a function to handle the document and return its type
function discoverDocumentType(document) {
  // Code to discover the type of document
  return documentType;
}
// Define a function to retrieve pre-built question examples from a database
function getQuestionExamples(documentType) {
  if (documentType === "legal contract") {
    return [
yourbuddyconner / bootstrap_balances.json
Created February 22, 2021 20:24
Rosetta conf file for mina-rosetta
[]
import docker
import os
import subprocess
import click
import glob
import json
import random
import re
import sys
from pathlib import Path
#!/usr/bin/env python3
# script to find common best-tip prefix over a list of nodes using GraphQL query
import os
import sys
import json
import click
import subprocess
import requests