
@arubis
arubis / jaeger-index-mapping-review.md
Created April 3, 2026 20:39
Task Review: Jaeger Query Service Index Mapping Conflict (378d049f) — APPROVE

Task Review: Jaeger Query Service Index Mapping Conflict

Task ID: jaeger-query-service-indexmapping-conflict
UUID: 378d049f-e957-4c78-9fa8-d1b84684d37e
Version: 12
Author: chrorrala (Christian Orrala)
Category: Platform Engineering
Reviewer: Dylan Fitzgerald
Date: 2026-04-03

@arubis
arubis / jaeger-task-difficulty-report.md
Last active April 3, 2026 00:13
Jaeger Query Service Index Mapping Conflict — Difficulty Recommendations for chrorrala

Jaeger Query Service Index Mapping Conflict — Difficulty Recommendations

Task 378d049f (v12) by Christian (chrorrala)
Backend: Docker
Threshold: mean < 0.50
Current mean: 0.64 (7 biggie-nebula runs)
Verdict: NEEDS_WORK
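The difficulty gate in this report reduces to a single comparison between the mean score and the backend's threshold. The sketch below assumes that comparison is the whole rule, although other reviews on this page also weigh grader defects. `difficulty_verdict` and its return labels are hypothetical and are not part of any review tooling.

```python
def difficulty_verdict(mean_score: float, threshold: float, inclusive: bool = False) -> str:
    """Hypothetical difficulty gate: compare a task's mean eval score to its backend threshold.

    inclusive=False models a strict bound such as "mean < 0.50" (Docker);
    inclusive=True models a bound such as "<= 0.85" (Teapot).
    """
    if not 0.0 <= mean_score <= 1.0:
        raise ValueError(f"mean score out of range: {mean_score}")
    passes = mean_score <= threshold if inclusive else mean_score < threshold
    return "PASSES_GATE" if passes else "NEEDS_WORK"


# Jaeger report: Docker threshold 0.50, observed mean 0.64 over 7 runs.
print(difficulty_verdict(0.64, 0.50))  # NEEDS_WORK
```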
@arubis
arubis / jaeger-task-review.md
Last active April 2, 2026 23:51
Task review: jaeger-query-service-indexmapping-conflict (378d049f)

Task Review: Jaeger Query Service Index Mapping Conflict

Task UUID: 378d049f-e957-4c78-9fa8-d1b84684d37e
Task ID: jaeger-query-service-indexmapping-conflict
Category: Platform Engineering
Author: chrorrala
Eval version: v12 (local files: v11, because the API returned HTTP 500 on download)
Backend: Docker (author confirmed in Discord: "I reverted to the normal docker as instructed")
Reviewer: Dylan Fitzgerald

@arubis
arubis / rabbitmq-amqp-dead-fanout-review.md
Last active April 2, 2026 22:19
Task review: rabbitmq-amqp-dead-fanout (v12) — APPROVE

Task Review: rabbitmq-amqp-dead-fanout (v12)

UUID: 71d0a7da-2d8e-465e-b3f5-d2e50c6ea3bb
Author: peterkay_86616
Category: cloud-ops
Backend: Teapot (threshold: ≤0.85)


Verdict: APPROVE

@arubis
arubis / maddy-spf-dkim-review.md
Last active April 2, 2026 23:48
Review: maddy-spf-dkim-domain-migration (a4bc3f9c)
@arubis
arubis / alertmanager-webhook-routing-review-v10.md
Last active April 2, 2026 17:20
Review: alertmanager-webhook-routing-failure v10 (3802d778)

Review: alertmanager-webhook-routing-failure v10

Verdict: NEEDS_WORK | Mean score 0.96 (threshold: <0.50, Docker backend) | 8 of 9 scored runs were a perfect 1.0


Why It's Too Easy

Every fault follows the same diagnostic loop:

Task Difficulty Tuning

A systematic process for adjusting AI agent task difficulty when scores are too high (task too easy) or too low (task too hard).

For acceptance criteria and pass rate thresholds, see Task Review Guide. For failure analysis methodology, see Task Eval Analysis.

When to Use

  • Task scoring above target threshold (>70% pass rate: too easy)
  • Task scoring below target threshold (0% pass rate even though the solution works: artificial failures)
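The two triggers in the list above can be expressed as a small decision function. This is a sketch that assumes pass rate is a fraction in [0, 1] and that `solution_passes` records whether the reference solution scores 1.0. The function name and return labels are hypothetical.

```python
def tuning_trigger(pass_rate: float, solution_passes: bool, upper: float = 0.70) -> str:
    """Hypothetical mapping of the two "When to Use" triggers to a label.

    pass_rate is the fraction of agent runs that pass, in [0, 1];
    solution_passes says whether the reference solution scores 1.0.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError(f"pass rate out of range: {pass_rate}")
    if pass_rate > upper:
        return "too_easy"  # tune difficulty up
    if pass_rate == 0.0 and solution_passes:
        return "artificial_failures"  # look for grader or environment defects
    return "no_trigger"  # neither condition from the list applies
```

Note that exactly 70% does not trigger "too easy", which matches the strict ">70%" wording.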
@arubis
arubis / grader.py
Created March 30, 2026 22:11
postgres-cve-2024-7348-pg-dump-privesc — generated by bespoke-recipe-intelligence
"""Grader for postgres-cve-2024-7348-pg-dump-privesc.
Checks:
1. version_remediated — PostgreSQL image is >= 16.4 (not 16.3 or earlier)
2. search_path_hardened — POSTGRES_OPTIONS env var no longer sets unsafe search_path
3. deployment_healthy — Deployment has desired replicas ready
4. database_reachable — PostgreSQL responds to a basic health query
"""
import subprocess
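Two of the checks listed in the docstring can be written as pure functions over data the grader would fetch, for example with `subprocess` running `kubectl get deployment <name> -o json`. This sketch is not the generated grader's code. The function bodies, the tag-parsing rules, and the choice to compare only against the 16.x line are all assumptions. Under that last choice, an older major version would fail even if it carries its own patch release.

```python
import re
from typing import Optional, Tuple

# Minimum patched release on the 16.x line, per the version_remediated check.
MIN_PATCHED: Tuple[int, int] = (16, 4)


def parse_image_version(image: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from an image ref such as 'postgres:16.3-alpine'.

    Returns None for refs with no numeric tag (e.g. 'postgres' or 'postgres:latest').
    A major-only tag like '16' is read as (16, 0), so it conservatively fails the
    16.4 check even though it may float to a newer patch release.
    """
    ref = image.split("@", 1)[0]   # drop any @sha256:... digest
    name = ref.rsplit("/", 1)[-1]  # drop registry host (and its :port) and path
    if ":" not in name:
        return None
    match = re.match(r"(\d+)(?:\.(\d+))?", name.split(":", 1)[1])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def version_remediated(image: str) -> bool:
    version = parse_image_version(image)
    return version is not None and version >= MIN_PATCHED


def deployment_healthy(deployment: dict) -> bool:
    """Check a Deployment object, e.g. parsed `kubectl get deployment -o json` output."""
    desired = deployment.get("spec", {}).get("replicas", 1)       # API default is 1
    ready = deployment.get("status", {}).get("readyReplicas", 0)  # omitted when 0
    return desired > 0 and ready >= desired
```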
@arubis
arubis / keycloak-rotation-review-v18.md
Last active March 25, 2026 20:41
Review: Keycloak OIDC Token Signing Key Rotation v18 (bd24c35b)

Review: Keycloak OIDC Token Signing Key Rotation (v18)

Task: bd24c35b-157b-400b-bcdb-88e539b2467c
Version: 18 · Category: SRE · Difficulty: hard
Verdict: NEEDS_WORK

Solution passes (1.0). Mean score 0.60 across 8 biggie-nebula runs — below 0.70. Every subscore has variance. The task is well-designed and close to approval, but has one grader defect that produces non-deterministic failures unrelated to agent skill.
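One direct way to confirm a defect like this is to re-run the suspect check many times against a single unchanged environment, for instance right after applying the reference solution. A sound check gives the same answer every time. This is a minimal sketch. `is_deterministic` is hypothetical, and the flaky check is simulated with an alternating sequence rather than a real grader call.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Tuple


def is_deterministic(check: Callable[[], bool], runs: int = 10) -> Tuple[bool, Counter]:
    """Run a zero-argument check repeatedly against unchanged state.

    Returns (True, counts) if every run agreed, else (False, counts).
    """
    outcomes = Counter(check() for _ in range(runs))
    return len(outcomes) == 1, outcomes


def stable_check() -> bool:
    # Simulated sound check: same answer on every call.
    return True


_flips = cycle([True, False])


def flaky_check() -> bool:
    # Simulated defective check: alternates pass/fail with no state change.
    return next(_flips)


print(is_deterministic(stable_check))
print(is_deterministic(flaky_check))
```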