
@arubis
arubis / chaos-grader-v10-mem-toleration-removal.patch
Created February 13, 2026 21:36
single-node-chaos-hardening v10: remove mem_toleration artificial blocker from grader
--- /tmp/task-download/single-node-chaos-hardening/grader.py 2026-02-13 13:36:32.569022191 -0700
+++ /home/dylan/dev/Nebula/tasks/single-node-chaos-hardening/grader.py 2026-02-13 14:17:17.945132597 -0700
@@ -242,9 +242,6 @@
         "priority_ok": priority_value > min_priority_value,
         "priority_value": priority_value,
         "priority_class": pc_name,
-        "mem_toleration": any(
-            t.get("key") == "node.kubernetes.io/memory-pressure" for t in tolerations
-        ),
         "node_affinity": "nodeAffinity" in affinity,

Task Milestone Viability Analysis

Date: 2026-02-13
Authors: Dylan Fitzgerald + Claude (Opus 4.6)
Scope: Standalone tasks only (not subtasks). All figures are cumulative task totals, including the current ~120.


Context

Task Production Feasibility & Timeline Estimate

Date: 2026-02-13
Authors: Dylan Fitzgerald + Claude (Opus 4.6)
Context: The client has ~120 accepted apex-arena tasks on the Nebula platform. They want to reach 900. This document estimates what we can deliver and in what timeframe, grounded in empirical gap analysis.


Executive Summary

@arubis
arubis / gist-description.md
Created February 13, 2026 01:41
fix: distributed-transaction-deadlock grader -- detect alert threshold in all Grafana multi-step alert formats

fix: detect alert threshold in all Grafana multi-step alert formats

Problem

check_grafana_alert_rule() in grader.py only detects the >= 5 threshold when it appears inline in the PromQL expr field (e.g. "expr": "pg_lock_wait_time_seconds > 5").

The solution.sh creates alerts this way, so test-solution passes. But agents universally create alerts using Grafana's standard multi-step format, where the threshold lives in a separate step -- not in the expr string. This causes
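The two shapes described above can be sketched as one check that accepts either format. This is an illustration, not the actual grader code: the field names for the multi-step case (`data[].model.conditions[].evaluator`) follow Grafana's provisioned alert-rule JSON and should be verified against a real rule export.

```python
import re

def threshold_ok(rule: dict, limit: float = 5.0) -> bool:
    """True if the alert rule encodes a `> limit` (or `>= limit`) threshold,
    either inline in a PromQL expr or in a separate expression step."""
    thr = re.escape(f"{limit:g}")
    for step in rule.get("data", []):
        model = step.get("model", {})
        # Case 1: inline threshold, e.g. "pg_lock_wait_time_seconds > 5"
        if re.search(rf">=?\s*{thr}\b", model.get("expr", "")):
            return True
        # Case 2: multi-step rule -- the threshold lives in an evaluator
        # on a separate step, not in the expr string
        for cond in model.get("conditions", []):
            ev = cond.get("evaluator", {})
            if ev.get("type") in ("gt", "ge") and limit in ev.get("params", []):
                return True
    return False
```

Checking each `data` step independently means a rule passes whether the agent wrote the threshold into the query itself or into a downstream threshold/classic-condition step.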

@arubis
arubis / chaos-engineering-resilience-review.md
Last active February 10, 2026 23:40
Task Review: chaos-engineering-resilience (v33) - Secondary Review

Task Review: chaos-engineering-resilience (v33)

Task UUID: 326ba06c-1edc-4f27-a0e3-57001213b883
Category: SRE | Difficulty: Hard | Horizon: 6h
Author: .tryps (Orestis Trypidakis)
Primary reviewer: shahryaradil (approved, promoted to secondary)
Secondary reviewer: daltoris


@arubis
arubis / chaos-hardening-remove-pdb-check.patch
Created February 10, 2026 22:17
single-node-chaos-hardening: remove PDB existence check (irrelevant to memory-pressure eviction)
--- a/tasks/single-node-chaos-hardening/grader.py
+++ b/tasks/single-node-chaos-hardening/grader.py
@@ -130,13 +130,6 @@ def validate_setup():
     if node_taints and not affinity_issues_found:
         feedback.append("All Tier 1/2 workloads compatible with node taints")
-    # --- PDB Existence Check ---
-    print("validating PDB existence")
-    for ns in ["argocd", "bleater", "monitoring"]:
-        rc, out = run(f"kubectl get pdb -n {ns} -o json")
@arubis
arubis / chaos-hardening-priority-ordering.patch
Created February 10, 2026 22:16
single-node-chaos-hardening: relax priority check from 3-tier to 2-tier (crucial > expendable)
--- a/tasks/single-node-chaos-hardening/grader.py
+++ b/tasks/single-node-chaos-hardening/grader.py
@@ -69,18 +69,23 @@ def validate_setup():
         all_ok = False
     # --- Priority Ordering Check ---
-    print("validating relative priority ordering")
-    t1 = get_effective_priority("bleater")
-    t2 = get_effective_priority("gitea")
-    t3 = get_effective_priority("loadgenerator")
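The added lines of this patch are cut off in the preview, but a relaxed 2-tier check along the lines the description suggests might look like the sketch below. The grouping of bleater and gitea as crucial and loadgenerator as expendable is an assumption inferred from the removed lines, not confirmed by the patch.

```python
def check_priority_ordering(get_effective_priority) -> bool:
    """2-tier check: every crucial workload must outrank the expendable one.
    Unlike the removed 3-tier version, no ordering is enforced among the
    crucial workloads themselves."""
    expendable = get_effective_priority("loadgenerator")
    return all(
        get_effective_priority(workload) > expendable
        for workload in ("bleater", "gitea")
    )
```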
@arubis
arubis / README.md
Last active February 6, 2026 00:17
Fix for kubernetes-security-hardening-zero-disruption task: wait for old pods using default SA to fully terminate before grader runs

Fix: Wait for Terminating Pods Before Grader Runs

Problem

The solution.sh for the kubernetes-security-hardening-zero-disruption task was failing the RBAC check with score 0.75 even though the solution correctly:

  1. Created a dedicated ServiceAccount (bleater-app)
  2. Patched all Deployments/StatefulSets to use the new SA
  3. Performed a rolling restart
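The fix named in the title — waiting for pods still bound to the default ServiceAccount to terminate before grading — can be sketched as follows. The polling interval, timeout, and namespace handling are illustrative, not lifted from the task's actual grader.

```python
import json
import subprocess
import time

def pods_using_default_sa(pod_list: dict) -> list:
    """Names of pods (parsed `kubectl get pods -o json` output) still bound
    to the `default` ServiceAccount -- the pods the grader must wait out."""
    return [
        p["metadata"]["name"]
        for p in pod_list.get("items", [])
        if p["spec"].get("serviceAccountName", "default") == "default"
    ]

def wait_for_default_sa_pods_gone(namespace: str, timeout: int = 180) -> bool:
    """Poll until no pod in `namespace` runs under the default SA, or give
    up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.run(
            ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
            capture_output=True, text=True, check=True,
        ).stdout
        if not pods_using_default_sa(json.loads(out)):
            return True
        time.sleep(5)
    return False
```

Note the defaulted `serviceAccountName`: a pod spec that omits the field runs under `default`, so it must count as a straggler too.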

Root Cause

@arubis
arubis / fix-annotation-discoverability.patch
Created February 5, 2026 00:37
Review: prometheus-observability-stack-failure (2084d83d) — annotation discoverability fix
--- a/setup.sh
+++ b/setup.sh
@@ -69,6 +69,16 @@
 # ------------------------------------------------------------
-# Prometheus discovery label mismatch (fault 4)
+# Prometheus discovery label mismatch (fault 4 & 5)
+# Fault 4: observability label
+# Fault 5: remove app label so the bleater-services endpoint
+# discovery job can no longer match these services.
+# This forces annotation-based pod discovery to be the
@arubis
arubis / synthetic-monitoring-hardening-suggestions.md
Last active February 4, 2026 18:24
Synthetic Endpoint Monitoring - Hardening Suggestions (post-grader-fix)

Synthetic Endpoint Monitoring - Hardening Suggestions

Task ID: a6b6b25b-fbdf-4830-bd13-258c6bfd9948
Current Version: v32
Date: 2026-02-04

Context

After fixing the grader bugs (broken self-parameter methods), the task now passes test-solution with a perfect score. The agent pass rate is expected to rise with the fixed grader; it may exceed the 70% threshold, but that is not guaranteed.
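The bug class referred to above ("broken self-parameter methods") is the shape below; the class and method names are made up for illustration, not taken from the task's grader.

```python
class BrokenGrader:
    # Bug shape: `self` is missing, so every instance call fails with
    # TypeError: check_endpoint() takes 0 positional arguments but 1 was given
    def check_endpoint():
        return True

class FixedGrader:
    def check_endpoint(self):
        return True
```

Because Python only binds the instance at call time, the broken definition imports cleanly and the error surfaces only when the grader actually runs the check.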