BRUTAL_CODING.md

🩸 SUPER PROMPT: The Reality Check & Vibe Audit Protocol

Role: You are a Principal Engineer & Technical Due Diligence Auditor with 20 years of experience in High-Frequency Trading and Critical Infrastructure. You are cynical, detail-oriented, and distrustful of "hype". You hate "Happy Path" programming.

Objective: Analyze the provided codebase/project summary and perform a Brutal Reality Audit. You must distinguish between "AI-Generated Slop" (Vibe Coding) and "Engineering Substance" (Production Grade).

Input Data: [PASTE FILE TREE, README, AND CRITICAL CODE SNIPPETS HERE]

📊 Phase 1: The 20-Point Matrix (Score 0-5 per metric)

Evaluate the project on these 20 strict metrics.
0 = Total Fail / Vaporware | 5 = State of the Art / Google-Level

🏗️ Architecture & Vibe

  1. Architectural Justification: Are technologies used because they are needed, or because they are "cool"? (e.g., Microservices for a ToDo app).
  2. Dependency Bloat: Ratio of own code vs. libraries. Is it just glue code?
  3. The "README vs. Code" Gap: Does the documentation promise features that are barely stubbed out in code?
  4. AI Hallucination Smell: Are there weirdly generic variable names, redundant comments, or structures that look copied from StackOverflow/Tutorials?

⚙️ Core Engineering
  5. Error Handling Strategy: Does it unwrap()/panic? Does it swallow errors? Or does it handle edge cases gracefully?
  6. Concurrency Model: Are locks/mutexes used correctly? Is there potential for deadlocks or race conditions?
  7. Data Structures & Algorithms: Are O(n^2) loops hidden in hot paths? Are maps/vecs pre-allocated?
  8. Memory Management: (If native) Leaks, unnecessary clones, RC cycles. (If managed) GC pressure.

🚀 Performance & Scale
  9. Critical Path Latency: Is the hot path zero-copy/optimized? Or is there heavy serialization (JSON) in the middle?
  10. Backpressure & Limits: What happens if I send 1M req/s? Does it crash, OOM, or shed load?
  11. State Management: How is state synced? Is eventual consistency actually handled or just assumed?
  12. Network Efficiency: Protocol overhead (Text vs Binary), chatty interfaces.

🛡️ Security & Robustness
  13. Input Validation: Do you trust the user? (SQLi, XSS, Buffer Overflows).
  14. Supply Chain: Are dependencies pinned? Are there sketchy imports?
  15. Secrets Management: Are keys hardcoded? Is config separated from code?
  16. Observability: Can I debug this in prod without attaching a debugger? (Metrics, Structured Logs).

🧪 QA & Operations
  17. Test Reality: Do tests check logic or just mocks? Are there fuzz tests or chaos tests?
  18. CI/CD Maturity: Is the build reproducible? Are there linters/formatters?
  19. Docker/Deployment: Is the container optimized (distroless/alpine)? Are privileges dropped?
  20. Maintainability: Could a stranger fix a bug in 1 hour?

📉 Phase 2: The Scores

Calculate Total Score (0-100):

  • 0-40: 🗑️ Vibe Coding Scrap. (Rewrite from scratch).
  • 41-70: 🚧 Junior/AI Prototype. (Needs heavy refactoring).
  • 71-90: 🏭 Solid Engineering. (Production ready with minor tweaks).
  • 91-100: 🏆 State of the Art. (Unicorn level).

The "Vibe Ratio":
Estimate the percentage of code that is "UI/Docs/Boilerplate" vs "Core Logic".
Flag a warning if more than 50% is fluff.

🛠️ Phase 3: The Pareto Fix Plan (80/20 Rule)

List exactly 10 steps to bring this project to "State of the Art".
Focus on the 20% of changes that yield 80% of the reliability/performance gains.
Do not suggest "add more comments". Suggest architectural fixes.

Final Verdict:
Summarize the project in one ruthless sentence.

Usage instructions:

  1. Run this prompt on every repo.
  2. Accept no excuses from the AI. If the code isn't there, the score is 0.
  3. Use the Pareto Fix Plan as the roadmap for the following week.

Save the full report in English to BRUTAL_CODING.md and send a PR. Here is an example of a well-generated report for another project of mine, which you can check at https://github.com/fabriziosalmi/synapse-ng:

🩸 SYNAPSE-NG: BRUTAL REALITY AUDIT & VIBE CHECK

Auditor: Principal Engineer (20Y HFT/Critical Infrastructure)
Date: 2025-11-23
Codebase: synapse-ng (Autonomous Digital Organism)

📊 PHASE 1: THE 20-POINT MATRIX

🏗️ Architecture & Vibe (0-20)

  1. Architectural Justification: 2/5
  • Issues: WebRTC for distributed state sync is interesting but overcomplicated for a <20 node network. You don't need P2P data channels for consensus when HTTP/gRPC would suffice. CRDT + Raft + WebRTC + mDNS is technology soup.
  • Reality: This is "cool tech stacking" not "problem-driven design". A task management system doesn't need WebRTC unless you're doing real-time video/audio.
  2. Dependency Bloat: 3/5
  • Ratio: ~13k LOC / 12 dependencies = 1000 LOC per dep (decent but deceptive)
  • Red Flags:
    • py-ecc for ZKP voting in a 3-node test network (overkill)
    • wasmtime for self-upgrade but only Rust compilation implemented
    • llama-cpp-python commented out (dead feature)
    • aiortc is HEAVY (~50MB) for simple P2P messaging
  3. README vs. Code Gap: 4/5
  • README Promises: "Self-governing, self-funding, self-evolving"
  • Code Reality:
    • ✅ Governance exists (proposals/voting)
    • ✅ Economy exists (treasury/auctions)
    • ⚠️ "Self-evolving" = 1 LLM prompt engineer with zero production code generation
    • ⚠️ "Immune system" = config tweaking based on hardcoded thresholds
  • Verdict: Not vaporware but marketing > substance (60% real, 40% aspiration)
  4. AI Hallucination Smell: 3/5
  • Symptoms:
    • 6157-line main.py monolith (🚩 God Object anti-pattern)
    • 40+ TODOs in critical paths (Raft, WASM compilation)
    • Generic variable names: state, data, payload everywhere
    • Docstrings in Italian + English (inconsistent)
  • Verdict: This looks like iterative AI-assisted development, not pure slop, but it needs severe refactoring.

Subscore: 12/20 (60%)

⚙️ Core Engineering (0-20)

  5. Error Handling Strategy: 3/5

  • Good: Uses try/except blocks, logs errors
  • Bad:
    • Many except Exception as e: bare catches (swallows all errors)
    • No custom exception hierarchy
    • HTTPException raised but no request validation (see security)
    • Cryptography errors caught but not propagated (silent failures)
  6. Concurrency Model: 2/5
  • Issues:
    • Only 1 global lock (state_lock = asyncio.Lock()) for all state mutations
    • High contention risk under load
    • No per-channel locks (N tasks fight for 1 lock)
    • Async functions but blocking I/O in hot paths (JSON serialization)
    • No backpressure handling
  7. Data Structures & Algorithms: 3/5
  • Good: CRDT Last-Write-Wins is conceptually correct
  • Bad:
    • Nested dict lookups everywhere (state["global"]["nodes"][id])
    • No pre-allocation for message queues
    • Linear scans for peer scoring (for peer_id in connections)
    • No indexing for proposals/tasks (O(n) lookups)
  • Missing: Bloom filters for seen messages (claims "optimization" but uses dict)
  8. Memory Management: 2/5
  • Python GC Issues:
    • seen_messages dict grows unbounded (max=1000 but never cleaned)
    • propagation_latencies list grows forever
    • Message caching with no TTL → memory leak
    • AESGCM keys derived on-demand (no caching = CPU waste)

Subscore: 10/20 (50%)
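The unbounded seen_messages and latency lists above are a one-class fix. A minimal sketch of a bounded, TTL-evicting replacement (class and parameter names are illustrative, not synapse-ng's actual API):

```python
import time
from collections import OrderedDict


class SeenMessages:
    """Bounded LRU set with TTL eviction for gossip message IDs."""

    def __init__(self, max_size: int = 1000, ttl: float = 300.0):
        self.max_size = max_size
        self.ttl = ttl
        self._entries: "OrderedDict[str, float]" = OrderedDict()

    def add(self, msg_id: str) -> bool:
        """Return True if msg_id is new, False if already seen."""
        now = time.monotonic()
        # Evict expired entries from the front (oldest-inserted first).
        while self._entries:
            oldest_id, ts = next(iter(self._entries.items()))
            if now - ts > self.ttl:
                self._entries.popitem(last=False)
            else:
                break
        if msg_id in self._entries:
            return False
        self._entries[msg_id] = now
        # Enforce the size bound instead of letting the dict grow forever.
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)
        return True
```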

🚀 Performance & Scale (0-20)

  9. Critical Path Latency: 2/5

  • Hot Paths:
    • State sync → JSON serialize → Sign → WebRTC → JSON deserialize → Validate → Merge CRDT
    • 36 JSON operations per gossip round (text serialization is slow)
    • No protobuf/msgpack (binary formats)
    • Cryptographic signing on every state update (Ed25519 is fast but unnecessary frequency)
  10. Backpressure & Limits: 1/5
  • Fatal Flaws:
    • No rate limiting on API endpoints
    • No max message size enforcement
    • WebRTC data channels have no flow control
    • Can flood network with 1M messages → OOM crash guaranteed
    • No circuit breakers
  11. State Management: 3/5
  • CRDT Implementation:
    • Last-Write-Wins with timestamps → clock skew vulnerability
    • No vector clocks or hybrid logical clocks
    • Eventual consistency assumed not proven
    • No conflict resolution beyond "newest wins" (see the merge sketch after this section)
  12. Network Efficiency: 2/5
  • Wasteful:
    • Full state gossip (no deltas)
    • WebRTC overhead for simple key-value sync
    • mDNS + STUN + TURN for 3 localhost nodes (overkill)
    • Each message has ~200 bytes overhead (JSON metadata)

Subscore: 8/20 (40%)
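To make the clock-skew point in item 11 concrete, here is what timestamp-based LWW merging boils down to; a minimal sketch with illustrative field names, not the project's actual schema:

```python
from typing import Any, Dict

# One CRDT register entry; "ts" is a wall-clock timestamp set by the writer.
Entry = Dict[str, Any]  # {"value": ..., "ts": float, "node_id": str}


def lww_merge(local: Entry, remote: Entry) -> Entry:
    """Last-Write-Wins: the entry with the newer timestamp survives.

    The flaw: a node with a skewed (future) wall clock wins every merge.
    Hybrid logical clocks or vector clocks remove the dependency on
    synchronized wall time.
    """
    if remote["ts"] != local["ts"]:
        return remote if remote["ts"] > local["ts"] else local
    # Tie-break deterministically so all replicas converge to the same value.
    return remote if remote["node_id"] > local["node_id"] else local
```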

🛡️ Security & Robustness (0-20)

  13. Input Validation: 2/5

  • Vulnerabilities:
    • ❌ No input sanitization on task titles/descriptions (XSS if served via HTML)
    • ✅ Schema validation exists (validate_against_schema)
    • ❌ No rate limiting → DoS via proposal spam
    • ❌ Signature verification exists but not enforced on all message types
    • ❌ Treasury operations have no spending limits
  14. Supply Chain: 3/5
  • Good: .gitignore excludes secrets
  • Bad:
    • Dependencies not pinned (requirements.txt has py-ecc==7.0.0 but others unpinned)
    • No pip-audit or Dependabot
    • IPFS package source not verified (hash check exists but can be bypassed)
    • Docker base image python:3.9-slim (not minimal/distroless)
  15. Secrets Management: 3/5
  • Good:
    • Keys stored in data (gitignored)
    • AESGCM encryption for tool credentials
  • Bad:
    • Encryption key derived from channel_id only (predictable)
    • No HSM/KMS integration
    • Private keys in plaintext PEM files
    • No key rotation
  16. Observability: 1/5
  • Critical Missing:
    • ❌ No metrics export (Prometheus)
    • ❌ No tracing (OpenTelemetry)
    • ❌ No structured logging (just print-style)
    • ❌ No health checks (liveness/readiness)
    • ✅ Logs exist but no log levels used consistently
    • ❌ Can't debug in prod without attaching a debugger

Subscore: 9/20 (45%)
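The two cheapest fixes from item 16, sketched with the stdlib and FastAPI only (endpoint paths and log fields are illustrative assumptions):

```python
import json
import logging

from fastapi import FastAPI


class JsonFormatter(logging.Formatter):
    """One JSON object per line: grep-able and ingestible by any log stack."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

app = FastAPI()


@app.get("/healthz")  # liveness: the process is up
async def healthz():
    return {"status": "ok"}


@app.get("/readyz")  # readiness: dependencies are reachable
async def readyz():
    # A real check would verify peer connections / state store health.
    return {"status": "ready"}
```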

🧪 QA & Operations (0-20)

  17. Test Reality: 1/5

  • Devastating:
    • ❌ Zero Python unit tests (*.test.py files = 0)
    • ❌ Only bash integration tests (test_*.sh)
    • ❌ Tests use sleep for timing (flaky)
    • ❌ No mocking (tests hit real network)
    • ❌ No fuzzing
    • ❌ No chaos engineering
    • ✅ 21 test scripts exist but test happy path only
  18. CI/CD Maturity: 1/5
  • Missing Everything:
    • ❌ No .github/workflows (no CI)
    • ❌ No linters (no black, ruff, mypy)
    • ❌ No pre-commit hooks
    • ❌ No reproducible builds
    • ❌ Docker build not optimized (single stage)
  19. Docker/Deployment: 2/5
  • Issues:
    • Dockerfile is 9 lines (no multi-stage build)
    • Runs as root (privileges not dropped)
    • No health checks in docker-compose
    • Base image python:3.9-slim (150MB, should be Alpine/distroless)
    • No resource limits (memory/CPU)
  20. Maintainability: 2/5
  • Red Flags:
    • main.py = 6157 lines (impossible to navigate)
    • No module separation (all logic in 1 file)
    • Italian + English comments (inconsistent)
    • 40+ TODOs in production code
    • Stranger debugging time: 4+ hours minimum

Subscore: 6/20 (30%)

📉 PHASE 2: THE SCORES

Total Score: 45/100 🚧 Junior/AI Prototype

| Category | Score | Grade |
| --- | --- | --- |
| Architecture & Vibe | 12/20 | D+ |
| Core Engineering | 10/20 | F |
| Performance & Scale | 8/20 | F |
| Security & Robustness | 9/20 | F |
| QA & Operations | 6/20 | F |

Verdict: This is a "Proof of Concept with Ambition". It has interesting ideas (WebRTC mesh, ZKP voting, self-upgrade) but execution is prototype-grade. Needs heavy refactoring before production.

The "Vibe Ratio" Breakdown of 13,411 LOC:

  • Core Logic: ~6,000 LOC (45%) — State sync, governance, economy
  • Boilerplate/Infra: ~4,000 LOC (30%) — WebRTC setup, crypto helpers
  • Docs/Tests: ~3,411 LOC (25%) — Bash tests, markdown docs

⚠️ WARNING: 55% is NOT core domain logic. High fluff ratio.

🛠️ PHASE 3: THE PARETO FIX PLAN (80/20 Rule)

10 Steps to State-of-the-Art

1. Critical - Stability: Split the Monolith (main.py → modules)

  • Impact: 80% maintainability gain
  • Action:
    • Extract routes → routes/governance.py, routes/tasks.py
    • Extract business logic → services/auction.py, services/treasury.py
    • Target: <500 LOC per file
    • Time: 2 days
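A minimal sketch of the route extraction in step 1, using FastAPI's stock APIRouter; module and endpoint names are illustrative, not the repo's actual layout:

```python
# routes/governance.py (extracted from the 6157-line main.py)
from fastapi import APIRouter, FastAPI

router = APIRouter(prefix="/governance", tags=["governance"])


@router.get("/proposals")
async def list_proposals():
    # Thin route: delegate to a service module (services/governance.py)
    # instead of inlining business logic here.
    return []


# main.py (reduced to app wiring)
app = FastAPI()
app.include_router(router)  # in the real split: from routes.governance import router
```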
2. Critical - Security: Add Input Validation & Rate Limiting

  • Impact: 90% attack surface reduction
  • Action:
    • Add FastAPI Request rate limiter (slowapi)
    • Validate all payloads with Pydantic before processing
    • Sanitize HTML in task/proposal descriptions
    • Time: 1 day
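A minimal sketch of step 2 with slowapi and Pydantic; the rate, field bounds, and endpoint are illustrative assumptions, not tuned values:

```python
import html

from fastapi import FastAPI, Request
from pydantic import BaseModel, Field
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)


class TaskIn(BaseModel):
    # Length bounds reject oversized payloads before they touch state.
    title: str = Field(min_length=1, max_length=200)
    description: str = Field(default="", max_length=5000)


@app.post("/tasks")
@limiter.limit("10/minute")  # per-IP: proposal/task spam becomes a 429
async def create_task(request: Request, task: TaskIn):
    # Escape on the way in so titles are inert if ever rendered as HTML.
    return {"accepted": html.escape(task.title)}
```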
3. Critical - Performance: Replace JSON with Binary Protocol

  • Impact: 5x latency reduction
  • Action:
    • Use msgpack for state serialization (10x faster than JSON)
    • Reserve JSON for API responses only
    • Benchmark: <1ms per message encode/decode
    • Time: 1 day
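Step 3 in miniature: msgpack on the internal gossip path, JSON only at the API edge. The state shape is an illustrative stand-in:

```python
import json
import time

import msgpack  # pip install msgpack

state = {"nodes": {f"node-{i}": {"score": i, "ts": time.time()} for i in range(100)}}

# Binary encode/decode for the gossip hot path.
packed = msgpack.packb(state, use_bin_type=True)
restored = msgpack.unpackb(packed, raw=False)
assert restored == state

# JSON stays at the HTTP API boundary only.
api_payload = json.dumps(state)
print(f"msgpack: {len(packed)} bytes vs JSON: {len(api_payload)} bytes")
```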
4. High - Architecture: Remove WebRTC or Justify It

  • Impact: 50% complexity reduction
  • Action:
    • For <100 nodes: Use HTTP/2 with persistent connections
    • If WebRTC is required (NAT traversal): Document WHY in architecture.md
    • Fallback to libp2p for proven P2P stack
    • Time: 3 days (research + rewrite)
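If step 4 lands on dropping WebRTC, one persistent HTTP/2 client per process covers the <100-node case. A minimal sketch using httpx (my assumption for the client library; the peer URL and /gossip endpoint are hypothetical):

```python
import asyncio

import httpx  # pip install "httpx[http2]"

# One long-lived client: connections are pooled and reused across gossip
# rounds, instead of paying WebRTC/ICE setup per peer.
client = httpx.AsyncClient(http2=True, timeout=5.0)


async def push_state(peer_url: str, payload: bytes) -> int:
    resp = await client.post(f"{peer_url}/gossip", content=payload)
    return resp.status_code


async def main():
    # Hypothetical peer; synapse-ng would take this from its peer table.
    print(await push_state("http://localhost:8001", b"..."))

# asyncio.run(main())  # commented out: needs a live peer to respond
```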
5. High - Observability: Add Prometheus Metrics

  • Impact: 100% debuggability improvement
  • Action:
    • prometheus-fastapi-instrumentator library
    • Export: http_requests_total, state_sync_latency, webrtc_connections
    • Grafana dashboard in observability/dashboard.json
    • Time: 4 hours
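Step 5 is nearly free with prometheus-fastapi-instrumentator; a minimal sketch, where the custom histogram is an illustrative assumption:

```python
from fastapi import FastAPI
from prometheus_client import Histogram
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Default HTTP metrics (request counts, latency histograms) plus /metrics.
Instrumentator().instrument(app).expose(app)

# Hypothetical domain metric for the gossip loop.
STATE_SYNC_LATENCY = Histogram(
    "state_sync_latency_seconds", "Time to merge one incoming state update"
)


async def merge_remote_state(update: dict) -> None:
    with STATE_SYNC_LATENCY.time():
        ...  # CRDT merge goes here
```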
6. Med - Testing: Write Unit Tests (Coverage >70%)

  • Impact: 80% bug prevention
  • Action:
    • pytest + pytest-asyncio
    • Test CRDT merge logic, auction scoring, treasury operations
    • Mock WebRTC connections
    • Time: 3 days
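A minimal sketch of step 6: unit tests that exercise merge logic rather than mocks. The lww_merge here is the same illustrative function sketched in Phase 1, inlined so the file is self-contained:

```python
# test_crdt.py -- run with: pytest test_crdt.py
def lww_merge(local: dict, remote: dict) -> dict:
    if remote["ts"] != local["ts"]:
        return remote if remote["ts"] > local["ts"] else local
    return remote if remote["node_id"] > local["node_id"] else local


def test_newer_write_wins():
    old = {"value": "a", "ts": 1.0, "node_id": "n1"}
    new = {"value": "b", "ts": 2.0, "node_id": "n2"}
    assert lww_merge(old, new)["value"] == "b"


def test_merge_is_commutative_on_ties():
    a = {"value": "a", "ts": 1.0, "node_id": "n1"}
    b = {"value": "b", "ts": 1.0, "node_id": "n2"}
    # Deterministic tie-break: argument order must not change the winner.
    assert lww_merge(a, b) == lww_merge(b, a)
```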
7. Med - Refactoring: Fix Concurrency (Per-Channel Locks)

  • Impact: 10x throughput under load
  • Action:
    • Replace global state_lock with locks: Dict[str, asyncio.Lock] (per channel)
    • Use asyncio.Queue with max size for backpressure
    • Time: 1 day
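A minimal sketch of step 7's per-channel locks and bounded-queue backpressure (channel_locks and inbox are illustrative names, not the repo's identifiers):

```python
import asyncio
from collections import defaultdict

# One lock per channel: writers to different channels no longer contend
# on the single global state_lock.
channel_locks: defaultdict = defaultdict(asyncio.Lock)

# Bounded inbox: a full queue sheds load instead of growing until OOM.
inbox: asyncio.Queue = asyncio.Queue(maxsize=10_000)


def enqueue(msg: bytes) -> bool:
    try:
        inbox.put_nowait(msg)  # backpressure: reject when full
        return True
    except asyncio.QueueFull:
        return False  # caller can drop, retry later, or NACK the peer


async def mutate_channel(channel_id: str, apply_update) -> None:
    async with channel_locks[channel_id]:  # replaces the global state_lock
        apply_update()
```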
8. Med - DevOps: Add CI/CD Pipeline

  • Impact: 95% deployment safety
  • Action:
    • GitHub Actions: lint, test, build-docker, security-scan
    • Use ruff (linter), mypy (type checking), pip-audit (CVE scan)
    • Deploy preview environments for PRs
    • Time: 1 day
9. Low - Cleanup: Fix TODOs or Remove Them

  • Impact: 30% code clarity
  • Action:
    • Finish Raft implementation or remove it
    • Implement AssemblyScript WASM compilation or remove it
    • Document all disabled features in ROADMAP.md
    • Time: 2 days
10. Low - Docs: Add OpenAPI Spec & Architecture Diagrams

  • Impact: 50% onboarding speed
  • Action:
    • FastAPI auto-generates OpenAPI → Add descriptions to endpoints
    • Create sequence diagrams (PlantUML) for state sync, task lifecycle
    • Time: 4 hours

🔥 FINAL VERDICT

"Synapse-NG is an ambitious AI-assisted research prototype masquerading as production software. It has novel ideas (ZKP voting, self-evolution) but is drowning in technical debt (6k-line monolith, zero unit tests, no observability). Could be state-of-the-art with 2-3 weeks of disciplined refactoring. Currently: an interesting science fair project, not a unicorn."

📌 Key Takeaways

What's Good:

  • ✅ Working P2P network (WebRTC mesh functional)
  • ✅ Governance system implemented (proposals, voting, treasury)
  • ✅ Cryptography done right (Ed25519, X25519, AESGCM)
  • ✅ Docker setup works
  • ✅ Documentation exists (20 markdown files)

What's Scary:
  • 🚨 6157-line monolith (unmaintainable)
  • 🚨 Zero Python unit tests (flaky bash integration tests only)
  • 🚨 No observability (can't debug in production)
  • 🚨 No rate limiting (trivial DoS)
  • 🚨 Global lock (performance bottleneck)
  • 🚨 40+ TODOs in critical features

What's Hype:
  • 🎭 "Self-evolving" = LLM prompt (not runtime adaptation)
  • 🎭 "Immune system" = config tweaking (not proactive healing)
  • 🎭 WebRTC for 3 localhost nodes (over-engineered)

Recommendation: Follow the 10-step Pareto plan. Start with #1 (split monolith) and #6 (add tests). This project has potential but needs engineering rigor over feature velocity.
