@bigsnarfdude
Last active November 24, 2025 00:30
Project Emmentaler
  Inbound (wizard101/cascade) Outbound (WIP)
  ─────────────────           ─────────────────
  User Prompt                 Model Response
      │                           │
      ▼                           ▼
  Llama Guard                 Exfil Detector
      │                           │
      ▼                           ▼
  Refuse/Allow                PII? Secrets?
                              Internal docs?
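
The two columns above can be read as one wrapper around the model call. Below is a minimal sketch, assuming both guards are plain callables that return True when the text is safe. The names `guarded_call`, `inbound_ok`, and `outbound_ok` are hypothetical and stand in for Llama Guard and the exfil detector; none of them come from the project.

```python
from typing import Callable


def guarded_call(
    prompt: str,
    model: Callable[[str], str],
    inbound_ok: Callable[[str], bool],   # stand-in for the Llama Guard check (assumption)
    outbound_ok: Callable[[str], bool],  # stand-in for the exfil detector (assumption)
    refusal: str = "[refused]",
    blocked: str = "[blocked]",
) -> str:
    """Run the inbound check, call the model, then run the outbound check."""
    if not inbound_ok(prompt):
        return refusal            # Refuse: the prompt never reaches the model
    response = model(prompt)
    if not outbound_ok(response):
        return blocked            # PII, secrets, or internal docs detected
    return response
```

Keeping the two checks as separate callables mirrors the diagram: the inbound side can ship while the outbound side is still WIP.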
                              
                              
                              
                              
  ┌─────────────────────────────────────┐
  │           LLM doing work            │
  │  ┌───────────────────────────────┐  │
  │  │     LLM checking output       │  │
  │  │  ┌─────────────────────────┐  │  │
  │  │  │  LLM sanitizing input   │  │  │
  │  │  │  ┌───────────────────┐  │  │  │
  │  │  │  │ LLM classifying   │  │  │  │
  │  │  │  │ ┌─────────────┐   │  │  │  │
  │  │  │  │ │LLM routing  │   │  │  │  │
  │  │  │  │ └─────────────┘   │  │  │  │
  │  │  │  └───────────────────┘  │  │  │
  │  │  └─────────────────────────┘  │  │
  │  └───────────────────────────────┘  │
  └─────────────────────────────────────┘

                              
                              
  ┌─────────────────────────────────┐
  │         Presidio Layer          │
  ├─────────────────────────────────┤
  │  1. Patterns (regex)            │
  │  2. NER (spaCy)                 │
  │  3. Context enhancement         │
  │  4. Validators (Luhn checksum)  │
  │  5. Deny lists                  │
  │  6. Custom recognizers          │
  │  7. Score aggregation           │
  └─────────────────────────────────┘
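
Steps 1, 3, and 4 of the layer above (pattern, context enhancement, Luhn validator) can be illustrated without Presidio itself. This is a standard-library sketch, not Presidio's API: the regex, the context words, and every score value are assumptions chosen for illustration.

```python
import re

# Step 1: pattern. 13 to 19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Step 3: words that raise confidence when they appear just before a match (assumed list).
CONTEXT_WORDS = ("card", "credit", "visa", "mastercard")


def luhn_valid(number: str) -> bool:
    """Step 4: Luhn checksum over the digits of `number`."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:        # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def score_cards(text, base=0.5, checksum_boost=0.4, context_boost=0.1, window=30):
    """Return (start, end, score) for each candidate that passes the checksum."""
    findings = []
    for m in CARD_PATTERN.finditer(text):
        if not luhn_valid(m.group()):
            continue          # a failed validator drops the candidate entirely
        score = base + checksum_boost
        before = text[max(0, m.start() - window):m.start()].lower()
        if any(word in before for word in CONTEXT_WORDS):
            score += context_boost
        findings.append((m.start(), m.end(), round(min(score, 1.0), 2)))
    return findings
```

With these assumed weights, a checksum-valid number scores 0.9 on its own and 1.0 when a context word precedes it, so either case would clear a 0.9 threshold.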



┌─────────────────────────────────┐    ┌─────────────────────────────────┐
│   ADVERSARIAL EXFILTRATION      │    │   ACCIDENTAL MEMORIZATION       │
│   (Prompt Injection)            │    │   (Training Data Extraction)    │
├─────────────────────────────────┤    ├─────────────────────────────────┤
│ • Intentional attacks           │    │ • Unintended statistical leak   │
│ • Offensive security methods    │    │ • Privacy/ML theory methods     │
│ • Attack Success Rate (ASR)     │    │ • Extraction probability        │
│ • OWASP #1 LLM risk             │    │ • Larger models = MORE vulner.  │
└─────────────────────────────────┘    └─────────────────────────────────┘
              │                                       │
              └───────────────┬───────────────────────┘
                              │
                    ┌─────────▼─────────┐
                    │   RESEARCH GAP    │
                    │  <5% papers       │
                    │  address BOTH     │
                    └───────────────────┘
                    
                    
  Model Response
        │
        ▼
  ┌─────────────────┐
  │ Stage 1: Regex  │  <1ms
  │ SecretDetector  │
  └────────┬────────┘
           │
      high conf?
      (≥0.9)
      ┌────┴────┐
     yes        no
      │         │
      ▼         ▼
    BLOCK   ┌──────────────┐
            │ Stage 2: NER │  ~5ms
            │   Presidio   │
            └──────┬───────┘
                   │
              high conf?
              (≥0.9)
              ┌────┴────┐
             yes        no
              │         │
              ▼         ▼
            BLOCK   ┌──────────────┐
                    │ Stage 3:     │  ~1ms
                    │ Policy Check │
                    └──────┬───────┘
                           │
                      policy?
                      ┌────┼────┐
                   BLOCK  REDACT  ALLOW
                      │     │      │
                      ▼     ▼      ▼
                    Block  Mask   Pass
                    resp   PII   through
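
The flow above is an early-exit cascade: cheap stages run first, and a high-confidence hit blocks before later stages run. Here is a minimal sketch under stated assumptions. The two secret patterns, the 0.95 and 0.6 scores, the toy email stage standing in for Presidio, and the rules inside `example_policy` are all hypothetical; the diagram fixes only the 0.9 threshold and the three policy outcomes.

```python
import re
from typing import Callable, List, Tuple

Finding = Tuple[str, float]   # (label, confidence)
HIGH_CONF = 0.9               # threshold taken from the diagram

# Stage 1: illustrative secret patterns (assumption, not the project's SecretDetector).
SECRET_PATTERNS = {
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "PRIVATE_KEY": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def stage1_secrets(text: str) -> List[Finding]:
    return [(label, 0.95) for label, pat in SECRET_PATTERNS.items() if pat.search(text)]


def stage2_pii_stub(text: str) -> List[Finding]:
    """Toy stand-in for the Presidio stage: flags emails at medium confidence."""
    return [("EMAIL", 0.6)] if EMAIL.search(text) else []


def example_policy(findings: List[Finding], blocked=frozenset({"SSN"}), redact_at=0.5) -> str:
    """Hypothetical Stage 3 rules: deny-listed labels block, other findings redact."""
    if any(label in blocked for label, _ in findings):
        return "BLOCK"
    if any(score >= redact_at for _, score in findings):
        return "REDACT"
    return "ALLOW"


def run_cascade(
    text: str,
    stage1: Callable[[str], List[Finding]] = stage1_secrets,
    stage2: Callable[[str], List[Finding]] = stage2_pii_stub,
    policy: Callable[[List[Finding]], str] = example_policy,
) -> str:
    findings = stage1(text)
    if any(score >= HIGH_CONF for _, score in findings):
        return "BLOCK"        # Stage 1 exit: Stage 2 never runs
    pii = stage2(text)
    if any(score >= HIGH_CONF for _, score in pii):
        return "BLOCK"        # Stage 2 exit
    return policy(findings + pii)   # Stage 3 decides on the lower-confidence findings
```

Because the stages are passed in as callables, the stub can be swapped for a real Presidio-backed function without changing `run_cascade`.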

  Current Implementation Status

  | Stage | Component      | Status     | Latency |
  |-------|----------------|------------|---------|
  | 1     | SecretDetector | ✅ Done     | <1ms    |
  | 2     | Presidio PII   | ✅ Done     | ~5ms    |
  | 3     | Policy Engine  | ❌ Not done | -       |
  | 4     | Redactor       | ❌ Not done | -       |
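
Stage 4 is not implemented yet. One possible shape for it is a function that masks detected spans, sketched below. The `(start, end, label)` span format and the `<LABEL>` placeholder style are assumptions, and the sketch assumes spans do not overlap.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]   # (start, end, label); hypothetical span format


def redact(text: str, spans: Iterable[Span]) -> str:
    """Replace each span with a <LABEL> placeholder.

    Spans are applied right to left so earlier offsets stay valid
    after each replacement. Overlapping spans are not handled.
    """
    out = text
    for start, end, label in sorted(spans, reverse=True):
        out = out[:start] + f"<{label}>" + out[end:]
    return out
```

For example, `redact("mail bob@example.com now", [(5, 20, "EMAIL")])` returns `"mail <EMAIL> now"`.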
  