Inbound (wizard101/cascade)     Outbound (WIP)
─────────────────               ─────────────────
   User Prompt                   Model Response
        │                               │
        ▼                               ▼
   Llama Guard                   Exfil Detector
        │                               │
        ▼                               ▼
  Refuse/Allow                   PII? Secrets?
                                 Internal docs?
┌─────────────────────────────────────┐
│  LLM doing work                     │
│  ┌───────────────────────────────┐  │
│  │  LLM checking output          │  │
│  │  ┌─────────────────────────┐  │  │
│  │  │  LLM sanitizing input   │  │  │
│  │  │  ┌───────────────────┐  │  │  │
│  │  │  │  LLM classifying  │  │  │  │
│  │  │  │  ┌─────────────┐  │  │  │  │
│  │  │  │  │ LLM routing │  │  │  │  │
│  │  │  │  └─────────────┘  │  │  │  │
│  │  │  └───────────────────┘  │  │  │
│  │  └─────────────────────────┘  │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
┌─────────────────────────────────┐
│         Presidio Layer          │
├─────────────────────────────────┤
│ 1. Patterns (regex)             │
│ 2. NER (spaCy)                  │
│ 3. Context enhancement          │
│ 4. Validators (Luhn checksum)   │
│ 5. Deny lists                   │
│ 6. Custom recognizers           │
│ 7. Score aggregation            │
└─────────────────────────────────┘
┌─────────────────────────────────┐  ┌─────────────────────────────────┐
│    ADVERSARIAL EXFILTRATION     │  │     ACCIDENTAL MEMORIZATION     │
│       (Prompt Injection)        │  │   (Training Data Extraction)    │
├─────────────────────────────────┤  ├─────────────────────────────────┤
│ • Intentional attacks           │  │ • Unintended statistical leak   │
│ • Offensive security methods    │  │ • Privacy/ML theory methods     │
│ • Attack Success Rate (ASR)     │  │ • Extraction probability        │
│ • OWASP #1 LLM risk             │  │ • Larger models = MORE vulner.  │
└─────────────────────────────────┘  └─────────────────────────────────┘
                 │                                    │
                 └─────────────────┬──────────────────┘
                                   │
                         ┌─────────▼─────────┐
                         │   RESEARCH GAP    │
                         │    <5% papers     │
                         │   address BOTH    │
                         └───────────────────┘
  Model Response
         │
         ▼
┌─────────────────┐
│ Stage 1: Regex  │  <1ms
│ SecretDetector  │
└────────┬────────┘
         │
    high conf?
      (≥0.9)
    ┌────┴────┐
   yes        no
    │         │
    ▼         ▼
  BLOCK ┌──────────────┐
        │ Stage 2: NER │  ~5ms
        │ Presidio     │
        └──────┬───────┘
               │
          high conf?
            (≥0.9)
          ┌────┴────┐
         yes        no
          │         │
          ▼         ▼
        BLOCK ┌──────────────┐
              │ Stage 3:     │  ~1ms
              │ Policy Check │
              └──────┬───────┘
                     │
                  policy?
               ┌─────┼──────┐
             BLOCK REDACT ALLOW
               │     │      │
               ▼     ▼      ▼
             Block  Mask  Pass
              resp  PII  through
Current Implementation Status
| Stage | Component | Status | Latency |
|-------|----------------|------------|---------|
| 1 | SecretDetector | ✅ Done | <1ms |
| 2 | Presidio PII | ✅ Done | ~5ms |
| 3 | Policy Engine | ❌ Not done | - |
| 4 | Redactor | ❌ Not done | - |
```
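The three-stage outbound flow charted above (regex, then NER, then policy, with an early BLOCK at confidence ≥0.9) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: `Finding`, `stage1_secrets`, `stage2_pii`, `POLICY`, and `check_response` are hypothetical names, the patterns and scores are invented for the example, and Stage 2 uses a regex plus Luhn-checksum stand-in where the diagram places Presidio's NER.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLOCK_THRESHOLD = 0.9  # the "high conf? (>=0.9)" cut-off from the diagram


@dataclass
class Finding:
    kind: str
    start: int
    end: int
    score: float


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Stage 1 (assumed patterns and scores): cheap regexes for obvious secrets.
SECRET_PATTERNS = {
    "aws_access_key_id": (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), 0.95),
    "private_key": (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"), 0.99),
}

# Stage 2 stand-in (assumption): regex + validator instead of Presidio NER.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

# Stage 3 (assumed policy table): action per entity kind below the threshold.
POLICY = {"email": "REDACT", "credit_card": "BLOCK"}


def stage1_secrets(text: str) -> List[Finding]:
    return [
        Finding(kind, m.start(), m.end(), score)
        for kind, (pattern, score) in SECRET_PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def stage2_pii(text: str) -> List[Finding]:
    findings = []
    for m in CARD_RE.finditer(text):
        # Only report digit runs that pass the checksum validator.
        if luhn_ok(re.sub(r"\D", "", m.group())):
            findings.append(Finding("credit_card", m.start(), m.end(), 0.95))
    for m in EMAIL_RE.finditer(text):
        findings.append(Finding("email", m.start(), m.end(), 0.6))
    return findings


def redact(text: str, findings: List[Finding]) -> str:
    # Replace right-to-left so earlier offsets stay valid (assumes no overlaps).
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + "<" + f.kind.upper() + ">" + text[f.end:]
    return text


def check_response(text: str) -> Tuple[str, Optional[str]]:
    """Return (decision, text_to_send); text is None when blocked."""
    s1 = stage1_secrets(text)
    if any(f.score >= BLOCK_THRESHOLD for f in s1):
        return "BLOCK", None
    s2 = stage2_pii(text)
    if any(f.score >= BLOCK_THRESHOLD for f in s2):
        return "BLOCK", None
    findings = s1 + s2
    actions = {POLICY.get(f.kind, "REDACT") for f in findings}
    if "BLOCK" in actions:
        return "BLOCK", None
    if "REDACT" in actions:
        return "REDACT", redact(text, findings)
    return "ALLOW", text
```

Each stage runs only when the cheaper one before it did not already block, which is what keeps the common path close to the regex latency. Unknown entity kinds default to REDACT here; that default is a design choice of the sketch, not something the diagram specifies.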
Last active
November 24, 2025 00:30
Project Emmentaler