Created
November 1, 2025 01:58
-
-
Save bigsnarfdude/8875ece16a97e4c0557b2c11f2c26c7c to your computer and use it in GitHub Desktop.
Real-World LLM Misuse Datasets & Taxonomies
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| # Real-World LLM Misuse Datasets & Taxonomies | |
| ## Comparison with Fictional "llm-abuse-patterns" Repository | |
| **Date:** October 31, 2024 | |
| --- | |
| ## Executive Summary | |
| **YES** - There are multiple real datasets, taxonomies, and frameworks for LLM misuse! While the fictional "llm-abuse-patterns" repository I created was comprehensive, the actual landscape is quite rich with research efforts, though more fragmented. Here's what actually exists: | |
| --- | |
| ## 1. Major Jailbreak & Attack Datasets | |
| ### ✅ **JailbreakBench** (Real) | |
| **Source:** https://jailbreakbench.github.io/ | |
| **What it is:** A centralized benchmark with the following components: | |
| - **JBB-Behaviors dataset:** 100 distinct misuse behaviors on Hugging Face | |
| - 55% original examples, rest from AdvBench and TDC/HarmBench | |
| - 10 categories corresponding to OpenAI's usage policies | |
| - 100 benign behaviors for testing over-refusal | |
| - Repository of state-of-the-art adversarial prompts | |
| - Leaderboard tracking attack and defense performance | |
| **Key Difference from Fictional Repo:** | |
| - Focused specifically on jailbreaks, not broader abuse categories | |
| - Smaller dataset (100 vs 988 patterns) | |
| - Emphasis on reproducibility and leaderboards | |
| --- | |
| ### ✅ **JailbreakDB / JAILBREAKDB** (Real) | |
| **Source:** https://huggingface.co/datasets/youbin2014/JailbreakDB | |
| **Paper:** "SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models" (2024) | |
| **What it is:** The largest annotated dataset of jailbreak and benign prompts: | |
| - **Jailbreak split:** 445,752 unique system–user pairs | |
| - **Benign split:** 1,094,122 benign prompts | |
| - Collected from 14 sources | |
| - Lightweight labels for jailbreak status, source, and tactic | |
| - Open-source evaluation toolkit included | |
| **Key Difference:** | |
| - MUCH larger than fictional dataset (445K vs 988) | |
| - Focused only on jailbreaks, not misinformation, fraud, CSAM, or malicious code | |
| - More emphasis on detection/evaluation rather than abuse taxonomy | |
| --- | |
| ### ✅ **JailbreakRadar** (Real) | |
| **Paper:** "Comprehensive Assessment of Jailbreak Attacks Against LLMs" (2024) | |
| **arXiv:** https://arxiv.org/html/2402.05668v3 | |
| **What it is:** Systematic study of jailbreak attacks: | |
| - **17 representative jailbreak attacks** collected | |
| - **Novel attack taxonomy** with 6 categories: | |
| 1. Human-based | |
| 2. Obfuscation-based | |
| 3. Heuristic-based | |
| 4. Feedback-based | |
| 5. Fine-tuning-based | |
| 6. Generation-parameter-based | |
| - **160 forbidden questions** across 16 violation categories | |
| - Unified policy derived from 5 major LLM providers | |
| - Evaluation across 9 LLMs with 8 defense mechanisms | |
| **Key Difference:** | |
| - Attack-method focused (how attacks work) rather than abuse-type focused | |
| - Smaller question set but more comprehensive attack coverage | |
| - Strong evaluation component | |
| --- | |
| ### ✅ **Domain-Based Jailbreak Taxonomy** (Real) | |
| **Paper:** "A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models" (April 2025) | |
| **arXiv:** https://arxiv.org/html/2504.04976 | |
| **What it is:** Novel framework categorizing jailbreaks by underlying vulnerabilities: | |
| - **4 vulnerability types:** | |
| 1. Mismatched generalization | |
| 2. Competing objectives | |
| 3. Adversarial robustness | |
| 4. Mixed attacks | |
| - Focuses on WHY jailbreaks succeed rather than HOW they're crafted | |
| - Identifies structural gaps in alignment | |
| **Key Difference:** | |
| - Theoretical/mechanistic focus vs practical attack pattern catalog | |
| - Explains root causes rather than documenting instances | |
| --- | |
| ## 2. Misinformation Datasets | |
| ### ✅ **LLMFake Dataset** (Real) | |
| **Paper:** "Can LLM-Generated Misinformation Be Detected?" (ICLR 2024) | |
| **GitHub:** https://github.com/llm-misinformation/llm-misinformation | |
| **What it is:** | |
| - **Taxonomy of LLM-generated misinformation** across 5 dimensions: | |
| - Types: fake news, rumors, conspiracy theories, clickbait, misleading claims | |
| - Domains: healthcare, politics, etc. | |
| - Sources: where misinformation originates | |
| - Intents: unintentional (hallucinations) vs intentional (malicious) | |
| - Errors: types of inaccuracies | |
| - Misinformation from 7 different LLMs (ChatGPT, Llama2, Vicuna variants) | |
| - Focus on detection difficulty comparison (LLM-generated vs human-written) | |
| **Key Difference:** | |
| - Focused on GENERATION of misinformation, not just documentation | |
| - Includes detection code and evaluation framework | |
| - Smaller scope than fictional dataset's misinformation category | |
| --- | |
| ## 3. Fraud & Scam Datasets | |
| ### ✅ **FINRA Scam Dataset** (Real) | |
| **Paper:** "Can LLMs be Scammed? A Baseline Measurement Study" (Oct 2024) | |
| **arXiv:** https://arxiv.org/html/2410.13893v1 | |
| **What it is:** | |
| - **37 baseline scam scenarios** grounded in FINRA taxonomy | |
| - Covers Individual Financial Fraud types | |
| - Expanded with 4 persona variations | |
| - Tests LLM susceptibility to scams (reverse of typical misuse) | |
| - Focus on consumer protection scams | |
| **Key Difference:** | |
| - Tests LLMs AS victims rather than tools for fraud | |
| - Smaller dataset | |
| - Different research angle (defense vs offense) | |
| --- | |
| ### ✅ **Phone Scam Detection Dataset** (Real) | |
| **Paper:** "Combating Phone Scams with LLM-based Detection" (Oct 2024) | |
| **What it is:** | |
| - Dialogue transcripts between scammers and victims | |
| - Multiple datasets: SC, SD, MASC, plus real recordings | |
| - Focus on LLM-based detection of ongoing scams | |
| - Includes both synthetic and authentic fraudulent calls | |
| **Key Difference:** | |
| - Application-specific (phone scams only) | |
| - Detection-focused rather than documentation-focused | |
| --- | |
| ## 4. Broader LLM Misuse Taxonomies | |
| ### ✅ **Generative AI Misuse Taxonomy** (Real) | |
| **Paper:** "Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data" (June 2024) | |
| **arXiv:** https://arxiv.org/html/2406.13843v2 | |
| **What it is:** Analysis of real-world GenAI misuse cases: | |
| - **16 distinct misuse goals identified:** | |
| - Scam & Fraud | |
| - Civil Unrest | |
| - Surveillance | |
| - Cyberattacks | |
| - Terrorism & Extremism | |
| - Hate | |
| - Harassment | |
| - Child Exploitation and Abuse | |
| - [and 8 more] | |
| - Based on actual reported cases | |
| - Focus on tactics: manipulation of human likeness, falsification of evidence | |
| - Most cases use easily accessible capabilities (not sophisticated attacks) | |
| **Key Difference:** | |
| - VERY similar to fictional dataset's comprehensive approach! | |
| - Real-world case-based rather than pattern-based | |
| - Less technical detail on detection strategies | |
| --- | |
| ## 5. Industry Frameworks & Standards | |
| ### ✅ **OWASP Top 10 for LLM Applications 2025** (Real) | |
| **Source:** https://genai.owasp.org/llm-top-10/ | |
| **The 10 Risks:** | |
| 1. **Prompt Injection** - Crafted inputs leading to unauthorized access | |
| 2. **Improper Output Handling** - Insufficient validation of LLM outputs | |
| 3. **Supply Chain Vulnerabilities** - Compromised training data/dependencies | |
| 4. **Sensitive Information Disclosure** - Leaking PII, credentials, or proprietary data | |
| 5. **Data and Model Poisoning** - Tampered training data | |
| 6. **Insecure Plugin Design** - Vulnerable LLM extensions | |
| 7. **Excessive Agency** - Granting LLMs too much autonomy | |
| 8. **Vector and Embedding Weaknesses** - RAG vulnerabilities | |
| 9. **Misinformation** - LLM hallucinations and inaccuracies | |
| 10. **Unbounded Consumption** - Resource exhaustion attacks | |
| **Key Difference:** | |
| - Risk categories rather than attack patterns | |
| - Focused on vulnerabilities vs abuse instances | |
| - Includes mitigation strategies | |
| - Complementary to attack pattern datasets | |
| --- | |
| ### ✅ **MITRE ATLAS** (Real) | |
| **Source:** https://atlas.mitre.org/ | |
| **What it is:** Adversarial Threat Landscape for AI Systems | |
| - Knowledge base of real-world attacks on ML systems | |
| - Tactics, Techniques, and Procedures (TTPs) framework | |
| - Based on MITRE ATT&CK architecture | |
| - Includes real case studies of AI system compromises | |
| - Covers broader ML/AI, not just LLMs | |
| **Key Techniques Include:** | |
| - AML.T0024 - Infer Training Data Membership | |
| - AML.T0010 - ML Supply Chain Compromise | |
| - AML.T0051 - LLM Prompt Injection | |
| - AML.T0054 - LLM Jailbreak | |
| - AML.T0029 - Denial of ML Service | |
| - [and many more] | |
| **Key Difference:** | |
| - Broader AI/ML scope (not LLM-specific) | |
| - Attacker TTPs vs abuse patterns | |
| - Case study based | |
| --- | |
| ### ✅ **NIST AI Risk Management Framework** (Real) | |
| **Source:** NIST AI RMF | |
| **What it is:** Voluntary framework for AI risk management | |
| - 4 core functions: Govern, Map, Measure, Manage | |
| - Emphasizes trustworthy AI characteristics | |
| - Not specific to abuse patterns | |
| - Organizational/governance focus | |
| --- | |
| ## 6. Other Notable Real Datasets & Resources | |
| ### ✅ **AVID - AI Vulnerability Database** | |
| - Catalog of real-world AI vulnerabilities and incidents | |
| - Community-contributed | |
| - Broader than just LLMs | |
| ### ✅ **AI Exploits by ProtectAI** | |
| - Collection of ML/AI exploits | |
| - Security-focused | |
| ### ✅ **BELLS Benchmark** | |
| - "Benchmark for the Evaluation of LLM Supervision Systems" | |
| - Tests guardrails and safety supervisors | |
| - 3 jailbreak families, 11 harm categories | |
| - Two-dimensional: harm severity × adversarial sophistication | |
| ### ✅ **ToxicityBench, TruthfulQA, BBQ (Bias Benchmark)** | |
| - Specialized evaluation datasets for specific risks | |
| - Used for LLM safety testing | |
| ### ✅ **GitHub: Awesome-LM-SSP** | |
| - Curated list of 200+ papers on LLM safety, security, privacy | |
| - Comprehensive literature tracking | |
| - Organized by topic and date | |
| --- | |
| ## What's MISSING in Real World vs Fictional Dataset | |
| The fictional "llm-abuse-patterns" repository I created had several features that DON'T exist in a single consolidated resource: | |
| ### ❌ **Comprehensive Multi-Category Dataset** | |
| - No single dataset covers jailbreaks, misinformation, malicious code, CSAM/NCII, and fraud together | |
| - Existing datasets are specialized by category | |
| ### ❌ **Standardized Detection Strategy Evaluations** | |
| - No unified framework comparing heuristic, ML, and LLM-based detection across all abuse types | |
| - Evaluation approaches vary widely across papers | |
| ### ❌ **Production-Ready Pattern Matching Library** | |
| - No open-source Python SDK for querying abuse patterns | |
| - Researchers typically share raw data, not tooling | |
| ### ❌ **Longitudinal Tracking** | |
| - Limited temporal analysis of how patterns evolve | |
| - Monthly updates with new patterns don't exist in one place | |
| ### ❌ **Unified Schema** | |
| - Each dataset uses different formats and metadata | |
| - No standardized "pattern schema" across abuse types | |
| ### ❌ **Privacy-Preserving CSAM Pattern Documentation** | |
| - Understandably, no public datasets document CSAM attempt patterns | |
| - This category remains sensitive and private | |
| ### ❌ **Real-Time API Access** | |
| - No public APIs for accessing abuse pattern databases | |
| - Data typically distributed as static downloads | |
| --- | |
| ## What SHOULD Exist (Research Gaps) | |
| Based on what exists and what's missing: | |
| ### 1. **Unified LLM Abuse Pattern Database** | |
| - Consolidate jailbreaks, misinformation, fraud, malicious code into one searchable resource | |
| - Standardized schema across categories | |
| - Regular updates with new patterns | |
| - **This is exactly what the fictional dataset aimed to be!** | |
| ### 2. **Cross-Category Attack Chains** | |
| - Document how different abuse types combine (e.g., jailbreak → misinformation → fraud) | |
| - Multi-step attack patterns | |
| ### 3. **Detection Strategy Benchmarks** | |
| - Unified evaluation of detection approaches across all abuse categories | |
| - Standardized metrics and test sets | |
| ### 4. **Temporal Evolution Analysis** | |
| - Systematic tracking of how attack patterns evolve over time | |
| - Arms race dynamics between attackers and defenders | |
| ### 5. **Industry-Academia Data Sharing** | |
| - Safe mechanisms for companies to share anonymized abuse patterns | |
| - More real-world data in research datasets | |
| --- | |
| ## Recommendations for Building Real Version | |
| If someone wanted to build the fictional "llm-abuse-patterns" dataset for real: | |
| ### Phase 1: Consolidation | |
| - Aggregate existing datasets (JailbreakDB, LLMFake, etc.) | |
| - Develop unified schema | |
| - Build conversion tools | |
| ### Phase 2: Expansion | |
| - Partner with companies for real-world data | |
| - Community contribution platform | |
| - Red team exercises to discover new patterns | |
| ### Phase 3: Tooling | |
| - Python SDK for pattern queries | |
| - Detection harness for testing detectors | |
| - Visualization and analysis tools | |
| ### Phase 4: Maintenance | |
| - Monthly pattern updates | |
| - Continuous integration pipeline | |
| - Community governance model | |
| --- | |
| ## Conclusion | |
| The real world has: | |
| - ✅ **Excellent jailbreak datasets** (JailbreakBench, JailbreakDB) | |
| - ✅ **Good misinformation datasets** (LLMFake) | |
| - ✅ **Limited fraud/scam datasets** (FINRA-based, phone scams) | |
| - ✅ **Strong taxonomies** (OWASP Top 10, MITRE ATLAS) | |
| - ✅ **Emerging comprehensive frameworks** (GenAI Misuse Taxonomy) | |
| But it LACKS: | |
| - ❌ A single consolidated multi-category abuse pattern database | |
| - ❌ Standardized detection strategy evaluations | |
| - ❌ Production-ready pattern matching tools | |
| - ❌ Comprehensive temporal tracking | |
| - ❌ Unified schema and API access | |
| The fictional "llm-abuse-patterns" repository represents what the field NEEDS but doesn't yet have - a comprehensive, well-maintained, standardized repository of LLM abuse patterns across all major categories with detection strategies and tooling. | |
| **The good news:** The building blocks exist! Someone could absolutely create this by consolidating and extending existing work. | |
| --- | |
| ## Key Papers & Resources | |
| **Must-Read Papers:** | |
| 1. "JailbreakBench: A Centralized Benchmark" - jailbreakbench.github.io | |
| 2. "SoK: Taxonomy and Evaluation of Prompt Security" (JailbreakDB) - arXiv 2510.15476 | |
| 3. "Can LLM-Generated Misinformation Be Detected?" - arXiv 2309.13788 | |
| 4. "Generative AI Misuse: A Taxonomy of Tactics" - arXiv 2406.13843 | |
| 5. "Comprehensive Assessment of Jailbreak Attacks" - arXiv 2402.05668 | |
| 6. "A Domain-Based Taxonomy of Jailbreak Vulnerabilities" - arXiv 2504.04976 | |
| **Key Frameworks:** | |
| - OWASP Top 10 for LLM Applications 2025: https://genai.owasp.org/llm-top-10/ | |
| - MITRE ATLAS: https://atlas.mitre.org/ | |
| - NIST AI RMF: https://www.nist.gov/itl/ai-risk-management-framework | |
| **Datasets:** | |
| - JailbreakBench: https://jailbreakbench.github.io/ | |
| - JailbreakDB: https://huggingface.co/datasets/youbin2014/JailbreakDB | |
| - LLMFake: https://github.com/llm-misinformation/llm-misinformation | |
| **Curated Lists:** | |
| - Awesome-LM-SSP: https://github.com/ThuCCSLab/Awesome-LM-SSP | |
| - LLM Misinformation Survey: https://github.com/llm-misinformation/llm-misinformation-survey | |
| --- | |
| **Last Updated:** October 31, 2024 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Real-World LLM Misuse Datasets & Taxonomies
Comparison with Fictional "llm-abuse-patterns" Repository
Date: October 31, 2024
Executive Summary
YES - There are multiple real datasets, taxonomies, and frameworks for LLM misuse! While the fictional "llm-abuse-patterns" repository I created was comprehensive, the actual landscape is quite rich with research efforts, though more fragmented. Here's what actually exists:
1. Major Jailbreak & Attack Datasets
✅ JailbreakBench (Real)
Source: https://jailbreakbench.github.io/
What it is: A centralized benchmark with the following components:
Key Difference from Fictional Repo:
✅ JailbreakDB / JAILBREAKDB (Real)
Source: https://huggingface.co/datasets/youbin2014/JailbreakDB
Paper: "SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models" (2024)
What it is: The largest annotated dataset of jailbreak and benign prompts:
Key Difference:
✅ JailbreakRadar (Real)
Paper: "Comprehensive Assessment of Jailbreak Attacks Against LLMs" (2024)
arXiv: https://arxiv.org/html/2402.05668v3
What it is: Systematic study of jailbreak attacks:
Key Difference:
✅ Domain-Based Jailbreak Taxonomy (Real)
Paper: "A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models" (April 2025)
arXiv: https://arxiv.org/html/2504.04976
What it is: Novel framework categorizing jailbreaks by underlying vulnerabilities:
Key Difference:
2. Misinformation Datasets
✅ LLMFake Dataset (Real)
Paper: "Can LLM-Generated Misinformation Be Detected?" (ICLR 2024)
GitHub: https://github.com/llm-misinformation/llm-misinformation
What it is:
Key Difference:
3. Fraud & Scam Datasets
✅ FINRA Scam Dataset (Real)
Paper: "Can LLMs be Scammed? A Baseline Measurement Study" (Oct 2024)
arXiv: https://arxiv.org/html/2410.13893v1
What it is:
Key Difference:
✅ Phone Scam Detection Dataset (Real)
Paper: "Combating Phone Scams with LLM-based Detection" (Oct 2024)
What it is:
Key Difference:
4. Broader LLM Misuse Taxonomies
✅ Generative AI Misuse Taxonomy (Real)
Paper: "Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data" (June 2024)
arXiv: https://arxiv.org/html/2406.13843v2
What it is: Analysis of real-world GenAI misuse cases:
Key Difference:
5. Industry Frameworks & Standards
✅ OWASP Top 10 for LLM Applications 2025 (Real)
Source: https://genai.owasp.org/llm-top-10/
The 10 Risks:
Key Difference:
✅ MITRE ATLAS (Real)
Source: https://atlas.mitre.org/
What it is: Adversarial Threat Landscape for AI Systems
Key Techniques Include:
Key Difference:
✅ NIST AI Risk Management Framework (Real)
Source: NIST AI RMF
What it is: Voluntary framework for AI risk management
6. Other Notable Real Datasets & Resources
✅ AVID - AI Vulnerability Database
✅ AI Exploits by ProtectAI
✅ BELLS Benchmark
✅ ToxicityBench, TruthfulQA, BBQ (Bias Benchmark)
✅ GitHub: Awesome-LM-SSP
What's MISSING in Real World vs Fictional Dataset
The fictional "llm-abuse-patterns" repository I created had several features that DON'T exist in a single consolidated resource:
❌ Comprehensive Multi-Category Dataset
❌ Standardized Detection Strategy Evaluations
❌ Production-Ready Pattern Matching Library
❌ Longitudinal Tracking
❌ Unified Schema
❌ Privacy-Preserving CSAM Pattern Documentation
❌ Real-Time API Access
What SHOULD Exist (Research Gaps)
Based on what exists and what's missing:
1. Unified LLM Abuse Pattern Database
2. Cross-Category Attack Chains
3. Detection Strategy Benchmarks
4. Temporal Evolution Analysis
5. Industry-Academia Data Sharing
Recommendations for Building Real Version
If someone wanted to build the fictional "llm-abuse-patterns" dataset for real:
Phase 1: Consolidation
Phase 2: Expansion
Phase 3: Tooling
Phase 4: Maintenance
Conclusion
The real world has:
But it LACKS:
The fictional "llm-abuse-patterns" repository represents what the field NEEDS but doesn't yet have - a comprehensive, well-maintained, standardized repository of LLM abuse patterns across all major categories with detection strategies and tooling.
The good news: The building blocks exist! Someone could absolutely create this by consolidating and extending existing work.
Key Papers & Resources
Must-Read Papers:
Key Frameworks:
Datasets:
Curated Lists:
Last Updated: October 31, 2024