🎯 EDGE-CASES.md

Should You Include Edge Cases Early in Fine-Tuning?

This doc helps you decide when and how to include edge cases or rare patterns in your fine-tuning dataset.

✅ Start With Consistency

Why?

Easier to debug
Faster convergence
Avoids overfitting to rare structures

When?

Dataset < 10k examples
Narrow domain or early in development
You need rapid iteration with interpretable results

flowchart TD
    A[Start Fine-Tuning] --> B[Use Only Consistent Data]
    B --> C{Dataset Stable?}
    C -- Yes --> D[Evaluate and Analyze Failures]
    C -- No --> B

🧠 When to Include Edge Cases

Why?

Avoid model blind spots
Improve real-world robustness
Generalize to fuzzier phrasing or fringe inputs

When?

After a first stable baseline
Dataset is larger (> 50k examples)
You’re preparing for production use
You have known critical failure modes (e.g. in medical, legal, or fintech domains)
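The two "When?" lists can be read as a rough rule of thumb. Here is a minimal sketch, assuming a hypothetical helper named `recommend_stage`. The 10k and 50k cutoffs come from this doc; the function name, arguments, and return strings are illustrative only. This doc gives no guidance for datasets between 10k and 50k examples, so the sketch flags that range as a judgment call.

```python
def recommend_stage(n_examples: int, has_stable_baseline: bool) -> str:
    """Map dataset size and baseline status to a rough recommendation.

    Hypothetical helper: only the 10k / 50k thresholds come from the doc.
    """
    # Without a stable baseline, or with a small dataset, stay consistent.
    if not has_stable_baseline or n_examples < 10_000:
        return "consistency-only"
    # Large dataset plus a stable baseline: edge cases are on the table.
    if n_examples > 50_000:
        return "introduce-edge-cases"
    # The doc does not cover the 10k to 50k range.
    return "judgment-call"


for size, stable in [(5_000, True), (60_000, False), (60_000, True), (20_000, True)]:
    print(size, stable, recommend_stage(size, stable))
```

Known critical failure modes or an upcoming production launch may justify moving earlier than the size thresholds alone suggest.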

flowchart TD
    D[Stable Model] --> E[Introduce Edge Cases]
    E --> F[Expand to Ambiguous/Complex Inputs]
    F --> G[Measure Improvement & Robustness]
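One way to approach the "Measure Improvement & Robustness" step is to compare per-split accuracy before and after adding edge cases. This is a sketch under assumptions: the split names `core` and `edge`, the helper names, and the toy pass/fail values below are all made up for illustration and are not real results.

```python
def accuracy(results: list[bool]) -> float:
    """Fraction of eval examples answered correctly (0.0 for an empty split)."""
    return sum(results) / len(results) if results else 0.0


def compare_runs(
    baseline: dict[str, list[bool]], candidate: dict[str, list[bool]]
) -> dict[str, float]:
    """Per-split accuracy change: candidate minus baseline."""
    return {
        split: accuracy(candidate[split]) - accuracy(baseline[split])
        for split in baseline
    }


# Toy per-example pass/fail results, purely illustrative.
baseline = {"core": [True, True, True, False], "edge": [False, False, True, False]}
candidate = {"core": [True, True, True, False], "edge": [True, False, True, True]}

deltas = compare_runs(baseline, candidate)
print(deltas)  # {'core': 0.0, 'edge': 0.5}
```

A negative delta on the core split would suggest the edge cases are costing you baseline behavior, which is worth checking before expanding further.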

📐 Strategy: Layered Curriculum

Phase 1: High-consistency examples only
Phase 2: Add minor variation (synonyms, rewordings)
Phase 3: Inject edge cases and ambiguity progressively

graph LR
    A1[Phase 1: Consistency] --> A2[Phase 2: Variation]
    A2 --> A3[Phase 3: Edge Robustness]
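The three phases can be sketched as a dataset filter. This assumes each example carries a hypothetical `tag` field with one of the values `consistent`, `variation`, or `edge`. The field, the `build_phase` helper, and the `edge_fraction` knob are illustrative and not part of any library.

```python
# Which tags are allowed in each phase (assumed labeling scheme).
PHASE_TAGS = {
    1: {"consistent"},
    2: {"consistent", "variation"},
    3: {"consistent", "variation", "edge"},
}


def build_phase(examples: list[dict], phase: int, edge_fraction: float = 1.0) -> list[dict]:
    """Select examples for a phase.

    edge_fraction (0.0 to 1.0) limits how many edge cases are injected,
    so Phase 3 can be ramped up progressively.
    """
    allowed = PHASE_TAGS[phase]
    selected = [ex for ex in examples if ex["tag"] in allowed and ex["tag"] != "edge"]
    if "edge" in allowed:
        edge = [ex for ex in examples if ex["tag"] == "edge"]
        selected.extend(edge[: int(len(edge) * edge_fraction)])
    return selected


# Toy dataset: 3 consistent, 2 variation, 4 edge examples.
examples = (
    [{"prompt": f"c{i}", "tag": "consistent"} for i in range(3)]
    + [{"prompt": f"v{i}", "tag": "variation"} for i in range(2)]
    + [{"prompt": f"e{i}", "tag": "edge"} for i in range(4)]
)

for phase, fraction in [(1, 0.0), (2, 0.0), (3, 0.5), (3, 1.0)]:
    print(phase, fraction, len(build_phase(examples, phase, fraction)))
# 1 0.0 3
# 2 0.0 5
# 3 0.5 7
# 3 1.0 9
```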

📋 Edge Case Inclusion Checklist

| Criteria | Score (0–3) | Notes |
|---|---|---|
| Is it common in production? | | |
| Would failure be critical? | | (e.g. medical, safety, legal) |
| Does it teach generalization? | | |
| Is the model currently failing? | | Based on logs/tests |
| Can it be grouped & scaled? | | Part of a broader pattern? |

➡️ Include if total score ≥ 8 (out of a possible 15)
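As a worked example of the checklist, here is a minimal sketch that sums five 0–3 ratings and applies the ≥ 8 rule. The key names are borrowed from the Sample Scoring Script in this doc. The helper names and the example ratings are assumptions for illustration.

```python
CRITERIA = (
    "common_in_production",
    "failure_is_critical",
    "teaches_generalization",
    "model_fails_here",
    "is_scalable_pattern",
)
THRESHOLD = 8  # from the checklist: include if total score >= 8


def checklist_total(ratings: dict[str, int]) -> int:
    """Sum the 0-3 ratings; a missing criterion counts as 0."""
    total = 0
    for name in CRITERIA:
        value = ratings.get(name, 0)
        if not 0 <= value <= 3:
            raise ValueError(f"{name} must be between 0 and 3, got {value}")
        total += value
    return total


def should_include(ratings: dict[str, int]) -> bool:
    """Apply the checklist threshold to a set of ratings."""
    return checklist_total(ratings) >= THRESHOLD


# Hypothetical ratings for one candidate edge case.
example = {
    "common_in_production": 1,
    "failure_is_critical": 3,
    "teaches_generalization": 2,
    "model_fails_here": 2,
    "is_scalable_pattern": 1,
}
print(checklist_total(example), should_include(example))  # 9 True
```

Here 1 + 3 + 2 + 2 + 1 = 9, which clears the threshold mostly because failure would be critical.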

✂️ What to Hold Back

Ultra-rare phrasing with no production impact
Contrived or noisy edge cases with little structure
Very ambiguous prompts without clear resolution

These can be added later for robustness testing or eval splits.
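Holding examples back does not have to mean discarding them. A minimal sketch, assuming each example may carry a hypothetical `hold_back_reason` annotation: flagged examples are routed to a held-out list for robustness testing, and everything else stays in training. The field name and reason labels are illustrative.

```python
# Assumed labels mirroring the three hold-back categories above.
HOLD_BACK_REASONS = {"ultra_rare", "contrived_or_noisy", "unresolvable_ambiguity"}


def split_train_and_holdout(examples: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (train, holdout); holdout gets examples flagged with a known reason."""
    train, holdout = [], []
    for ex in examples:
        if ex.get("hold_back_reason") in HOLD_BACK_REASONS:
            holdout.append(ex)
        else:
            train.append(ex)
    return train, holdout


# Toy examples, purely illustrative.
examples = [
    {"prompt": "typical request"},
    {"prompt": "one-off phrasing", "hold_back_reason": "ultra_rare"},
    {"prompt": "garbled input", "hold_back_reason": "contrived_or_noisy"},
]
train, holdout = split_train_and_holdout(examples)
print(len(train), len(holdout))  # 1 2
```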

🧮 Sample Scoring Script

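import json

# Simplified variant of the checklist: each criterion is a boolean flag
# with a fixed weight (maximum total 10) instead of a 0–3 rating.
def score_example(example):
    """Return (score, notes) for one annotated example dict."""
    score = 0
    notes = []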

    if example.get("common_in_production"):
        score += 2
        notes.append("Seen in prod")
    if example.get("failure_is_critical"):
        score += 3
        notes.append("Critical failure")
    if example.get("teaches_generalization"):
        score += 2
        notes.append("Good generalization")
    if example.get("model_fails_here"):
        score += 2
        notes.append("Current weak spot")
    if example.get("is_scalable_pattern"):
        score += 1
        notes.append("Scalable pattern")

    return score, notes

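with open("dataset.jsonl", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, which json.loads would reject
        item = json.loads(line)
        score, notes = score_example(item)
        decision = "INCLUDE" if score >= 8 else "HOLD"
        # .get avoids a KeyError when a record has no "prompt" field
        prompt = item.get("prompt", "<no prompt>")
        print(f"{decision} ({score}): {prompt} // Notes: {', '.join(notes)}")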

Summary

First, teach the model what "good" looks like. Then, stretch it toward robustness.

Consistency → Variation → Edge Robustness.

decagondev/EDGE-CASES.md
