🎯 EDGE-CASES.md

Should You Include Edge Cases Early in Fine-Tuning?

This doc helps you decide when and how to include edge cases or rare patterns in your fine-tuning dataset.


✅ Start With Consistency

Why?

  • Easier to debug
  • Faster convergence
  • Avoids overfitting to rare structures

When?

  • Dataset < 10k examples
  • Narrow domain or early in development
  • You need rapid iteration with interpretable results
```mermaid
flowchart TD
    A[Start Fine-Tuning] --> B[Use Only Consistent Data]
    B --> C{Dataset Stable?}
    C -- Yes --> D[Evaluate and Analyze Failures]
    C -- No --> B
```
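If you want to enforce this phase in code, here is a minimal sketch. It assumes each JSONL record carries `prompt`, `response`, and a reviewer-set `is_consistent` flag; those field names are illustrative, not a standard schema.

```python
import json

def load_consistent_examples(path="dataset.jsonl"):
    """Keep only examples flagged as consistent for the Phase 1 baseline.

    Assumes each JSONL record has 'prompt', 'response', and a boolean
    'is_consistent' field set during data review (assumed field names).
    """
    kept, deferred = [], 0
    with open(path) as f:
        for line in f:
            item = json.loads(line)
            if item.get("is_consistent", False):
                kept.append({"prompt": item["prompt"], "response": item["response"]})
            else:
                deferred += 1
    print(f"Kept {len(kept)} consistent examples, deferred {deferred} for later phases")
    return kept
```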

🧠 When to Include Edge Cases

Why?

  • Avoid model blind spots
  • Improve real-world robustness
  • Generalize to fuzzier phrasing or fringe inputs

When?

  • After a first stable baseline
  • Dataset is larger (> 50k examples)
  • You're preparing for production use
  • You have known critical failure modes (e.g. medical, legal, fintech)
```mermaid
flowchart TD
    D[Stable Model] --> E[Introduce Edge Cases]
    E --> F[Expand to Ambiguous/Complex Inputs]
    F --> G[Measure Improvement & Robustness]
```
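When you do start mixing edge cases in, it usually pays to add them at a controlled ratio rather than all at once. Below is a rough sketch; the `is_edge_case` flag and the 10% ratio are assumptions for illustration, not recommendations.

```python
import json
import random

def mix_in_edge_cases(path="dataset.jsonl", edge_ratio=0.10, seed=0):
    """Blend a controlled fraction of edge cases into the consistent core.

    Assumes each record carries a boolean 'is_edge_case' flag (assumed field);
    edge_ratio caps edge cases relative to the size of the core set.
    """
    core, edges = [], []
    with open(path) as f:
        for line in f:
            item = json.loads(line)
            (edges if item.get("is_edge_case") else core).append(item)

    random.seed(seed)
    n_edges = min(len(edges), int(len(core) * edge_ratio))
    mixed = core + random.sample(edges, n_edges)
    random.shuffle(mixed)
    return mixed
```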

๐Ÿ“ Strategy: Layered Curriculum

  1. Phase 1: High-consistency examples only
  2. Phase 2: Add minor variation (synonyms, rewordings)
  3. Phase 3: Inject edge cases and ambiguity progressively
```mermaid
graph LR
    A1[Phase 1: Consistency] --> A2[Phase 2: Variation]
    A2 --> A3[Phase 3: Edge Robustness]
```
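One way to wire up the phases is to train on cumulative slices of the dataset, as sketched below. The `phase` labels and the commented-out `fine_tune` call are placeholders for your own annotation scheme and training stack, not part of any particular tool.

```python
import json

# "consistent", "variation", and "edge" are illustrative phase labels assumed
# to be annotated on each record; they are not a standard schema.
PHASES = ["consistent", "variation", "edge"]

def layered_curriculum(path="dataset.jsonl"):
    """Yield cumulative training sets: each phase adds its slice to the previous ones."""
    with open(path) as f:
        items = [json.loads(line) for line in f]
    seen = []
    for phase in PHASES:
        seen += [it for it in items if it.get("phase") == phase]
        yield phase, list(seen)

# for phase, examples in layered_curriculum():
#     fine_tune(examples)  # placeholder for whatever training call your stack uses
```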

📋 Edge Case Inclusion Checklist

| Criteria | Score (0–3) | Notes |
|---|---|---|
| Is it common in production? | | |
| Would failure be critical? (e.g. medical, safety, legal) | | |
| Does it teach generalization? | | |
| Is the model currently failing? | | Based on logs/tests |
| Can it be grouped & scaled? | | Part of a broader pattern? |

โžก๏ธ Include if total score โ‰ฅ 8


โœ‚๏ธ What to Hold Back

  • Ultra-rare phrasing with no production impact
  • Contrived or noisy edge cases with little structure
  • Very ambiguous prompts without clear resolution

These can be added later for robustness testing or eval splits.
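If you hold examples back, keep them around. The sketch below routes them into an eval split instead of discarding them; the `hold_out` flag and the file names are assumptions for illustration.

```python
import json

def split_for_eval(path="dataset.jsonl",
                   train_path="train.jsonl", eval_path="eval_edge.jsonl"):
    """Route held-back examples into a robustness/eval split instead of training.

    Assumes a boolean 'hold_out' field marked during review (assumed field);
    output file names are illustrative.
    """
    with open(path) as f, open(train_path, "w") as tr, open(eval_path, "w") as ev:
        for line in f:
            item = json.loads(line)
            (ev if item.get("hold_out") else tr).write(json.dumps(item) + "\n")
```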


🧮 Sample Scoring Script

```python
import json

# Weights are fixed per criterion; the boolean fields below are expected to be
# present on each record in dataset.jsonl, alongside a 'prompt' string.
def score_example(example):
    score = 0
    notes = []

    if example.get("common_in_production"):
        score += 2
        notes.append("Seen in prod")
    if example.get("failure_is_critical"):
        score += 3
        notes.append("Critical failure")
    if example.get("teaches_generalization"):
        score += 2
        notes.append("Good generalization")
    if example.get("model_fails_here"):
        score += 2
        notes.append("Current weak spot")
    if example.get("is_scalable_pattern"):
        score += 1
        notes.append("Scalable pattern")

    return score, notes

with open("dataset.jsonl") as f:
    for line in f:
        item = json.loads(line)
        score, notes = score_example(item)
        if score >= 8:
            print(f"INCLUDE ({score}): {item['prompt']} // Notes: {', '.join(notes)}")
        else:
            print(f"HOLD ({score}): {item['prompt']} // Notes: {', '.join(notes)}")
```
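For reference, here is the kind of record the script expects. The field names match what the script reads; the values are made up for illustration.

```python
# Example record (hypothetical values); with the weights above it scores 7,
# so it would be held back rather than included.
example = {
    "prompt": "Refund request phrased as an angry complaint",
    "common_in_production": True,
    "failure_is_critical": False,
    "teaches_generalization": True,
    "model_fails_here": True,
    "is_scalable_pattern": True,
}
```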

Summary

First, teach the model what "good" looks like. Then, stretch it toward robustness.

Consistency → Variation → Edge Robustness.
