@williamcaban
Created April 26, 2026 19:35
| Concept | The Question It Answers | Returns | Example |
| --- | --- | --- | --- |
| Scorer | HOW do I compute this specific value? (deterministic) | A number or pass/fail | Exact match, F1, BERTScore, ROUGE |
| Grader | HOW do I assess this against a rubric? (qualitative + explanatory) | Score + explanation | LLM-as-a-Judge with faithfulness rubric, model_graded_qa |
| Evaluation Task | WHAT am I measuring, end-to-end? | Structured result from one or more scorers/graders | Hallucination detection protocol |
| Evaluation Suite / Collection | WHICH tasks apply to my use case? | Aggregate quality signal | RAG faithfulness suite for healthcare Q&A |
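
The four layers above can be sketched in a few lines of Python. This is only an illustration of how the concepts might nest, not the API of any particular evaluation framework: every name here (`exact_match`, `token_f1`, `Grade`, `stub_faithfulness_grader`, `EvaluationTask`, `EvaluationSuite`) is hypothetical, and the grader is a crude token-overlap stand-in for what would normally be an LLM-as-a-Judge call.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


# --- Scorers: deterministic, return a number -------------------------------
def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


def token_f1(prediction: str, reference: str) -> float:
    """Whitespace-token F1 between prediction and reference."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)


# --- Grader: rubric-based, returns a score plus an explanation -------------
@dataclass
class Grade:
    score: float
    explanation: str


def stub_faithfulness_grader(prediction: str, reference: str) -> Grade:
    """Hypothetical stand-in for an LLM judge: flags tokens absent from the reference."""
    ref_tokens = set(reference.lower().split())
    unsupported = [t for t in prediction.lower().split() if t not in ref_tokens]
    if unsupported:
        return Grade(0.0, f"Unsupported tokens: {unsupported}")
    return Grade(1.0, "Every token appears in the reference.")


# --- Evaluation Task: one measurement built from scorers and graders -------
@dataclass
class EvaluationTask:
    name: str
    scorers: dict[str, Callable[[str, str], float]]
    graders: dict[str, Callable[[str, str], Grade]]

    def run(self, prediction: str, reference: str) -> dict:
        result = {n: fn(prediction, reference) for n, fn in self.scorers.items()}
        for n, fn in self.graders.items():
            grade = fn(prediction, reference)
            result[n] = grade.score
            result[f"{n}_explanation"] = grade.explanation
        return result


# --- Evaluation Suite: tasks grouped for a use case, one aggregate signal --
@dataclass
class EvaluationSuite:
    name: str
    tasks: list[EvaluationTask]

    def run(self, prediction: str, reference: str) -> float:
        scores = []
        for task in self.tasks:
            result = task.run(prediction, reference)
            # Aggregate numeric values only; explanations are strings.
            scores.extend(v for v in result.values() if isinstance(v, float))
        return mean(scores)


task = EvaluationTask(
    name="qa_check",
    scorers={"exact_match": exact_match, "f1": token_f1},
    graders={"faithfulness": stub_faithfulness_grader},
)
suite = EvaluationSuite(name="demo_suite", tasks=[task])
```

Calling `task.run(prediction, reference)` yields the structured per-metric result, while `suite.run(prediction, reference)` collapses all numeric scores into a single mean. The unweighted mean is an arbitrary choice for this sketch; a real suite would likely weight or report tasks separately.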