| Concept | The Question It Answers | Returns | Example |
|---|---|---|---|
| Scorer | HOW do I compute this specific value? (deterministic) | A number or pass/fail | Exact match, F1, BERTScore, ROUGE |
| Grader | HOW do I assess this against a rubric? (qualitative + explanatory) | Score + explanation | LLM-as-a-Judge with faithfulness rubric, model_graded_qa |
| Evaluation Task | WHAT am I measuring, end-to-end? | Structured result from one or more scorers/graders | Hallucination detection protocol |
| Evaluation Suite / Collection | WHICH tasks apply to my use case? | Aggregate quality signal | RAG faithfulness suite for healthcare Q&A |
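
The four layers in the table can be sketched in code as follows. This is a minimal illustration, not the API of any particular evaluation framework: every name here (`exact_match`, `token_f1`, `overlap_grader`, `EvaluationTask`, `EvaluationSuite`) is hypothetical, and the grader is a simple token-overlap stand-in for what would normally be an LLM-as-a-Judge call with a rubric.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Scorer: deterministic function that returns a number.
Scorer = Callable[[str, str], float]
# Grader: rubric-style assessment that returns (score, explanation).
Grader = Callable[[str, str], Tuple[float, str]]


def exact_match(prediction: str, reference: str) -> float:
    """Scorer: 1.0 if the case/whitespace-normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Scorer: F1 over whitespace tokens, using multiset overlap."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def overlap_grader(prediction: str, reference: str) -> Tuple[float, str]:
    """Grader stand-in (assumption): a real grader would prompt a judge model."""
    ref_tokens = set(reference.lower().split())
    pred_tokens = prediction.lower().split()
    unsupported = [t for t in pred_tokens if t not in ref_tokens]
    score = 1.0 - len(unsupported) / max(len(pred_tokens), 1)
    return score, f"{len(unsupported)} unsupported token(s): {unsupported}"


@dataclass
class EvaluationTask:
    """WHAT is measured: combines one or more scorers and graders."""
    name: str
    scorers: Dict[str, Scorer] = field(default_factory=dict)
    graders: Dict[str, Grader] = field(default_factory=dict)

    def run(self, prediction: str, reference: str) -> Dict[str, object]:
        result: Dict[str, object] = {"task": self.name}
        for key, scorer in self.scorers.items():
            result[key] = scorer(prediction, reference)
        for key, grader in self.graders.items():
            score, explanation = grader(prediction, reference)
            result[key] = {"score": score, "explanation": explanation}
        return result


@dataclass
class EvaluationSuite:
    """WHICH tasks apply: runs every task and returns the per-task results."""
    name: str
    tasks: List[EvaluationTask]

    def run(self, prediction: str, reference: str) -> List[Dict[str, object]]:
        return [task.run(prediction, reference) for task in self.tasks]


task = EvaluationTask(
    name="answer_quality",
    scorers={"exact_match": exact_match, "token_f1": token_f1},
    graders={"faithfulness": overlap_grader},
)
suite = EvaluationSuite(name="demo_suite", tasks=[task])
results = suite.run("the cat sat", "the cat")
```

Here the scorers return bare numbers, the grader returns a score with an explanation, the task bundles both into one structured result, and the suite groups tasks for a use case. How a suite collapses its results into a single aggregate quality signal (mean, weighted sum, pass thresholds) is left open in this sketch.
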
Created April 26, 2026 19:35