@prykon
Created September 13, 2024 20:33
LLM Evaluation Analysis

| Model Name | Score Accuracy | Score Helpfulness | Score Specificity | Score Clarity |
| --- | --- | --- | --- | --- |
| anthropic/claude-3.5-sonnet | 1.419014 | 1.440141 | 0.957746 | 0.992958 |
| google/gemma-2-9b-it | 1.197183 | 1.232394 | 0.802817 | 0.985915 |
| meta-llama/llama-3.1-8b-instruct | 1.116197 | 1.140845 | 0.757042 | 0.961268 |
| mistralai/mistral-nemo | 1.183099 | 1.214789 | 0.806338 | 0.950704 |
| openai/gpt-4o-mini | 1.281690 | 1.338028 | 0.859155 | 0.985915 |

| Model Name | Score Final |
| --- | --- |
| anthropic/claude-3.5-sonnet | 4.809859 |
| openai/gpt-4o-mini | 4.464789 |
| google/gemma-2-9b-it | 4.218310 |
| mistralai/mistral-nemo | 4.154930 |
| meta-llama/llama-3.1-8b-instruct | 3.975352 |

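
The gist does not state how Score Final is derived. The reported values are, however, consistent with a plain sum of the four per-criterion scores, to within rounding in the sixth decimal place. The sketch below checks that reading against the numbers in the two tables above. The names `component_scores`, `reported_final`, and `final_score` are illustrative and do not come from the author's code.

```python
# Per-criterion scores copied from the first table:
# (accuracy, helpfulness, specificity, clarity).
component_scores = {
    "anthropic/claude-3.5-sonnet": (1.419014, 1.440141, 0.957746, 0.992958),
    "google/gemma-2-9b-it": (1.197183, 1.232394, 0.802817, 0.985915),
    "meta-llama/llama-3.1-8b-instruct": (1.116197, 1.140845, 0.757042, 0.961268),
    "mistralai/mistral-nemo": (1.183099, 1.214789, 0.806338, 0.950704),
    "openai/gpt-4o-mini": (1.281690, 1.338028, 0.859155, 0.985915),
}

# Score Final values copied from the second table.
reported_final = {
    "anthropic/claude-3.5-sonnet": 4.809859,
    "openai/gpt-4o-mini": 4.464789,
    "google/gemma-2-9b-it": 4.218310,
    "mistralai/mistral-nemo": 4.154930,
    "meta-llama/llama-3.1-8b-instruct": 3.975352,
}


def final_score(scores):
    """Assumed aggregation: unweighted sum of the four criterion scores."""
    return sum(scores)


# Rank models by the recomputed total, highest first.
ranking = sorted(
    component_scores,
    key=lambda model: final_score(component_scores[model]),
    reverse=True,
)

for model in ranking:
    total = final_score(component_scores[model])
    print(f"{model:35s} {total:.6f} (reported {reported_final[model]:.6f})")
```

Under this assumption the recomputed ordering matches the order of the Score Final table.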

| Model Name | Failed to Score Sum | Total Questions | Margin of Error |
| --- | --- | --- | --- |
| anthropic/claude-3.5-sonnet | 0 | 200 | 0.0 |
| google/gemma-2-9b-it | 0 | 200 | 0.0 |
| meta-llama/llama-3.1-8b-instruct | 0 | 200 | 0.0 |
| mistralai/mistral-nemo | 0 | 200 | 0.0 |
| openai/gpt-4o-mini | 0 | 200 | 0.0 |
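
The gist does not define the Margin of Error column. One reading that fits the table is the share of questions that could not be scored, that is, Failed to Score Sum divided by Total Questions. This is an assumption, not the author's stated method, and `failure_rate` is a hypothetical helper. A minimal sketch under that assumption:

```python
def failure_rate(failed: int, total: int) -> float:
    """Hypothetical helper: fraction of questions that failed to score."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


# (failed to score, total questions) copied from the table above.
scoring_counts = {
    "anthropic/claude-3.5-sonnet": (0, 200),
    "google/gemma-2-9b-it": (0, 200),
    "meta-llama/llama-3.1-8b-instruct": (0, 200),
    "mistralai/mistral-nemo": (0, 200),
    "openai/gpt-4o-mini": (0, 200),
}

rates = {
    model: failure_rate(failed, total)
    for model, (failed, total) in scoring_counts.items()
}

for model, rate in rates.items():
    print(f"{model:35s} {rate:.1f}")
```

With zero failures for every model, this reading gives 0.0 in each row, as the table shows.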
