- https://www.reddit.com/r/LocalLLaMA/comments/16whnun/how_do_you_account_for_varying_llm_output_with/
- https://huggingface.co/blog/open-llm-leaderboard-mmlu
- Inspect a few eval datasets: MMLU (https://huggingface.co/datasets/lukaemon/mmlu), HellaSwag
- Look into: https://crfm.stanford.edu/helm/lite/latest/
- Look into how https://github.com/EleutherAI/lm-evaluation-harness/tree/main computes which answer the LLM chose. Apparently it gathers the set of candidate answer tokens and then compares their probabilities - https://huggingface.co/blog/open-llm-leaderboard-mmlu
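The idea described in that blog post can be sketched roughly as follows: for an MMLU-style multiple-choice question, restrict the model's next-token distribution to the answer-letter tokens and pick the most probable one. This is a minimal toy sketch, not the actual lm-evaluation-harness implementation; the token ids and logits below are made up for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pick_answer(next_token_logits, option_token_ids, option_letters="ABCD"):
    """Restrict the next-token distribution to the answer-letter tokens
    and return (letter, probability) for the most probable option."""
    probs = softmax(next_token_logits)
    option_probs = [probs[tid] for tid in option_token_ids]
    best = max(range(len(option_probs)), key=option_probs.__getitem__)
    return option_letters[best], option_probs[best]

# Toy vocabulary of 20 tokens; pretend ids 10..13 are "A".."D".
logits = [0.0] * 20
logits[10], logits[11], logits[12], logits[13] = 1.2, 3.4, 0.5, 2.0
letter, p = pick_answer(logits, [10, 11, 12, 13])
print(letter)  # "B", since token 11 has the highest logit among the options
```

A real harness would get `next_token_logits` from a forward pass over the prompt; some evals instead score the full log-likelihood of each answer string rather than a single letter token.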
Last active: November 21, 2024 10:18
How do LLM evals work?