- https://www.reddit.com/r/LocalLLaMA/comments/16whnun/how_do_you_account_for_varying_llm_output_with/
- https://huggingface.co/blog/open-llm-leaderboard-mmlu
- Inspect a few eval datasets: MMLU (https://huggingface.co/datasets/lukaemon/mmlu), HellaSwag
- Look into: https://crfm.stanford.edu/helm/lite/latest/
- Look into how https://github.com/EleutherAI/lm-evaluation-harness/tree/main computes which answer the LLM chose. Apparently it gathers the set of candidate answer tokens and then compares their probabilities - https://huggingface.co/blog/open-llm-leaderboard-mmlu
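The idea described in that blog post can be sketched roughly as follows: for an MMLU-style multiple-choice question, restrict the model's next-token distribution to the answer-letter tokens and pick the most probable one. This is a minimal toy sketch, not the actual lm-evaluation-harness implementation; the token ids and logits below are made up for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pick_answer(next_token_logits, option_token_ids, option_letters="ABCD"):
    """Restrict the next-token distribution to the answer-letter tokens
    and return (letter, probability) for the most probable option."""
    probs = softmax(next_token_logits)
    option_probs = [probs[tid] for tid in option_token_ids]
    best = max(range(len(option_probs)), key=option_probs.__getitem__)
    return option_letters[best], option_probs[best]

# Toy vocabulary of 20 tokens; pretend ids 10..13 are "A".."D".
logits = [0.0] * 20
logits[10], logits[11], logits[12], logits[13] = 1.2, 3.4, 0.5, 2.0
letter, p = pick_answer(logits, [10, 11, 12, 13])
print(letter)  # "B", since token 11 has the highest logit among the options
```

A real harness would get `next_token_logits` from a forward pass over the prompt; some evals instead score the full log-likelihood of each answer string rather than a single letter token.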
Last active: November 21, 2024 10:18
How do LLM evals work?