@pvzhelnov
Created September 18, 2024 18:41
| LMSYS Rating (Arena Math) 2024-09-17 | Transcribed data (ChatGPT-4o) | Point estimate | Lower bound CI | Upper bound CI |
| --- | --- | --- | --- | --- |
| o1-preview | Model: o1-preview Rating = 1,362.384 CI: +22.36819 / -20.70303 | 1362.384 | 1341.68097 | 1384.75219 |
| claude-3-5-sonnet-20240620 | Model: claude-3-5-sonnet Rating = 1,273.43 CI: +7.333773 / -7.329918 | 1273.43 | 1266.100082 | 1280.763773 |
| gpt-4o-mini-2024-07-18 | Model: gpt-4o-mini Rating = 1,224.153 CI: +7.111128 / -8.208487 | 1224.153 | 1215.944513 | 1231.264128 |

| Abs diff | Point estimate | Lower bound CI | Upper bound CI |
| --- | --- | --- | --- |
| o1-sonnet | 88.954 | 60.917197 | 118.652108 |
| sonnet-4o-mini | 49.277 | 34.835954 | 64.81926 |
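
The gist does not say how the bound columns were derived, but the figures are consistent with simple interval arithmetic: each model's bounds are the rating minus or plus its CI offsets, and each difference's bounds pair the higher-rated model's lower bound with the other model's upper bound (and vice versa). The following Python sketch reproduces the numbers under that assumption; the names `ratings`, `ci_bounds`, and `diff_interval` are illustrative and do not come from the gist.

```python
# Rating, CI "+" offset, CI "-" offset, copied from the transcribed data.
ratings = {
    "o1-preview": (1362.384, 22.36819, 20.70303),
    "claude-3-5-sonnet-20240620": (1273.43, 7.333773, 7.329918),
    "gpt-4o-mini-2024-07-18": (1224.153, 7.111128, 8.208487),
}


def ci_bounds(point, plus, minus):
    """Return (lower, upper) bounds from a point estimate and its CI offsets."""
    return point - minus, point + plus


def diff_interval(a, b):
    """Return (point, lower, upper) for rating a minus rating b.

    Assumption: the bounds are combined by plain interval subtraction,
    i.e. lower = lower_a - upper_b and upper = upper_a - lower_b.
    """
    lower_a, upper_a = ci_bounds(*a)
    lower_b, upper_b = ci_bounds(*b)
    return a[0] - b[0], lower_a - upper_b, upper_a - lower_b


o1 = ratings["o1-preview"]
sonnet = ratings["claude-3-5-sonnet-20240620"]
mini = ratings["gpt-4o-mini-2024-07-18"]

for label, pair in [("o1-sonnet", (o1, sonnet)), ("sonnet-4o-mini", (sonnet, mini))]:
    point, lower, upper = diff_interval(*pair)
    print(f"{label}: {point:.3f} [{lower:.6f}, {upper:.6f}]")
```

Subtracting opposite bounds this way is conservative: it yields an interval at least as wide as one that combines the two uncertainties in quadrature under an independence assumption. Since both lower bounds stay above zero here, the ordering of the three models holds even under this wider interval.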