GPT4All Performance Benchmarks

| Model | BoolQ | PIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | Avg |
|---|---|---|---|---|---|---|---|---|
| GPT4All-J 6B v1.0 | 73.4 | 74.8 | 63.4 | 64.7 | 54.9 | 36 | 40.2 | 58.2 |
| GPT4All-J v1.1-breezy | 74 | 75.1 | 63.2 | 63.6 | 55.4 | 34.9 | 38.4 | 57.8 |
| GPT4All-J v1.2-jazzy | 74.8 | 74.9 | 63.6 | 63.8 | 56.6 | 35.3 | 41 | 58.6 |
| GPT4All-J v1.3-groovy | 73.6 | 74.3 | 63.8 | 63.5 | 57.7 | 35 | 38.8 | 58.1 |
| GPT4All-J Lora 6B | 68.6 | 75.8 | 66.2 | 63.5 | 56.4 | 35.7 | 40.2 | 58.1 |
| GPT4All LLaMa Lora 7B | 73.1 | 77.6 | 72.1 | 67.8 | 51.1 | 40.4 | 40.2 | 60.3 |
| GPT4All 13B snoozy | 83.3 | 79.2 | 75 | 71.3 | 60.9 | 44.2 | 43.4 | 65.3 |
| GPT4All Falcon | 77.6 | 79.8 | 74.9 | 70.1 | 67.9 | 43.4 | 42.6 | 65.2 |
| Nous-Hermes | 79.5 | 78.9 | 80 | 71.9 | 74.2 | 49.2 | 46.4 | 68.6 |
| Dolly 6B | 68.8 | 77.3 | 67.6 | 63.9 | 62.9 | 38.7 | 41.2 | 60.1 |
| Dolly 12B | 56.7 | 75.4 | 71 | 62.2 | 64.6 | 38.5 | 40.4 | 58.4 |
| Alpaca 7B | 73.9 | 77.2 | 73.9 | 66.1 | 59.8 | 43.3 | 43.4 | 62.5 |
| Alpaca Lora 7B | 74.3 | 79.3 | 74 | 68.8 | 56.6 | 43.9 | 42.6 | 62.8 |
| GPT-J 6.7B | 65.4 | 76.2 | 66.2 | 64.1 | 62.2 | 36.6 | 38.2 | 58.4 |
| LLama 7B | 73.1 | 77.4 | 73 | 66.9 | 52.5 | 41.4 | 42.4 | 61 |
| LLama 13B | 68.5 | 79.1 | 76.2 | 70.1 | 60 | 44.6 | 42.2 | 63 |
| Pythia 6.7B | 63.5 | 76.3 | 64 | 61.1 | 61.3 | 35.2 | 37.2 | 56.9 |
| Pythia 12B | 67.7 | 76.6 | 67.3 | 63.8 | 63.9 | 34.8 | 38 | 58.9 |
| Fastchat T5 | 81.5 | 64.6 | 46.3 | 61.8 | 49.3 | 33.3 | 39.4 | 53.7 |
| Fastchat Vicuña 7B | 76.6 | 77.2 | 70.7 | 67.3 | 53.5 | 41.2 | 40.8 | 61 |
| Fastchat Vicuña 13B | 81.5 | 76.8 | 73.3 | 66.7 | 57.4 | 42.7 | 43.6 | 63.1 |
| StableVicuña RLHF | 82.3 | 78.6 | 74.1 | 70.9 | 61 | 43.5 | 44.4 | 65 |
| StableLM Tuned | 62.5 | 71.2 | 53.6 | 54.8 | 52.4 | 31.1 | 33.4 | 51.3 |
| StableLM Base | 60.1 | 67.4 | 41.2 | 50.1 | 44.9 | 27 | 32 | 46.1 |
| Koala 13B | 76.5 | 77.9 | 72.6 | 68.8 | 54.3 | 41 | 42.8 | 62 |
| Open Assistant Pythia 12B | 67.9 | 78 | 68.1 | 65 | 64.2 | 40.4 | 43.2 | 61 |
| Mosaic MPT7B | 74.8 | 79.3 | 76.3 | 68.6 | 70 | 42.2 | 42.6 | 64.8 |
| Mosaic mpt-instruct | 74.3 | 80.4 | 77.2 | 67.8 | 72.2 | 44.6 | 43 | 65.6 |
| Mosaic mpt-chat | 77.1 | 78.2 | 74.5 | 67.5 | 69.4 | 43.3 | 44.2 | 64.9 |
| Wizard 7B | 78.4 | 77.2 | 69.9 | 66.5 | 56.8 | 40.5 | 42.6 | 61.7 |
| Wizard 7B Uncensored | 77.7 | 74.2 | 68 | 65.2 | 53.5 | 38.7 | 41.6 | 59.8 |
| Wizard 13B Uncensored | 78.4 | 75.5 | 72.1 | 69.5 | 57.5 | 40.4 | 44 | 62.5 |
| GPT4-x-Vicuna-13b | 81.3 | 75 | 75.2 | 65 | 58.7 | 43.9 | 43.6 | 63.2 |
| Falcon 7b | 73.6 | 80.7 | 76.3 | 67.3 | 71 | 43.3 | 44.4 | 65.2 |
| Falcon 7b instruct | 70.9 | 78.6 | 69.8 | 66.7 | 67.9 | 42.7 | 41.2 | 62.5 |
| text-davinci-003 | 88.1 | 83.8 | 83.4 | 75.8 | 83.9 | 63.9 | 51 | 75.7 |
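
The Avg column appears to be the simple arithmetic mean of the seven task scores, rounded to one decimal place. Below is a minimal sketch (not part of the original gist) that checks this for a few rows; the hard-coded values are copied from the table above.

```python
# Sketch: verify that the reported Avg matches the mean of the seven task
# scores (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA).
# Assumption: Avg is a plain unweighted mean, rounded to one decimal.

rows = {
    "GPT4All-J 6B v1.0":  [73.4, 74.8, 63.4, 64.7, 54.9, 36.0, 40.2, 58.2],
    "GPT4All 13B snoozy": [83.3, 79.2, 75.0, 71.3, 60.9, 44.2, 43.4, 65.3],
    "text-davinci-003":   [88.1, 83.8, 83.4, 75.8, 83.9, 63.9, 51.0, 75.7],
}

for model, scores in rows.items():
    *tasks, reported_avg = scores
    mean = round(sum(tasks) / len(tasks), 1)
    print(f"{model}: computed mean = {mean}, reported Avg = {reported_avg}")
```

For these rows the computed means (58.2, 65.3, 75.7) match the reported Avg values exactly.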