Last active
September 12, 2024 21:13
CheeseBench
- tags: [cheese]
  question: Which cheese is nicknamed "King of Cheeses" but paradoxically has a rind resembling concrete?
  criteria:
    correctness: Answer mentions Parmigiano-Reggiano
    bonus: Answer explains the paradox
- tags: [cheese]
  question: What's the connection between a Norwegian brown cheese and caramel?
  criteria:
    correctness: Answer mentions caramelized milk sugars in any form
    bonus: Answer mentions Brunost or its alternative names
- tags: [cheese]
  question: Which cheese's name translates to "re-cooked" and why?
  criteria:
    correctness: Answer mentions ricotta in any form
    bonus: Answer explains how ricotta is made
- tags: [cheese]
  question: What unlikely ingredient gives Sage Derby its distinctive green veins?
  criteria:
    correctness: Answer mentions sage leaves (powdered or otherwise)
    bonus: Answer explains how Sage Derby is made
#!/bin/bash
# Replace the placeholders below before running.
OPENROUTER_KEY="<your key>"
TASKS=/path/to/cheese.yaml
NAME=cheese
# Common: judge model, judge API, task file, parallelism
h bench judge meta-llama/llama-3.1-70b-instruct
h bench judge_api https://openrouter.ai/api
h bench judge_key "$OPENROUTER_KEY"
h bench tasks "$TASKS"
h config set bench.parallel 4
# Ollama: models under test
h bench model llama3.1:8b-instruct-q2_K
h bench api http://harbor.ollama:11434
h bench variants --temperature 0 --temperature 0.5 --temperature 1.0 --max_tokens 1024 \
  --model llama3.1:8b --model phi3:latest --model mistral:7b --model gemma2:latest \
  --model mixtral:latest --model mistral-nemo:12b-instruct-2407-q8_0 \
  --model dolphin-mixtral:8x7b --model codestral
h bench run --name "ollama-llama3.1-8b-q2_K-${NAME}"
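The `h bench variants` line lists three temperatures, eight models, and one `max_tokens` value. Assuming the benchmark expands repeated flags into every combination (an assumption about its behavior, not something stated here), the size of the grid can be previewed with a plain loop before starting a long run. This is only a sketch: the variable names `temperatures`, `models`, and `count` are illustrative, and the loop does not call `h` at all.

```shell
# Hypothetical preview of the variant grid; values copied from the
# `h bench variants` line, nothing here talks to the benchmark itself.
temperatures="0 0.5 1.0"
models="llama3.1:8b phi3:latest mistral:7b gemma2:latest mixtral:latest mistral-nemo:12b-instruct-2407-q8_0 dolphin-mixtral:8x7b codestral"
max_tokens=1024

count=0
for model in $models; do
  for temperature in $temperatures; do
    # One line per assumed (model, temperature) combination.
    echo "model=$model temperature=$temperature max_tokens=$max_tokens"
    count=$((count + 1))
  done
done

echo "total variants: $count"
```

If the cross-product assumption holds, this prints 24 combinations, each of which would be evaluated against the four cheese tasks.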