You are a helpful assistant. You're smart, clever, direct and pragmatic. You notice details that few people would. Be careful, as the questions might attempt to misguide and trick you. When answering the User, you outline your thought process using these tags:
<thought> The root element that encapsulates an entire thought process.
<observation> Initial information or context that prompts the thinking process.
<question> The main query or problem to be addressed.
<hypothesis> An initial proposed explanation or solution.
<reasoning> Container for the logical steps of the thought process.
#!/bin/bash
# TASK=padbench
# TASK=bbh_256_slim
TASK=mmlu_256_slim
# Common
# h bench tasks ./scripts/bench/padbench.yaml
h bench tasks ./scripts/bench/$TASK.yaml
h config set bench.parallel 4
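The script above switches benchmarks by commenting `TASK=` lines in and out. Below is a minimal sketch of an alternative. It assumes the task files live under `./scripts/bench/` as shown. It takes the task name as an argument, falls back to `mmlu_256_slim`, and stops early if the YAML file is missing, before anything is handed to `h bench tasks`. The function name `resolve_task_file` and its optional second argument are hypothetical, not part of Harbor.

```shell
# Hypothetical helper (not part of Harbor): map a task name to its YAML file
# under a bench directory (default ./scripts/bench, as in the script above)
# and fail loudly if that file does not exist.
resolve_task_file() {
  local task="${1:-mmlu_256_slim}"   # default mirrors the uncommented TASK line
  local dir="${2:-./scripts/bench}"
  local file="$dir/$task.yaml"
  if [ ! -f "$file" ]; then
    echo "task file not found: $file" >&2
    return 1
  fi
  printf '%s\n' "$file"
}
```

In the run script this could replace the commented alternatives: `TASK_FILE=$(resolve_task_file bbh_256_slim) || exit 1`, followed by `h bench tasks "$TASK_FILE"`.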
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Harbor Bench</title>
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
    <style>
        body {
- tags:
    - bbh
  question: >-
    Complete the rest of the sequence, making sure that the parentheses are
    closed properly. Input: { < { { [ ] } } { < [ { { < > } } [ ( ) ( ) ] [ [ [
    [ ( { < ( < ( [ ] ) > ) > } ) ] ] ] ] ] ( ) ( [ ] { } ) > } > [ { ( ( ) ) }
    ]
  criteria:
    correctness: 'The answer is }'
- tags:
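The `bbh` task above asks the model to close an unbalanced bracket sequence. A stack makes the expected answer easy to check: push each opener, pop on each matching closer, and whatever remains on the stack, read from the top down, is what must be appended. The helper below is a hypothetical POSIX shell sketch for checking such criteria by hand. It is not part of Harbor, and it assumes the tokens are separated by single spaces as in the task text.

```shell
# Hypothetical helper (not part of Harbor): read a space-separated sequence
# of ( ) [ ] { } < > tokens and print the closers that balance it, innermost
# first. Exits non-zero on a mismatched closer or an unknown token.
complete_dyck() (
  set -f        # keep '[' and ']' from being expanded as glob patterns
  stack=''      # unclosed openers, most recent one last
  for tok in $1; do
    case $tok in
      '('|'['|'{'|'<')
        stack="$stack$tok" ;;           # push
      ')'|']'|'}'|'>')
        case $tok in
          ')') want='(' ;;
          ']') want='[' ;;
          '}') want='{' ;;
          '>') want='<' ;;
        esac
        top=${stack#"${stack%?}"}       # last character of the stack
        [ "$top" = "$want" ] || exit 1
        stack=${stack%?} ;;             # pop
      *)
        exit 1 ;;
    esac
  done
  out=''
  while [ -n "$stack" ]; do             # unwind from the top of the stack
    case ${stack#"${stack%?}"} in
      '(') out="$out )" ;;
      '[') out="$out ]" ;;
      '{') out="$out }" ;;
      '<') out="$out >" ;;
    esac
    stack=${stack%?}
  done
  printf '%s\n' "${out# }"
)
```

Passing the task's full input sequence (everything after `Input:`) prints `}`, which agrees with the `correctness` criterion: only the very first `{` is still open at the end.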
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Task Report</title>
    <style>
        body {
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
#!/bin/bash
OPENROUTER_KEY="<your key here>"
TASKS=/path/to/misguided.yaml
NAME=misguided
# Common
h bench judge meta-llama/llama-3.1-70b-instruct
h bench judge_api https://openrouter.ai/api
h bench judge_key "$OPENROUTER_KEY"
h bench tasks "$TASKS"
- tags: [cheese]
  question: Which cheese is nicknamed "King of Cheeses" but paradoxically has a rind resembling concrete?
  criteria:
    correctness: Answer mentions Parmigiano-Reggiano
    bonus: Answer explains the paradox
- tags: [cheese]
  question: What's the connection between a Norwegian brown cheese and caramel?
  criteria:
    correctness: Answer mentions caramelized milk sugars in any form
#!/bin/bash
# Note that you're not expected to run this
# file as is in one go
OPENROUTER_KEY="<your_openrouter_key>"
TASKS="<path_to_tasks_file>"
NAME=engbench
- tags:
    - ori_mmlu-global_facts
  question: >-
    <instructions>Carefully read the question and the options provided. Choose
    the option that best answers the question.</instructions>
    <question>As of 2017, the share of deaths in Greenland by suicide is
    about</question>
    <options><option>A: 3.60%</option>
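The MMLU-derived tasks pack the instructions, question, and options into one folded (`>-`) YAML string. YAML folding turns the single line breaks into spaces, so the model receives a single line. Below is a minimal sketch of rendering a new question in the same tagged layout. It assumes one `<option>` per choice and a closing `</options>` tag, which the truncated preview does not show. The helper name `mmlu_prompt` is hypothetical, and the exact spacing between options is an assumption.

```shell
# Hypothetical helper (not part of Harbor): build a prompt in the tagged
# layout of the MMLU-derived tasks. Options are separate arguments, already
# prefixed with their letter, e.g. 'A: 3.60%'. Parts are joined with single
# spaces, the way YAML folding (>-) joins the source lines.
mmlu_prompt() {
  local question="$1" opt out
  shift
  out='<instructions>Carefully read the question and the options provided. Choose the option that best answers the question.</instructions>'
  out="$out <question>$question</question> <options>"
  for opt in "$@"; do
    out="$out<option>$opt</option> "
  done
  out="${out% }</options>"   # drop the space after the last option, close the list
  printf '%s\n' "$out"
}
```

Because each option is passed through `%s`, literal percent signs such as the one in `3.60%` are printed as-is rather than read as format directives.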