Ivan Charapanau av

@av
av / padbench.sh
Created October 6, 2024 16:30
padbench
#!/bin/bash
# TASK=padbench
# TASK=bbh_256_slim
TASK=mmlu_256_slim
# Common
# h bench tasks ./scripts/bench/padbench.yaml
h bench tasks "./scripts/bench/$TASK.yaml"
h config set bench.parallel 4
@av
av / summary.html
Created September 25, 2024 19:51
Small Llama 3.2 Benchmarks
This file has been truncated, but you can view the full file.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Harbor Bench</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<style>
body {
@av
av / bbh_256.yml
Created September 22, 2024 10:56
Example Harbor Bench tasks file - 256 tasks from Big Bench Hard
- tags:
    - bbh
  question: >-
    Complete the rest of the sequence, making sure that the parentheses are
    closed properly. Input: { < { { [ ] } } { < [ { { < > } } [ ( ) ( ) ] [ [ [
    [ ( { < ( < ( [ ] ) > ) > } ) ] ] ] ] ] ( ) ( [ ] { } ) > } > [ { ( ( ) ) }
    ]
  criteria:
    correctness: 'The answer is }'
- tags:
@av
av / tasks.html
Created September 15, 2024 15:51
misguidedbench - tasks
This file has been truncated, but you can view the full file.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Task Report</title>
<style>
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
@av
av / misguidedbench.sh
Last active September 15, 2024 15:41
misguidedbench
#!/bin/bash
OPENROUTER_KEY="<your key here>"
TASKS=/path/to/misguided.yaml
NAME=misguided
# Common
h bench judge meta-llama/llama-3.1-70b-instruct
h bench judge_api https://openrouter.ai/api
h bench judge_key "$OPENROUTER_KEY"
h bench tasks "$TASKS"
@av
av / cheese.yaml
Last active September 12, 2024 21:13
CheeseBench
- tags: [cheese]
  question: Which cheese is nicknamed "King of Cheeses" but paradoxically has a rind resembling concrete?
  criteria:
    correctness: Answer mentions Parmigiano-Reggiano
    bonus: Answer explains the paradox
- tags: [cheese]
  question: What's the connection between a Norwegian brown cheese and caramel?
  criteria:
    correctness: Answer mentions caramelized milk sugars in any form
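The task files previewed on this page share one shape: a list of entries, each with `tags`, a `question`, and a `criteria` mapping. Below is a minimal sketch of a structural check for such entries, assuming that shape holds for the full files. `validate_task` and `REQUIRED_KEYS` are hypothetical names, not part of Harbor, and the entries are written as Python dicts rather than parsed from YAML so the sketch has no dependencies.

```python
# Hypothetical helper: not part of Harbor. Checks the entry shape seen in the
# previews above (tags, question, criteria).
REQUIRED_KEYS = ("tags", "question", "criteria")


def validate_task(task):
    """Return a list of problems found in one task entry (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in task:
            problems.append(f"missing key: {key}")
    if "tags" in task and not isinstance(task["tags"], list):
        problems.append("tags must be a list")
    if "criteria" in task:
        criteria = task["criteria"]
        if not isinstance(criteria, dict) or not criteria:
            problems.append("criteria must be a non-empty mapping")
    return problems


# Entries mirroring the cheese.yaml preview; the second one is deliberately
# missing its criteria to show what a failed check looks like.
tasks = [
    {
        "tags": ["cheese"],
        "question": "What's the connection between a Norwegian brown cheese and caramel?",
        "criteria": {
            "correctness": "Answer mentions caramelized milk sugars in any form",
        },
    },
    {
        "tags": ["cheese"],
        "question": "A question with no criteria attached",
    },
]

for index, task in enumerate(tasks):
    print(index, validate_task(task))
```

Running a check like this before a benchmark run would catch entries that a judge model has nothing to grade against.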
@av
av / engbench.sh
Created September 12, 2024 16:22
Harbor bench - engines recipe
#!/bin/bash
# Note that you're not expected to run this
# file as is in one go
OPENROUTER_KEY="<your_openrouter_key>"
TASKS="<path_to_tasks_file>"
NAME=engbench
@av
av / mmlu_256.yaml
Created September 12, 2024 16:17
Harbor MMLU 256
- tags:
    - ori_mmlu-global_facts
  question: >-
    <instructions>Carefully read the question and the options provided. Choose
    the option that best answers the question.</instructions>
    <question>As of 2017, the share of deaths in Greenland by suicide is
    about</question>
    <options><option>A: 3.60%</option>
@av
av / rml.md
Created September 6, 2024 21:43
RML - Reasoning Markup Language

Prompt

You are a helpful assistant. You're smart, clever, direct and pragmatic. You notice details that few people would. Be careful, as the questions might attempt to misguide and trick you. When answering the User, you outline your thought process using these tags:

<thought> The root element that encapsulates an entire thought process.
<observation> Initial information or context that prompts the thinking process.
<question> The main query or problem to be addressed.
<hypothesis> An initial proposed explanation or solution.
<reasoning> Container for the logical steps of the thought process.
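A response written with these tags can be split back into its parts for inspection. The sketch below assumes each tag is closed with a matching `</tag>`, which the preview above does not show; `extract_tag` is a hypothetical helper and the sample response is invented for illustration. It uses a simple non-greedy regular expression, so it does not handle a tag nested inside another tag of the same name.

```python
import re


def extract_tag(text, tag):
    """Return the stripped contents of every <tag>...</tag> pair in text."""
    # Assumption: tags are closed as </tag>; the source lists only tag names.
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    return [match.strip() for match in pattern.findall(text)]


# Invented sample response, loosely based on the jug problem.
sample = (
    "<thought>"
    "<observation>Only a 1-liter and a 2-liter jug are available.</observation>"
    "<question>How can exactly 3 liters be measured?</question>"
    "<hypothesis>Fill both jugs; 1 + 2 = 3.</hypothesis>"
    "</thought>"
)

print(extract_tag(sample, "observation"))
print(extract_tag(sample, "hypothesis"))
```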
@av
av / chat.md
Created September 6, 2024 20:57
Misguided Reflection

Problem 1 - Jugs

I have a 1- and a 2-liter jug. I want to measure exactly 3 liters.

Reflection 70B (Free)

<thinking>
Let's approach this problem step by step: