Baber Abbasi baberabb

dataset_path: Idavidrein/gpqa
tag: gpqa
output_type: generate_until
process_docs: !function utils.process_docs
test_split: train
fewshot_split: null
description: "Here are some example questions from experts. An explanation is given before the final answer. Answer the final question yourself, giving your reasoning beforehand.\n"
doc_to_text: "Question: {{Question|trim}}\nChoices:\n(A) {{choice1}}\n(B) {{choice2}}\n(C) {{choice3}}\n(D) {{choice4}}\nGive step by step reasoning before you answer, and when you're ready to answer, please use the format \"The correct answer is (insert answer here)\":"
doc_to_target: answer
fewshot_delimiter: ""
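
The config above reads `choice1` to `choice4` and `answer`, which `utils.process_docs` is expected to produce, and the prompt asks for a reply ending in "The correct answer is (...)". The following is a minimal sketch of both steps over plain dicts, not the actual `utils.py`. The column names `Correct Answer` and `Incorrect Answer 1` to `Incorrect Answer 3`, the `(A)`-style target, and the helper names `process_doc` and `extract_answer` are assumptions made for illustration.

```python
import random
import re
from typing import Optional

LETTERS = "ABCD"

# Assumed reply format, taken from the instruction at the end of doc_to_text.
ANSWER_PATTERN = re.compile(r"The correct answer is \(?([A-D])\)?")


def process_doc(doc: dict, rng: random.Random) -> dict:
    """Hypothetical per-row step: shuffle the options and record the gold letter."""
    choices = [
        doc["Incorrect Answer 1"],
        doc["Incorrect Answer 2"],
        doc["Incorrect Answer 3"],
        doc["Correct Answer"],
    ]
    rng.shuffle(choices)
    gold_index = choices.index(doc["Correct Answer"])
    return {
        "Question": doc["Question"],
        "choice1": choices[0],
        "choice2": choices[1],
        "choice3": choices[2],
        "choice4": choices[3],
        "answer": f"({LETTERS[gold_index]})",
    }


def extract_answer(generation: str) -> Optional[str]:
    """Return the last '(X)' answer stated in the reply, or None if there is none."""
    matches = ANSWER_PATTERN.findall(generation)
    return f"({matches[-1]})" if matches else None


# Made-up row, used only to exercise the two helpers.
row = {
    "Question": "What is 2 + 2?",
    "Correct Answer": "4",
    "Incorrect Answer 1": "3",
    "Incorrect Answer 2": "5",
    "Incorrect Answer 3": "22",
}
doc = process_doc(row, random.Random(0))
reply = f"Adding the two numbers gives 4. The correct answer is {doc['answer']}"
assert extract_answer(reply) == doc["answer"]
```

Taking the last match rather than the first is a design choice for step-by-step replies, where the model may restate the phrase before committing to a final answer.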

Mathvista 100 samples

llama: 51.5 hf-multimodal (pretrained=meta-llama/Llama-3.2-11B-Vision-Instruct), gen_kwargs: (None), limit: 50.0, num_fewshot: None, batch_size: 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mathvista | 1 | extract_answer | 0 | acc | 0.46 | ± 0.0712 |
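
The reported standard error is consistent with the standard error of the mean over 0/1 scores, computed with the sample (n - 1) variance at the 50-sample limit. The sketch below reproduces the number under that assumption. The helper name `binary_accuracy_stderr` is hypothetical, and 23 correct answers is inferred from acc 0.46 at limit 50.

```python
import math


def binary_accuracy_stderr(num_correct: int, num_samples: int) -> float:
    """Standard error of the mean for 0/1 scores, using the sample (n - 1) variance."""
    if num_samples < 2:
        raise ValueError("need at least two samples")
    acc = num_correct / num_samples
    # For 0/1 scores the sum of squared deviations from the mean is n * acc * (1 - acc).
    sample_variance = num_samples * acc * (1.0 - acc) / (num_samples - 1)
    return math.sqrt(sample_variance / num_samples)


# acc 0.46 at limit 50 corresponds to 23 correct answers out of 50.
print(round(binary_accuracy_stderr(23, 50), 4))  # 0.0712
```

With only 50 samples, the interval of roughly ± 0.07 is wide enough that small differences between models on this subset should be read with caution.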

hf-multimodal (pretrained=llava-hf/llava-onevision-qwen2-7b-ov-chat-hf), gen_kwargs: (None), limit: 50.0, num_fewshot: None, batch_size: 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
baberabb / spo_loss.py
Created June 10, 2024 15:38 — forked from crowsonkb/spo_loss.py
Scalar Preference Optimization
"""Scalar Preference Optimization."""

import torch
from torch.nn import functional as F


def logp_completion(logits, tokens, mask):
    """Compute the log probabilities of completions given their prompts.

    Args:
import asyncio
import os
import random
from collections import deque
import ssl
from urllib.parse import urlparse
import aiohttp
import polars as pl
import aiofiles
baberabb / grouped_results.md
Last active February 12, 2024 23:23
mistral MMLU

hf (pretrained=mistralai/Mistral-7B-v0.1), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | N/A | none | 0 | acc | 0.6273 | ± 0.0294 |
| - humanities | N/A | none | None | acc | 0.6520 | ± 0.0233 |
| - formal_logic | 0 | none | None | acc | 0.3571 | ± 0.0429 |
| - high_school_european_history | 0 | none | None | acc | 0.7455 | ± 0.0340 |
| - high_school_us_history | 0 | none | None | acc | 0.7549 | ± 0.0302 |
| - high_school_world_history | 0 | none | None | acc | 0.7764 | ± 0.0271 |
| - international_law | 0 | none | None | acc | 0.7521 | ± 0.0394 |