
@natolambert
natolambert / skyworks-rewardbench-contamination.md
Last active February 18, 2025 03:56
MagPie RewardBench Contamination (found through SkyWorks Preferences)

Recently, I learned that some of the top reward models on RewardBench were trained on a preference dataset that is unintentionally contaminated with the benchmark. The dataset, Skyworks Preferences 80k, picked up the contamination by mixing in a Magpie dataset. Magpie is a new method for having language models generate instructions by prompting them with an empty chat template. The contaminated source within the Skyworks dataset is Argilla/magpie-ultra-v0.1, generated with Llama 3.1 405B Instruct. I would never have expected a Magpie dataset to be contaminated.
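For readers unfamiliar with the method, here is a minimal sketch of the Magpie trick, assuming Llama 3's chat markup; the function name and token strings are illustrative, not taken from the Magpie codebase:

```python
# Minimal sketch of the Magpie idea, assuming Llama 3-style chat markup.
# Instead of supplying a user message, the prompt stops right after the
# user-turn header, so the model's completion *is* a fresh instruction.

def magpie_pre_query_prompt(system_prompt: str = "") -> str:
    """Build a prompt that ends at an empty user turn (Llama 3 style)."""
    parts = ["<|begin_of_text|>"]
    if system_prompt:
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{system_prompt}<|eot_id|>"
        )
    # Open the user turn but leave its content empty: the model fills it in.
    parts.append("<|start_header_id|>user<|end_header_id|>\n\n")
    return "".join(parts)

prompt = magpie_pre_query_prompt()
# Passing `prompt` to the model's generate call (not shown here) would then
# sample an instruction as the continuation.
```

Because the generated instructions come straight out of the model's own distribution, any benchmark prompts the model memorized during training can resurface in the synthetic dataset, which is the failure mode described above.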

What seems likely is that Meta trained on some of these prompts, but the exact provenance of each prompt needs more examination. For example, we learned that some of the prompts used in our LLMBar subsets were taken from popular training sets like Al
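A hedged sketch of how this kind of overlap can be checked: normalize prompts and count exact matches between a training set and a benchmark. The data below is a toy placeholder, not the actual Skyworks or RewardBench schema, and real decontamination would also want fuzzy or n-gram matching.

```python
# Toy exact-match contamination check: lowercase and collapse whitespace,
# then look up each training prompt in the set of benchmark prompts.

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def find_overlap(train_prompts, benchmark_prompts):
    bench = {normalize(p) for p in benchmark_prompts}
    return [p for p in train_prompts if normalize(p) in bench]

train = ["How tall is K2?", "Explain RLHF briefly."]
bench = ["how tall is  K2? "]
print(find_overlap(train, bench))  # → ['How tall is K2?']
```

Exact matching after normalization only catches verbatim reuse; paraphrased benchmark prompts would slip through, which is why provenance tracing matters beyond string matching.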

@awni
awni / l3min.py
Last active January 25, 2025 21:30
A minimal, fast implementation of Llama 3.1 in MLX.
"""
A minimal, fast example generating text with Llama 3.1 in MLX.
To run, install the requirements:
pip install -U mlx transformers fire
Then generate text with:
python l3min.py "How tall is K2?"
"""
# Author: Daniel de Kok
# Usage: python shard.py --safetensors-path /fsx/danieldk/4bit-gptq-instruct/gptq_model-4bit-128g.safetensors --framework torch --output-path /fsx/danieldk/4bit-gptq-instruct/gptq-sharded
import argparse
import safetensors
import huggingface_hub
def get_args():
@Moisan
Moisan / llama_cpp.rb
Created March 28, 2024 19:07
llama.cpp formula
# Documentation: https://docs.brew.sh/Formula-Cookbook
# https://rubydoc.brew.sh/Formula
# PLEASE REMOVE ALL GENERATED COMMENTS BEFORE SUBMITTING YOUR PULL REQUEST!
class LlamaCpp < Formula
desc "LLM inference in C/C++"
homepage "https://github.com/ggerganov/llama.cpp"
# pull from git tag to get submodules
url "https://github.com/ggerganov/llama.cpp.git",
tag: "b2568",
revision: "be55134a535f7218c53f39211755b1c7550851b2"
@thomwolf
thomwolf / fast_speech_text_speech.py
Last active January 14, 2025 12:13
speech to text to speech
""" To use: install LLM studio (or Ollama), clone OpenVoice, run this script in the OpenVoice directory
git clone https://github.com/myshell-ai/OpenVoice
cd OpenVoice
git clone https://huggingface.co/myshell-ai/OpenVoice
cp -r OpenVoice/* .
pip install whisper pynput pyaudio
"""
from openai import OpenAI
import time