Recently, I learned that some of the top reward models on RewardBench were trained on a preference dataset that is unintentionally contaminated with the benchmark. The dataset, Skywork Preferences 80K, picked up the contamination by mixing in a Magpie dataset. Magpie is a new method for having language models generate instructions by prompting them with an empty chat template. The contaminated source for the Skywork dataset is Argilla/magpie-ultra-v0.1, generated with Llama 3.1 405B Instruct. I would never have expected a Magpie dataset to be contaminated.
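To make the Magpie idea concrete, here is a minimal sketch of the "empty chat template" trick, assuming a Llama-3-style template. The model name and decoding settings are illustrative placeholders, not the configuration used to build magpie-ultra.

```python
# Sketch of the Magpie-style prompt generation idea, assuming a Llama-3-style
# chat template. Model name and sampling parameters are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed; any chat model with a known template works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# The "empty chat template": only the special tokens that precede a user turn,
# so the model's most likely continuation is a user-style instruction.
pre_query_prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"

inputs = tokenizer(pre_query_prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=1.0, top_p=1.0)

# Decode only the newly generated tokens: a synthetic instruction sampled
# from the model's own distribution over user queries.
instruction = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(instruction)
```

Because the instructions come straight out of the model's learned distribution, any prompts the model memorized during training can resurface verbatim, which is how a "synthetic" dataset can end up overlapping with an existing benchmark.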
What seems likely is that Meta trained on some of these prompts, but the exact provenance of each prompt needs more investigation. For example, we learned that some of the prompts we used in our LLMBar subsets were themselves taken from popular training sets like Alpaca.
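Checking for this kind of overlap is straightforward in principle. Below is a hedged sketch of an exact-match comparison between a preference dataset and RewardBench prompts; the dataset names, split names, and column names are assumptions and may need adjusting to the actual schemas on the Hugging Face Hub.

```python
# Rough sketch of a contamination check: exact-match prompt overlap between a
# preference dataset and RewardBench. Dataset/split/column names are assumptions.
from datasets import load_dataset

def normalized_prompts(rows, key):
    """Collect prompt strings from a dataset, collapsing whitespace for comparison."""
    return {" ".join(str(row[key]).split()) for row in rows}

rewardbench = load_dataset("allenai/reward-bench", split="filtered")          # assumed split name
preferences = load_dataset("Skywork/Skywork-Reward-Preference-80K-v0.1",
                           split="train")                                      # assumed dataset id

bench_prompts = normalized_prompts(rewardbench, "prompt")
pref_prompts = normalized_prompts(preferences, "prompt")  # assumed column; adjust if prompts are stored as chat turns

overlap = bench_prompts & pref_prompts
print(f"{len(overlap)} prompts appear verbatim in both datasets")
```

Exact matching only catches verbatim copies; near-duplicate or paraphrase detection (e.g., n-gram or embedding similarity) would be needed to characterize the full extent of the overlap.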