GitHub Gists by Maxime Labonne (mlabonne)
mlabonne / finetune_llama2.py
Last active January 22, 2025 15:02
Easy Llama 2 fine-tuning script (📝 Article: https://tinyurl.com/finetunellama2)
```python
# Based on younesbelkada/finetune_llama_v2.py
# Install the following libraries:
# pip install accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7 scipy
from dataclasses import dataclass, field
from typing import Optional
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,   # completion of the truncated import; these are
    BitsAndBytesConfig,     # the classes a QLoRA script of this kind needs
    HfArgumentParser,
    TrainingArguments,
)
```
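The preview above stops at the imports, so here is a minimal QLoRA sketch of the same recipe. The dataset name and hyperparameters follow the linked article where known and are otherwise illustrative assumptions; the calls match the pinned library versions above (trl 0.4.7, peft 0.4.0), not current APIs.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

# 1k-sample Guanaco subset used in the linked article
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# 4-bit NF4 quantization (QLoRA): the frozen base model is loaded in 4 bits
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token

# Only the low-rank LoRA matrices are trained; values are illustrative
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # samples pre-formatted in the Llama 2 template
    max_seq_length=512,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="./results",
        per_device_train_batch_size=4,
        learning_rate=2e-4,
        logging_steps=25,
        max_steps=250,
    ),
)
trainer.train()
trainer.model.save_pretrained("./qlora-out")  # saves the adapter only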
mlabonne / merge_peft.py
Last active May 29, 2025 13:58
Merge base model and peft adapter and push it to HF hub
```python
# Example usage:
# python merge_peft.py --base_model=meta-llama/Llama-2-7b-hf --peft_model=./qlora-out --hub_id=alpaca-qlora
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import argparse

def get_args():
    # Completion of the truncated preview: flags mirror the example usage above
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_model", type=str, required=True)
    parser.add_argument("--peft_model", type=str, required=True)
    parser.add_argument("--hub_id", type=str, required=True)
    return parser.parse_args()
```
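A condensed sketch of the rest of the merge flow, using the standard PEFT pattern (merge_and_unload() followed by push_to_hub()) rather than quoting the gist verbatim; the identifier values come from the example usage above.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Values from the example usage above
base_model, peft_model, hub_id = "meta-llama/Llama-2-7b-hf", "./qlora-out", "alpaca-qlora"

base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, peft_model)
model = model.merge_and_unload()  # fold the LoRA weights into the base model

tokenizer = AutoTokenizer.from_pretrained(base_model)
model.push_to_hub(hub_id)       # upload merged weights to the Hugging Face Hub
tokenizer.push_to_hub(hub_id)
```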
mlabonne / EvolCodeLlama-7b.yaml
Last active March 10, 2024 04:53
Axolotl config file to train mlabonne/EvolCodeLlama-7b (https://huggingface.co/mlabonne/EvolCodeLlama-7b)
```yaml
base_model: codellama/CodeLlama-7b-hf
base_model_config: codellama/CodeLlama-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
hub_model_id: EvolCodeLlama-7b
load_in_8bit: false
load_in_4bit: true
strict: false
```
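This 4-bit setup (`load_in_8bit: false`, `load_in_4bit: true`) corresponds to QLoRA-style training. As a usage sketch, Axolotl runs are typically launched through Accelerate, e.g. `accelerate launch -m axolotl.cli.train EvolCodeLlama-7b.yaml`; the exact entry point can vary between Axolotl versions, so treat the command as an assumption rather than the one used for the published model.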
mlabonne / YALL - Yet Another LLM Leaderboard.md
Last active November 9, 2025 19:21
Leaderboard made with 🧐 LLM AutoEval (https://github.com/mlabonne/llm-autoeval) using the Nous benchmark suite.
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
| --- | --- | --- | --- | --- | --- |
| zephyr-7b-alpha | 38 | 72.24 | 56.06 | 40.57 | 51.72 |

AGIEval

| Task | Version | Metric | Value | Stderr |
| --- | --- | --- | --- | --- |
| agieval_aqua_rat | 0 | acc | 20.47 | ± 2.54 |
| | | acc_norm | 19.69 | ± 2.50 |
| agieval_logiqa_en | 0 | acc | 31.49 | ± 1.82 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
| --- | --- | --- | --- | --- | --- |
| dolphin-2.2.1-mistral-7b | 38.64 | 72.24 | 54.09 | 39.22 | 51.05 |

AGIEval

| Task | Version | Metric | Value | Stderr |
| --- | --- | --- | --- | --- |
| agieval_aqua_rat | 0 | acc | 23.23 | ± 2.65 |
| | | acc_norm | 21.26 | ± 2.57 |
| agieval_logiqa_en | 0 | acc | 35.48 | ± 1.88 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
| --- | --- | --- | --- | --- | --- |
| Mistral-7B-Instruct-v0.2 | 38.5 | 71.64 | 66.82 | 42.29 | 54.81 |

AGIEval

| Task | Version | Metric | Value | Stderr |
| --- | --- | --- | --- | --- |
| agieval_aqua_rat | 0 | acc | 23.62 | ± 2.67 |
| | | acc_norm | 22.05 | ± 2.61 |
| agieval_logiqa_en | 0 | acc | 36.10 | ± 1.88 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
| --- | --- | --- | --- | --- | --- |
| MistralTrix-v1 | 44.98 | 76.62 | 71.44 | 47.17 | 60.05 |

AGIEval

| Task | Version | Metric | Value | Stderr |
| --- | --- | --- | --- | --- |
| agieval_aqua_rat | 0 | acc | 25.59 | ± 2.74 |
| | | acc_norm | 24.80 | ± 2.72 |
| agieval_logiqa_en | 0 | acc | 37.48 | ± 1.90 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
| --- | --- | --- | --- | --- | --- |
| zephyr-7b-beta | 37.33 | 71.83 | 55.1 | 39.7 | 50.99 |

AGIEval

| Task | Version | Metric | Value | Stderr |
| --- | --- | --- | --- | --- |
| agieval_aqua_rat | 0 | acc | 21.26 | ± 2.57 |
| | | acc_norm | 20.47 | ± 2.54 |
| agieval_logiqa_en | 0 | acc | 33.33 | ± 1.85 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
| --- | --- | --- | --- | --- | --- |
| openchat_3.5 | 42.67 | 72.92 | 47.27 | 42.51 | 51.34 |

AGIEval

| Task | Version | Metric | Value | Stderr |
| --- | --- | --- | --- | --- |
| agieval_aqua_rat | 0 | acc | 24.02 | ± 2.69 |
| | | acc_norm | 24.80 | ± 2.72 |
| agieval_logiqa_en | 0 | acc | 38.86 | ± 1.91 |
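For reference, the Average column in each summary row is the unweighted mean of the four benchmark scores: for zephyr-7b-alpha, (38 + 72.24 + 56.06 + 40.57) / 4 ≈ 51.72. A minimal sketch of that aggregation (scores copied from the rows above):

```python
rows = {
    "zephyr-7b-alpha": [38.0, 72.24, 56.06, 40.57],
    "openchat_3.5": [42.67, 72.92, 47.27, 42.51],
}
for name, scores in rows.items():
    avg = sum(scores) / len(scores)
    print(f"{name}: {avg:.2f}")  # 51.72 and 51.34, matching the tables above
```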