nvidia-smi reported that this required 11181MiB, at least to train on the prompt sequence lengths that occurred initially in the alpaca dataset (~337-token prompts).
You can get this down to about 10.9GB by modifying qlora.py to run torch.cuda.empty_cache() after PEFT has been applied to your loaded model and before training begins.
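A minimal sketch of where that call could go, assuming (as in the qlora.py revision I looked at) that train() builds the PEFT-wrapped model via a helper named get_accelerate_model(); the helper's name may differ in your checkout:

# Inside train() in qlora.py; torch is already imported at the top of the file.
model = get_accelerate_model(args, checkpoint_dir)  # model now has the LoRA adapters applied
torch.cuda.empty_cache()  # release cached allocator blocks freed while loading/quantizing
# ... then construct the Trainer and call trainer.train() as before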
All instructions are written assuming your command-line shell is bash.
Clone repository:
git clone https://github.com/artidoro/qlora
cd qlora
Create a virtual environment to avoid interfering with your current Python environment (other Python scripts on your computer might not appreciate it if you update a bunch of packages they were relying on).
Follow the instructions below for virtualenv or conda, or skip them entirely if you don't care what happens to those other scripts.
Create environment:
python -m venv venv
Activate environment:
. ./venv/bin/activate
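You can confirm the environment is active: the interpreter should now resolve inside the venv directory.

which python  # should print something like /path/to/qlora/venv/bin/python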
(First-time) update environment's pip:
pip install --upgrade pip
Download conda (skip this step if you already have conda).
Install conda, assuming you're using a bash shell:
# On Linux, Anaconda is installed via its shell-script installer.
# Substitute the installer filename you actually downloaded.
bash Anaconda-latest-Linux-x86_64.sh
eval "$(~/anaconda3/bin/conda shell.bash hook)"
conda config --set auto_activate_base false
conda init
Create environment:
conda create -n p311 python=3.11
Activate environment:
conda activate p311
Ensure you have activated the environment you created above.
(Optional) treat yourself to the latest nightly of PyTorch, with support for Python 3.11 and CUDA 12.1:
pip install --upgrade --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cu121
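To confirm the install worked and that PyTorch can see your GPU (a generic sanity check, not specific to qlora):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If the second value prints False, training will not run on the GPU.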
Install very recent commits of transformers, peft and accelerate, and the bitsandbytes test release:
pip install -U --index-url https://test.pypi.org/simple/ 'bitsandbytes>=0.39.0'
pip install -U git+https://github.com/huggingface/transformers.git@dc67da0
pip install -U git+https://github.com/huggingface/peft.git@42a184f
pip install -U git+https://github.com/huggingface/accelerate.git@c9fbb71
Install the qlora repository's dependencies:
pip install -r requirements.txt
First, ensure that you have activated your virtual environment.
If you don't wish to use an unofficial distribution such as huggyllama/llama-7b, you can convert your own LLaMA weights to Hugging Face + safetensors format (see the sketch below) and provide that local directory as your argument to model_name_or_path.
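One way to do the conversion is the script that ships with the transformers repository (the path and flags below reflect the script as I know it; depending on your transformers version, saving in safetensors may be the default or require a flag, so double-check your checkout):

# /path/to/llama should contain the tokenizer files plus the 7B/ weights directory.
python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama --model_size 7B --output_dir /path/to/llama-7b-hf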
From the root of the qlora repository:
# Modify max_memory_MB to whatever your GPU can fit. Even as low as 12000 worked for me.
python -m qlora --model_name_or_path huggyllama/llama-7b --bf16 --dataset alpaca --max_memory_MB 24000
This should start QLoRA training LLaMA-7B on the alpaca dataset.
How many epochs does this code train for?