The following instructions were used to get Facebook's Llama-2 up and running on Ubuntu 22.04 (70B model) and an M1 MacBook Air (7B model).
Divided into 2 parts:
- Part 1: Download models from facebook's repo: https://github.com/facebookresearch/llama
- Part 2: Use the llama.cpp repo to convert the model and run inference: https://github.com/ggerganov/llama.cpp
Important: if you are trying to work with the 70B model and have 500 GB or less of free space, note that this process requires a lot of room. Even with 500 GB free, I ran out of space mid-way because of all the intermediate files being generated.
Use
df -h
to keep checking free space, or run
watch df -h
in a separate terminal to re-check the space every 2 seconds (watch's default interval).
After downloading the model (Part 1) and converting it to a ggml model (Part 2, step 4), I moved the downloaded consolidated.xx.pth files from Part 1 to another hard drive to free some space before running the quantize command (Part 2, step 5).
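If you need to do the same, a minimal sketch, assuming the model sits in ../llama/llama-2-70b-chat/ and /mnt/backup is a hypothetical mount point for the second drive:
# move the original PyTorch checkpoints out; the converted .gguf files stay behind
$ mv ../llama/llama-2-70b-chat/consolidated.*.pth /mnt/backup/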
Part 1: Download the models

1. Install Python 3.9 or above. Most recent Linux distros ship with it.
2. Set up a virtualenv (optional but recommended).
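A minimal sketch of the standard setup (the directory name venv is arbitrary):
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ python --version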
3. Go to https://ai.meta.com/resources/models-and-libraries/llama-downloads/ and request access.
It should take around 5-10 minutes to receive an email from Meta AI. Meanwhile, you can complete steps 4 to 7.
4. Install git:
# for ubuntu
(venv) $ sudo apt update
(venv) $ sudo apt install git
5. Also install wget and md5sum if you don't have them already.
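On Ubuntu, for example (md5sum ships with coreutils, so it is usually already installed):
(venv) $ sudo apt install wget coreutils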
6. Clone Facebook's repo:
(venv) $ git clone https://github.com/facebookresearch/llama.git
7. Once the clone is complete, go inside the llama directory and install the requirements. This takes around 10 minutes.
(venv) $ cd llama
(venv) $ pip install -e .
8. Make download.sh executable and run it:
(venv) $ chmod +x download.sh
(venv) $ ./download.sh
9. After running the script, you will be prompted to enter the link you received in step 3. The link starts with https://download.llamameta.net/*?Policy=eyJTdG... Copy it exactly, paste it, and hit enter.
10. Then you will be asked to choose a model, with a prompt like:
Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all:
- 70B is the largest, at around 129 GB. It took me around 14 hrs to download it completely.
- So make sure you have sufficient space, good internet speed, and extra time to work on it.
- If you are just starting out, use the 7B model to test things out.
- Alternatively, look into Hugging Face transformers, which, I think, may require a paid account.
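Once a download finishes, it is worth verifying the files before moving on. A minimal sketch, assuming the download left a checklist.chk checksum file inside the model directory (adjust the directory name for the model you chose; skip this if the file isn't there):
(venv) $ cd llama-2-70b-chat
(venv) $ md5sum -c checklist.chk
(venv) $ cd ..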
Part 2: Convert the model and run inference with llama.cpp

1. In a new directory outside the llama dir from above, clone this repo:
(venv) $ cd ..
(venv) $ git clone https://github.com/ggerganov/llama.cpp.git
2. Go inside the directory and run the make command:
(venv) $ cd llama.cpp
(venv) $ make
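One tip not in the original notes: make accepts a -j flag to compile in parallel, which speeds this step up noticeably.
# use all available cores; on macOS, replace $(nproc) with $(sysctl -n hw.ncpu)
(venv) $ make -j$(nproc)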
3. Install the requirements:
(venv) $ pip install -r requirements.txt
4. Run convert.py on the downloaded model folder:
(venv) $ python convert.py <path to downloaded model folder>
For example, in my case:
(venv) $ python convert.py ../llama/llama-2-70b-chat/
I run this script from inside the llama.cpp directory, and my directory structure looks like this:
parent_folder/
    llama/ ---- cloned facebook's llama repo
        llama-2-70b-chat/ ---- downloaded model from facebook
        other files in that repo
    llama.cpp/ ---- cloned ggerganov/llama.cpp repo
        convert.py ---- script to run
        other files in that repo
Read through convert.py's main function (around line 1282) to learn about the other parameters.
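At the time of writing, convert.py also accepted flags such as --outfile and --outtype to control where the converted model is written and at what precision; these change between llama.cpp versions, so treat the following as a sketch and confirm against the script itself:
(venv) $ python convert.py ../llama/llama-2-70b-chat/ --outtype f16 --outfile ../llama/llama-2-70b-chat/ggml-model-f16.gguf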
5. Quantize the model:
(venv) $ ./quantize ../llama/llama-2-70b-chat/ggml-model-f16.gguf ../llama/llama-2-70b-chat/ggml-model-q4_0.gguf q4_0
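q4_0 is just one of the supported quantization types; running the binary with no arguments prints the full list your build supports. For example, the k-quant preset q4_K_M trades slightly more disk space for better output quality (preset availability depends on your llama.cpp version):
(venv) $ ./quantize ../llama/llama-2-70b-chat/ggml-model-f16.gguf ../llama/llama-2-70b-chat/ggml-model-q4_K_M.gguf q4_K_M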
6. Run the inference:
# for the 70B model we need to add -gqa 8
(venv) $ ./main -m ../llama/llama-2-70b-chat/ggml-model-q4_0.gguf -n 128 -gqa 8
# for the 7B model
(venv) $ ./main -m ../llama/llama-2-7b-chat/ggml-model-q4_0.gguf -n 128
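Without a prompt, main just samples unconditionally. To steer the output, pass a prompt with -p (the prompt text below is a hypothetical example); -t sets the number of threads:
(venv) $ ./main -m ../llama/llama-2-7b-chat/ggml-model-q4_0.gguf -n 128 -t 8 -p "Building a website can be done in 10 simple steps:"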