@pdtgct
Created April 28, 2023 15:27
Convert HF to GGML

The LLaMA model weights can be converted from the Hugging Face PyTorch format back to GGML in two steps:

  1. download the model from decapoda-research/llama-7b-hf and save its weights as a PyTorch .pth state dict
  2. use the ggerganov/llama.cpp script convert-pth-to-ggml.py to convert the PyTorch .pth to GGML

This process results in a GGML model with float16 (fp16) precision.

Prerequisite

You need the LLaMA tokenizer configuration and the model configuration files. There is currently no good way to recover the original PyTorch distribution from the Hugging Face one (the tokenizer files are the same, but the model's checklist.chk and params.json are missing). The best way to get them is to:

  • install the Python dependencies (plus the llama.cpp requirements):
pip install -U pyllama transformers
llama.cpp $ pip install -r requirements.txt
  • download the 7B configuration (let the consolidated.00.pth model-weights download fail or cancel it; only the configuration and tokenizer files are needed):
python -m llama.download --model_size=7B --folder=llama

This will download a directory structure like:

llama/
  config.json
  ggml-vocab.bin
  tokenizer.model
  tokenizer_checklist.chk
  tokenizer_config.json
  7B/
    checklist.chk
    params.json
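
Before moving on, you can sanity-check that everything needed for conversion is actually on disk. A small, hypothetical helper (not part of pyllama or llama.cpp):

from pathlib import Path

# Hypothetical sanity check: confirm the pyllama download produced the
# tokenizer and 7B configuration files needed for the conversion below.
expected = [
    "config.json",
    "tokenizer.model",
    "tokenizer_checklist.chk",
    "tokenizer_config.json",
    "7B/checklist.chk",
    "7B/params.json",
]
llama_dir = Path("llama")
missing = [name for name in expected if not (llama_dir / name).exists()]
if missing:
    raise SystemExit(f"missing files: {missing}")
print("all tokenizer/config files present")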

Your remaining task is to convert the Hugging Face PyTorch pickle file to a PyTorch state dict and then convert that to GGML.

Conversion

  1. load the Hugging Face model and save the state dict as a PyTorch .pth file (in EMP, ensure you have the SSO Proxy on; a lower-memory variant follows the snippet):
from transformers import AutoModelForCausalLM
import torch

# Load the Hugging Face checkpoint, then save its parameters as a single
# PyTorch state dict where the original LLaMA layout expects them.
model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
torch.save(model.state_dict(), "llama/7B/consolidated.00.pth")
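
The full-precision state dict is large; if memory is tight, a variant of the same step loads the weights in half precision first. This is a sketch, assuming a transformers version that supports torch_dtype and low_cpu_mem_usage (the latter also needs the accelerate package installed):

from transformers import AutoModelForCausalLM
import torch

# Optional variant of step 1: load directly in float16 to roughly halve peak
# memory. low_cpu_mem_usage=True requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
torch.save(model.state_dict(), "llama/7B/consolidated.00.pth")
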
  2. consolidate the 7B files into a single directory (a copy sketch follows the listing), so you have:
llama_7b/
  config.json
  ggml-vocab.bin
  tokenizer.model
  tokenizer_checklist.chk
  tokenizer_config.json
  checklist.chk
  consolidated.00.pth
  params.json  
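
One way to assemble that directory, as a minimal sketch using the paths from the listings above:

import shutil
from pathlib import Path

# Gather the tokenizer/config files and the saved state dict into a flat
# llama_7b/ directory matching the layout above.
src = Path("llama")
dst = Path("llama_7b")
dst.mkdir(exist_ok=True)

# The small tokenizer/config files can simply be copied.
for name in ["config.json", "ggml-vocab.bin", "tokenizer.model",
             "tokenizer_checklist.chk", "tokenizer_config.json"]:
    shutil.copy2(src / name, dst / name)
for name in ["checklist.chk", "params.json"]:
    shutil.copy2(src / "7B" / name, dst / name)

# The state dict is tens of GB, so move it rather than copy it.
shutil.move(str(src / "7B" / "consolidated.00.pth"), str(dst / "consolidated.00.pth"))
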
  3. convert the consolidated.00.pth file to ggml-model-fp16.bin using the convert-transformers-to-ggml.py script from llama.cpp (the trailing 1 selects float16 output):
python convert-transformers-to-ggml.py llama_7b 1

When you are done, you will have a file you can use with llama.cpp, but you need to move it back into the llama/7B/ directory (a move sketch follows the listing):

llama/
  config.json
  ggml-vocab.bin
  tokenizer.model
  tokenizer_checklist.chk
  tokenizer_config.json
  7B/
    checklist.chk
    params.json
    ggml-model-fp16.bin  # <-- added here
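
If you want to script that move, a one-liner will do (this assumes the converter wrote ggml-model-fp16.bin into the llama_7b/ directory):

import shutil

# Move the converted model back into the llama/7B/ layout that llama.cpp
# will be pointed at (the source path is an assumption about where the
# conversion script writes its output).
shutil.move("llama_7b/ggml-model-fp16.bin", "llama/7B/ggml-model-fp16.bin")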

Now you can use this with llama.cpp (after building llama.cpp); point -m at the file you just moved:

./main -m ./llama/7B/ggml-model-fp16.bin -n 128
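
If you prefer to drive the model from Python, the llama-cpp-python bindings can load the same file. A minimal sketch, assuming a llama-cpp-python release that still reads this GGML format:

from llama_cpp import Llama

# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
llm = Llama(model_path="llama/7B/ggml-model-fp16.bin")
result = llm("Building a website can be done in 10 simple steps:", max_tokens=128)
print(result["choices"][0]["text"])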