The LLaMA model weights may be converted from Hugging Face PyTorch format back to GGML in two steps:
- download the weights from `decapoda-research/llama-7b-hf` and save them in PyTorch `.pth` format
- use the `convert-pth-to-ggml.py` script from `ggerganov/llama.cpp` to convert the PyTorch `.pth` files to GGML

This process results in a GGML model with float16 (fp16) precision.
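As an illustrative aside (not part of the llama.cpp conversion scripts themselves), the storage effect of fp16 precision can be sketched with NumPy: casting float32 weights to float16 halves the bytes per weight while keeping a usable value range.

```python
import numpy as np

# Hypothetical weight values, stored first at full (fp32) precision.
w32 = np.array([0.12345678, 3.14159265], dtype=np.float32)

# Cast to half precision (fp16), the precision the GGML conversion targets.
w16 = w32.astype(np.float16)

print(w16.dtype)    # float16
print(w32.nbytes)   # bytes at fp32
print(w16.nbytes)   # half as many bytes at fp16
```

The conversion script does the real work of reshaping and serializing the tensors into the GGML file format; this snippet only shows why the fp16 output is roughly half the size of the fp32 source weights.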