@rajivmehtaflex
Last active December 18, 2024 06:54
Hugging Face to Ollama conversion with quantization
mkdir ollama-quantization
cd ollama-quantization/
bash <(curl -sSL https://g.bodaay.io/hfd) -h
./hfdownloader -m TinyLlama/TinyLlama-1.1B-Chat-v1.0
sudo apt install tree
tree
touch Modelfile
ollama create -f Modelfile tinyllama
ollama cp tinyllama rajivmehtapy/tinyllama
ollama list
ollama push rajivmehtapy/tinyllama:latest
ollama create -f Modelfile tinyllama:q4_K_M -q q4_K_M
ollama cp tinyllama:q4_K_M rajivmehtapy/tinyllama:q4_K_M
ollama push rajivmehtapy/tinyllama:q4_K_M
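
`touch Modelfile` leaves the file empty, but `ollama create` needs at least a `FROM` line that names the weights to import. A minimal sketch of filling it in, assuming hfdownloader saved the model under `./downloads/TinyLlama_TinyLlama-1.1B-Chat-v1.0` (a hypothetical path, use whatever `tree` actually shows):

```shell
# Hypothetical location of the downloaded weights; replace with the path `tree` prints.
MODEL_DIR="${MODEL_DIR:-./downloads/TinyLlama_TinyLlama-1.1B-Chat-v1.0}"

# Write a one-line Modelfile that tells ollama where to import the weights from.
printf 'FROM %s\n' "$MODEL_DIR" > Modelfile

# Show the result so the path can be checked before running `ollama create`.
cat Modelfile
```

Run this in place of `touch Modelfile`; the later `ollama create -f Modelfile ...` commands stay unchanged.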
Notes:
-> Before pushing, add this machine's public key to your Ollama account. The public key is at ~/.ollama/id_ed25519.pub
-> The quantization string passed to -q (for example q4_K_M) uses the same names that GGUF repos on Hugging Face use for their files.
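
The create/cp/push sequence above repeats once per quantization level. A small dry-run sketch that only prints the commands for each level, so they can be reviewed before running; the `QUANTS` list and the function name are illustrative assumptions, and the namespace is taken from the commands above:

```shell
# Print (do not run) the create/cp/push commands for each quantization level.
# QUANTS is an illustrative list; edit it to the levels you actually want.
NAMESPACE="rajivmehtapy"
MODEL="tinyllama"
QUANTS="q4_K_M q8_0"

print_quant_commands() {
  for q in $QUANTS; do
    echo "ollama create -f Modelfile ${MODEL}:${q} -q ${q}"
    echo "ollama cp ${MODEL}:${q} ${NAMESPACE}/${MODEL}:${q}"
    echo "ollama push ${NAMESPACE}/${MODEL}:${q}"
  done
}

print_quant_commands
```

Once the printed commands look right, they can be executed by piping the output to `sh`.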

Playing with Qwen2-VL through llama.cpp on Mac M3

Preparation Steps

1. Clone and Build llama.cpp

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_METAL=OFF  # Added option to avoid errors
cmake --build build --config Release

2. Download Checkpoints

Download the Qwen2-VL-2B-Instruct checkpoints from Hugging Face or ModelScope.

3. Make GGUF Files

Use the pre-converted GGUF files provided by @bartowski1182, or create them yourself as described in the next two steps.

4. Convert Checkpoints to GGUF

python convert_hf_to_gguf.py ./Qwen2-VL-2B-Instruct --outfile ./qwen2-vl-2b-instruct-q8_0.gguf --outtype q8_0
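
A quick sanity check after conversion, as a sketch: GGUF files begin with the four ASCII bytes `GGUF`, so reading the first four bytes catches a missing or truncated output early. The helper name `is_gguf` is hypothetical; the output path is the one used in the command above.

```shell
# Succeeds if the file's first four bytes are the GGUF magic "GGUF".
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

OUT=./qwen2-vl-2b-instruct-q8_0.gguf
if is_gguf "$OUT"; then
  echo "$OUT looks like a GGUF file"
else
  echo "$OUT is missing or not a GGUF file" >&2
fi
```

This only checks the file header; it does not validate the tensors or the metadata.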

5. Fix Vision Encoder

python examples/llava/qwen2_vl_surgery.py ~/Desktop/code/Qwen2-VL-2B-Instruct

Run the Model

./build/bin/llama-qwen2vl-cli -m ./qwen2-vl-2b-instruct-q8_0.gguf --mmproj qwen2-vl-2b-instruct-vision.gguf -p 'Describe the image' --image ~/Desktop/logo.png
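
The run command depends on three files: the converted model, the vision projector, and the image. A hedged sketch that checks they exist before launching, so a wrong path fails with a clear message; the helper name `require_files` is hypothetical, and the paths mirror the command above:

```shell
# Preflight check: report the first missing input file and return non-zero.
require_files() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
}

MODEL=./qwen2-vl-2b-instruct-q8_0.gguf
MMPROJ=./qwen2-vl-2b-instruct-vision.gguf
IMAGE="$HOME/Desktop/logo.png"

# Only launch the CLI when every input is present.
if require_files "$MODEL" "$MMPROJ" "$IMAGE"; then
  ./build/bin/llama-qwen2vl-cli -m "$MODEL" --mmproj "$MMPROJ" -p 'Describe the image' --image "$IMAGE"
fi
```

Run it from the llama.cpp directory, since the binary and model paths are relative to it.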

Example Output

"The image depicts a stylized, abstract design that resembles a hexagonal shape with a blue and white gradient. The hexagon has a star-like appearance, with a series of triangles connecting its sides, creating a sense of symmetry and balance. The design has a modern and futuristic feel, with clean lines and a monochromatic color palette. The gradient effect adds a dynamic element, making the design look three-dimensional and slightly warped, as if it's floating in space."


This guide provides clear and concise steps to set up and run Qwen2-VL through llama.cpp on a Mac M3. Enjoy exploring the capabilities of the model! 😄
