Berat Çimen beratcmn

@beratcmn
beratcmn / llama-home.md
Created May 14, 2023 16:27 — forked from rain-1/llama-home.md
How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

LLaMA is a text prediction model similar to GPT-2 and to the version of GPT-3 that has not been fine-tuned yet. I think it is also possible to run fine-tuned versions (like Alpaca or Vicuna) with this. Those versions are more focused on answering questions.

It is now possible to run LLaMA 13B with a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work involved in llama.cpp. The latest change is CUDA/cuBLAS support, which allows you to pick an arbitrary number of the transformer layers to be run on the GPU. This is perfect for low VRAM.

  • Clone llama.cpp from git. I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.
    • git clone https://github.com/ggerganov/llama.cpp.git
  • cd llama.cpp
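
Picking how many transformer layers to run on the GPU is mostly a memory budgeting question. Here is a rough Python sketch of that arithmetic. It assumes every layer is about the same size (model file size divided by layer count). The 8000 MiB file size and the 1024 MiB reserve are illustrative guesses, not measurements, and `layers_that_fit` is a hypothetical helper, not part of llama.cpp. LLaMA 13B has 40 transformer layers.

```python
import math


def layers_that_fit(vram_mib, model_file_mib, n_layers, reserve_mib=1024):
    """Estimate how many transformer layers fit in GPU memory.

    Assumption: all layers are roughly equal in size, so one layer costs
    about model_file_mib / n_layers. reserve_mib is VRAM held back for
    scratch buffers, the desktop, etc. (a guess; tune it for your system).
    """
    per_layer_mib = model_file_mib / n_layers
    usable_mib = max(vram_mib - reserve_mib, 0)
    # Never report more layers than the model actually has.
    return min(n_layers, math.floor(usable_mib / per_layer_mib))


if __name__ == "__main__":
    # Hypothetical numbers: a 6 GB card and a quantized 13B file of ~8000 MiB.
    n = layers_that_fit(vram_mib=6144, model_file_mib=8000, n_layers=40)
    print(f"Try offloading about {n} layers to the GPU")
```

Treat the result as a starting point for llama.cpp's `--n-gpu-layers` (`-ngl`) option, and lower it if you run out of memory.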
@beratcmn
beratcmn / merge_qlora_with_quantized_model.py
Created September 16, 2023 11:19 — forked from ChrisHayduk/merge_qlora_with_quantized_model.py
Merging QLoRA weights with quantized model
"""
The code below combines approaches published by both @eugene-yh and @jinyongyoo on GitHub.
Thanks for the contributions, guys!
"""
import torch
import peft
@beratcmn
beratcmn / gemini-duckduckgo-search.py
Created March 20, 2024 14:28
Perplexity-like real-time search with Gemini 1.0 Pro and the DuckDuckGo API
"""
Requirements:
annotated-types==0.6.0
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
curl-cffi==0.6.2