Created
February 9, 2025 01:38
LLM on your computer
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,  # 8-bit quantization to save VRAM (requires the bitsandbytes package)
    torch_dtype=torch.float16,
)

# Smaller alternative that runs on more modest GPUs:
# model_name = "tiiuae/falcon-7b-instruct"
# tokenizer = AutoTokenizer.from_pretrained(model_name)
# model = AutoModelForCausalLM.from_pretrained(
#     model_name,
#     device_map="auto",
#     torch_dtype=torch.float16,
# )

inputs = tokenizer("Explain what a data warehouse is.", return_tensors="pt").to("cuda")
outputs = model.generate(inputs.input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
accelerate==1.2.1
certifi==2024.12.14
charset-normalizer==3.4.1
colorama==0.4.6
filelock==3.16.1
fsspec==2024.12.0
huggingface-hub==0.27.1
idna==3.10
Jinja2==3.1.5
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.1
packaging==24.2
psutil==6.1.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
setuptools==75.8.0
sympy==1.13.1
tokenizers==0.21.0
torch==2.5.1
tqdm==4.67.1
transformers==4.48.0
typing_extensions==4.12.2
urllib3==2.3.0
The commented-out code for the Falcon model requires a decent GPU to run. The Llama 2 model that's in there requires a very good GPU, so I quantized it to 8-bit to save VRAM. If you have a terrific GPU (an RTX 4090, or maybe a 4060 Ti with 16 GB) you can take that part out.
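To see why the quantization matters, a model's weight footprint is roughly parameter count times bytes per parameter, so dropping from float16 (2 bytes) to 8-bit (1 byte) about halves the VRAM the weights need. A rough sketch of the arithmetic (actual usage runs higher once activations and the KV cache are added):

```python
# Approximate VRAM needed just to hold the weights, ignoring
# activations, the KV cache, and framework overhead.
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

# Llama-2-13B: float16 (2 bytes/param) vs 8-bit quantized (1 byte/param)
fp16 = weight_vram_gb(13, 2)  # too big for a 16 GB card
int8 = weight_vram_gb(13, 1)  # fits on a 16 GB card with room to spare
print(f"fp16: {fp16:.1f} GB, int8: {int8:.1f} GB")
```

By this estimate the 13B weights alone are about 24 GB in float16, which is why the 8-bit load is what lets the model fit on a single consumer GPU.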