Hussein Lezzaik (HusseinLezzaik)
@karpathy
karpathy / stablediffusionwalk.py
Last active April 5, 2026 22:47
hacky stablediffusion code for generating videos
"""
stable diffusion dreaming
creates hypnotic moving videos by smoothly walking randomly through the sample space
example way to run this script:
$ python stablediffusionwalk.py --prompt "blueberry spaghetti" --name blueberry
to stitch together the images, e.g.:
$ ffmpeg -r 10 -f image2 -s 512x512 -i blueberry/frame%06d.jpg -vcodec libx264 -crf 10 -pix_fmt yuv420p blueberry.mp4
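The "walk" itself boils down to spherically interpolating (slerp) between random latent vectors and decoding an image at each step. A minimal sketch of that idea, not the gist's actual code — it assumes a diffusers-style StableDiffusionPipeline, and the model id, frame count, and output paths are illustrative:

import torch
from diffusers import StableDiffusionPipeline

def slerp(t, v0, v1, eps=1e-7):
    # spherical interpolation between two latent tensors
    dot = (v0 * v1).sum() / (v0.norm() * v1.norm())
    theta = torch.acos(torch.clamp(dot, -1 + eps, 1 - eps))
    return (torch.sin((1 - t) * theta) * v0 + torch.sin(t * theta) * v1) / torch.sin(theta)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

shape = (1, pipe.unet.config.in_channels, 64, 64)  # latent shape for 512x512 output
a = torch.randn(shape, device="cuda", dtype=torch.float16)
b = torch.randn(shape, device="cuda", dtype=torch.float16)
for i in range(60):  # 60 frames between two random endpoints
    latents = slerp(i / 59, a, b)
    image = pipe("blueberry spaghetti", latents=latents).images[0]
    image.save(f"blueberry/frame{i:06d}.jpg")

Stitching the saved frames with the ffmpeg line above then produces the video.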
@wiseman
wiseman / agent.py
Last active April 21, 2025 17:37
Langchain example: self-debugging
from io import StringIO
import sys
from typing import Dict, Optional
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents.tools import Tool
from langchain.llms import OpenAI
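The self-debugging pattern these imports set up hands the agent a Python-execution tool whose output, including any error text, is fed back into the loop so the model can revise its own code. A minimal sketch of that wiring, reusing the imports above and the pre-0.1 LangChain API; the tool name, description, and prompt strings are illustrative:

def run_python(code: str) -> str:
    # capture stdout so the agent sees what its code printed
    old_stdout, buf = sys.stdout, StringIO()
    sys.stdout = buf
    try:
        exec(code, {})
    except Exception as e:
        return f"Error: {e!r}"  # the error text is what enables self-debugging
    finally:
        sys.stdout = old_stdout
    return buf.getvalue()

python_tool = Tool(
    name="Python REPL",
    func=run_python,
    description="Executes Python code and returns stdout, or the error message.",
)
llm = OpenAI(temperature=0)
agent = initialize_agent([python_tool], llm, agent="zero-shot-react-description", verbose=True)
agent.run("What is the 10th Fibonacci number? Write and run Python code to find out.")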
@danielgross
danielgross / mathpix2gpt.py
Last active March 18, 2025 02:18
mathpix2gpt.py
import requests
import time
import os
import sys
import openai
import tiktoken
from termcolor import colored
openai.api_key = open(os.path.expanduser('~/.openai')).read().strip()
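The script's flow is image → Mathpix OCR → LaTeX → GPT. A rough sketch of that pipeline, reusing the imports above; it assumes Mathpix's v3/text endpoint and the legacy openai ChatCompletion API that the api_key line implies, and the MATHPIX_* environment variable names are illustrative:

def image_to_latex(path):
    # OCR an image of math into LaTeX-annotated text via Mathpix
    with open(path, "rb") as f:
        r = requests.post(
            "https://api.mathpix.com/v3/text",
            files={"file": f},
            headers={"app_id": os.environ["MATHPIX_APP_ID"],
                     "app_key": os.environ["MATHPIX_APP_KEY"]},
        )
    return r.json()["text"]

latex = image_to_latex(sys.argv[1])
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": f"Explain this math:\n\n{latex}"}],
)
print(colored(resp.choices[0].message.content, "green"))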
@rain-1
rain-1 / llama-home.md
Last active March 1, 2026 16:35
How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

LLaMA is a text-prediction model, similar to GPT-2 or to a version of GPT-3 that has not been fine-tuned yet. It should also be possible to run fine-tuned versions with this (such as Alpaca or Vicuna, which are more focused on answering questions).

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is now possible to run LLaMA 13B with a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change is CUDA/cuBLAS support, which lets you pick an arbitrary number of transformer layers to run on the GPU. This is perfect for low VRAM.

  • Clone llama.cpp from git; I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.
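The same layer-offload knob is also exposed through the llama-cpp-python bindings, which can be an easier way to experiment than the CLI. A minimal sketch, not from the gist; the model path is illustrative, and n_gpu_layers is the number of transformer layers pushed to the GPU — raise or lower it until the model fits in 6GB of VRAM:

from llama_cpp import Llama

llm = Llama(
    model_path="./models/13B/ggml-model-q4_0.bin",  # illustrative path to a quantized model
    n_gpu_layers=18,  # how many transformer layers to run on the GPU
)
out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])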
@adrienbrault
adrienbrault / llama2-mac-gpu.sh
Last active April 8, 2025 13:49
Run Llama-2-13B-chat locally on your M1/M2 Mac with GPU inference. Uses 10GB RAM. UPDATE: see https://twitter.com/simonw/status/1691495807319674880?s=20
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Build it
make clean
LLAMA_METAL=1 make
# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
@veekaybee
veekaybee / normcore-llm.md
Last active April 4, 2026 00:27
Normcore LLM Reads

Anti-hype LLM reading list

Goals: add links that give reasonable, good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod are eagerly sought.

Foundational Concepts

Pre-Transformer Models

@Birch-san
Birch-san / llama_flash.py
Last active January 22, 2024 06:05
Loading llama with Flash Attention
from transformers import (
AutoConfig,
AutoTokenizer,
BitsAndBytesConfig,
GenerationConfig,
AutoModelForCausalLM,
LlamaTokenizerFast,
PreTrainedModel,
TextIteratorStreamer,
StoppingCriteria,
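The loading pattern those imports point at — a hedged sketch using the transformers attn_implementation switch; the model id is illustrative (and gated behind a license), and the exact kwargs depend on your transformers version:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; requires access to the weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # FlashAttention requires fp16/bf16
    attn_implementation="flash_attention_2",  # needs the flash-attn package installed
    device_map="auto",
)
inputs = tokenizer("The tallest mountain on Earth is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))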
@atiorh
atiorh / llm_activation_sparsity.py
Last active November 21, 2023 18:52
Activation Sparsity in LLMs
# !pip install transformers sentencepiece
import torch
import torch.nn as nn
torch.set_grad_enabled(False)
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.activations import ReLUSquaredActivation
from collections import defaultdict, OrderedDict
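The measurement itself is typically a forward hook on each activation module that records what fraction of its outputs are zero. A minimal sketch in the spirit of (and reusing) the imports above; the model choice is illustrative — OPT is convenient because its MLPs use ReLU, which is exactly zero for negative inputs:

tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

sparsity = defaultdict(list)

def make_hook(name):
    def hook(module, inputs, output):
        # fraction of post-activation entries that are exactly zero
        sparsity[name].append((output == 0).float().mean().item())
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):  # OPT's MLP activations
        module.register_forward_hook(make_hook(name))

ids = tok("Activation sparsity emerges in large language models", return_tensors="pt")
model(**ids)
for name, vals in sorted(sparsity.items()):
    print(f"{name}: {vals[0]:.1%} zeros")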