Birch-san

@Birch-san
Birch-san / 8bit_adam_memory_usage.md
Last active October 3, 2023 18:20
Unexplained memory usage of 8-bit AdamW (paged vs unpaged)

Some weird memory usage (VRAM) is reported (by torch and by NVML) when using 8-bit AdamW, paged or unpaged.

Here we train llama 2 on 4096-token sequences, using either --optim adamw_8bit or --optim paged_adamw_8bit.
We do a full finetune using qlora.py --full-finetune, with our qlora.py fork, stepwise branch, commit 9a1045d.
We print the memory usage from the HF transformers Trainer's on_step_end callback, i.e. after optimizer.step(); model.zero_grad().

One would expect memory usage at the end of step 1 to be the same as at the end of step 2.
Yet with the unpaged optimizer, memory usage leaps by 11.2GiB: 70.4GiB at the end of step 1, 81.6GiB at the end of step 2.
This appears to be a leap in PyTorch reserved memory only (32.6GiB -> 43.9GiB).
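
For reference, here is a minimal sketch of the kind of on_step_end callback used to log these numbers, assuming pynvml is installed for the NVML reading; the gist's actual callback may differ in detail.

```python
import torch
import pynvml
from transformers import TrainerCallback


class MemoryUsageCallback(TrainerCallback):
    def __init__(self, device_ix: int = 0):
        pynvml.nvmlInit()
        self.handle = pynvml.nvmlDeviceGetHandleByIndex(device_ix)

    def on_step_end(self, args, state, control, **kwargs):
        # runs after optimizer.step(); model.zero_grad()
        gib = 1024 ** 3
        allocated = torch.cuda.memory_allocated() / gib
        reserved = torch.cuda.memory_reserved() / gib
        nvml_used = pynvml.nvmlDeviceGetMemoryInfo(self.handle).used / gib
        print(
            f'step {state.global_step}: '
            f'torch allocated {allocated:.1f}GiB, '
            f'torch reserved {reserved:.1f}GiB, '
            f'NVML used {nvml_used:.1f}GiB'
        )
```

Passing an instance of this callback to the Trainer's callbacks argument prints one line per optimizer step, which is how the per-step figures above were collected.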

@Birch-san
Birch-san / t5-small-weight-inits.py
Created October 1, 2023 15:04
google/t5-v1_1-small t5-small weight initializations
import torch
from transformers import T5ForConditionalGeneration

model: T5ForConditionalGeneration = T5ForConditionalGeneration.from_pretrained('google/t5-v1_1-small')

# enter inference mode manually (REPL-style), rather than via a `with` block
_inference_mode_context = torch._C._InferenceMode(True)
_inference_mode_context.__enter__()

# std of the shared embedding weights
model.shared.weight.std()
# tensor(11.6375)
@Birch-san
Birch-san / local-copilot.md
Last active March 31, 2025 12:03
Running GitHub Copilot against local Code Llama model

Running GitHub Copilot VSCode extension against local Code Llama model

Tested on NVIDIA RTX 4090, but these instructions also cover AMD and Mac in case you wanna try those.
This guide assumes you are running Linux (I ran this on Ubuntu).

Before you get excited:

@Birch-san
Birch-san / mask-test.ipynb
Created September 2, 2023 15:32
Tester for neighbourhood_mask, perimeter_mask
@Birch-san
Birch-san / mask_test.py
Created September 2, 2023 15:31
Tester for neighbourhood_mask, perimeter_mask
from typing import Optional, NamedTuple
from torch import BoolTensor, arange, meshgrid, clamp
import torch

class Dimensions(NamedTuple):
    height: int
    width: int

def make_neighbourhood_mask(size: Dimensions, size_orig: Dimensions, device='cpu') -> BoolTensor:
    h, w = size
@Birch-san
Birch-san / llama_flash.py
Last active January 22, 2024 06:05
Loading llama with Flash Attention
from transformers import (
    AutoConfig,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
    AutoModelForCausalLM,
    LlamaTokenizerFast,
    PreTrainedModel,
    TextIteratorStreamer,
    StoppingCriteria,
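
The import list above is truncated by the gist preview. As a rough illustration of the end goal, recent transformers releases can load llama with flash-attn directly via the attn_implementation argument; this is a sketch under that assumption, not necessarily how the gist itself wires flash attention in, and the checkpoint name is a hypothetical choice.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = 'meta-llama/Llama-2-7b-hf'  # hypothetical choice of checkpoint

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    attn_implementation='flash_attention_2',  # requires the flash-attn package
    torch_dtype=torch.bfloat16,
    device_map='auto',
)
```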
@Birch-san
Birch-san / arb.py
Created July 27, 2023 23:07
Computing aspect ratio buckets
import numpy as np
import math
from numpy.typing import NDArray
# we are trying to make buckets of varying aspect ratios,
# all with about the same area (equivalent to a 512x512 square)
square_side = 512
buckets = 8
widest_aspect: float = math.atan2(1, 2) # 1/2 = 0.5 aspect ratio
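
The preview cuts off here. A minimal sketch of one way the bucket computation could continue is below; the interpolation over angles and the rounding to multiples of 64 are assumptions, not necessarily what arb.py does.

```python
import math
import numpy as np
from numpy.typing import NDArray

square_side = 512
buckets = 8
target_area = square_side ** 2

# sweep from the widest aspect (h/w = 0.5) to its reciprocal (h/w = 2.0),
# interpolating over angles so wide and tall buckets are symmetric
widest_angle: float = math.atan2(1, 2)
tallest_angle: float = math.atan2(2, 1)
angles: NDArray = np.linspace(widest_angle, tallest_angle, buckets)
aspects: NDArray = np.tan(angles)  # aspect = height/width

# solve w*h = target_area with h = aspect*w, then snap to multiples of 64
widths = np.sqrt(target_area / aspects)
heights = widths * aspects
widths = (np.round(widths / 64) * 64).astype(int)
heights = (np.round(heights / 64) * 64).astype(int)

for w, h in zip(widths, heights):
    print(f'{w}x{h} (aspect {h/w:.2f}, area {w*h})')
```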
@Birch-san
Birch-san / flash_attn_processor.py
Created July 21, 2023 17:41
diffusers flash_attn AttnProcessors for qkvpacked self-attn and regular cross-attn
import torch
from typing import Optional
from flash_attn import flash_attn_func, flash_attn_qkvpacked_func
from diffusers.models.attention import Attention
class FlashAttnProcessor:
r"""
Processor for implementing memory efficient attention using flash_attn.
"""
@Birch-san
Birch-san / flash_attn_processor.py
Last active December 19, 2023 22:07
FlashAttnProcessor
import torch
from typing import Optional
from flash_attn import flash_attn_func
from diffusers.models.attention import Attention
class FlashAttnProcessor:
r"""
Processor for implementing memory efficient attention using flash_attn.
"""
@Birch-san
Birch-san / bnb-correctness-test.md
Last active July 10, 2023 17:43
Correctness-testing bitsandbytes `0.40.0`

correctness-testing 0.40.0

Here we've ramped up bnb_4bit_compute_dtype to float32, in the hopes of making the model stay on-topic, since we were concerned by the responses measured with bnb_4bit_compute_dtype=bfloat16.
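
As a minimal sketch of the quantization config under test (assuming the model is loaded with transformers + bitsandbytes; the exact flags the gist used may differ, and the checkpoint name is a hypothetical choice):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    # the setting under test: float32 compute instead of bfloat16
    bnb_4bit_compute_dtype=torch.float32,
)

model = AutoModelForCausalLM.from_pretrained(
    'huggyllama/llama-7b',  # hypothetical llama 7b checkpoint
    quantization_config=quant_config,
    device_map='auto',
)
```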

llama 7b

`I was under the effect of a counterspell, so none of the superpower-wielding monsters could see me anyway. My eyes had begun to change as a result of my battle with Melvin. The transformation was complete. I was in the true look of my chosen form. As you can see, a true-blue beauty. There was only one of me, though, so I would have to make sure that this was the end. I went to catch the culprit. He was in the same clothes he was wearing when he committed the first murder. I did not recognize the man from that time, nor did he from me, but his face was twisted with an evil grin. He had the same shaved head. However, his hair seemed to change color. It was dark brown when I met him, but it turned to