Skip to content

Instantly share code, notes, and snippets.

View tallesairan's full-sized avatar
🏠
Working from home

Talles Airan tallesairan

🏠
Working from home
View GitHub Profile
@NaxAlpha
NaxAlpha / gpta.py
Created April 18, 2023 09:42
a custom gpt-like model that is tiny but can also scale context very long
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.backends.cuda as cuda
class NewGELU(nn.Module):
def forward(self, x):
@NaxAlpha
NaxAlpha / pythia_1b4_8k.py
Last active March 16, 2025 12:22
Fine-tune Pythia 1.4b model for 8k context window, this script requires at least 40 GB of memory, 15-20 hours of fine-tune is sufficient.
import copy
import torch
import torch.nn.functional as F
import torch.backends.cuda as cuda
from torch.utils.data import DataLoader, IterableDataset
import wandb
from tqdm import tqdm
import bitsandbytes as bnb