Zach Mueller muellerzr

import torch.nn as nn
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
    set_seed,
)
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
enable_cpu_affinity: false
fsdp_config:
  fsdp_activation_checkpointing: false
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
muellerzr / test.py
Created July 3, 2024 20:38
Model loading speed test
import time
from transformers import AutoTokenizer, LlamaForCausalLM
from accelerate.utils import set_seed
set_seed(42)
file_size = 132 # 70B
# file_size = 30 # 8B
start_time = time.time()
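The preview above is cut off right after the timer starts. As a minimal sketch of how such a loading-speed measurement could be finished, the snippet below times an arbitrary loader callable and derives a throughput figure from the file size. It assumes `file_size` is in gigabytes, which the gist does not state. `time_load` and `fake_load` are hypothetical names used for illustration only; they are not part of the gist or of transformers.

```python
import time
from typing import Any, Callable, Tuple


def time_load(load_fn: Callable[[], Any], file_size_gb: float) -> Tuple[Any, float, float]:
    """Run load_fn once and return (result, elapsed seconds, GB per second)."""
    start_time = time.perf_counter()
    result = load_fn()
    elapsed = time.perf_counter() - start_time
    # Guard against a zero-length interval on very coarse clocks.
    throughput = file_size_gb / elapsed if elapsed > 0 else float("inf")
    return result, elapsed, throughput


def fake_load() -> str:
    # Hypothetical stand-in for a real loader such as
    # LlamaForCausalLM.from_pretrained(...); it only sleeps briefly.
    time.sleep(0.01)
    return "model"


if __name__ == "__main__":
    # 132 mirrors the gist's `file_size = 132  # 70B`; the unit (GB) is an assumption.
    _, seconds, gb_per_s = time_load(fake_load, file_size_gb=132)
    print(f"Loaded in {seconds:.2f}s ({gb_per_s:.1f} GB/s)")
```

Passing the loader as a callable keeps the timing logic separate from the model code, so the same helper can wrap different checkpoints or loading options.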
muellerzr / deploy.yml
Created May 30, 2024 16:40
Password protection on static gh site
name: Deploy to GitHub Pages
on:
  push:
    branches: [ "main", "master" ]
  workflow_dispatch:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
muellerzr / base_drivers.txt
Created April 15, 2024 17:59
P2P tests with 4090s
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, NVIDIA GeForce RTX 4090, pciBusID: 1, pciDeviceID: 0, pciDomainID:0
Device: 1, NVIDIA GeForce RTX 4090, pciBusID: 2, pciDeviceID: 0, pciDomainID:0
Device=0 CANNOT Access Peer Device=1
Device=1 CANNOT Access Peer Device=0
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
import builtins
import fcntl
import os
import socket
import torch
import torch.distributed as dist
print("STARTED")
def print(*args, **kwargs):
muellerzr / test.py
Created September 15, 2023 18:29
Model memory stuff
import torch
from transformers import AutoModel, AutoConfig, AutoModelForSequenceClassification
def get_model_memory(model: torch.nn.Module):
    """
    Returns the memory usage of the given model
    """
    total_memory = 0
    for param in model.parameters():
        total_memory += param.numel() * param.element_size()
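The preview stops inside the loop. A minimal sketch of how the byte count could be completed and reported follows. It assumes the function returns the total in bytes and also counts buffers; the truncated gist confirms neither. To stay runnable without PyTorch it is duck-typed: `FakeTensor` and `FakeModel` are hypothetical stand-ins that expose the same `numel()`, `element_size()`, `parameters()`, and `buffers()` methods found on `torch.Tensor` and `torch.nn.Module`.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor (illustration only)."""
    n_elements: int
    bytes_per_element: int

    def numel(self) -> int:
        return self.n_elements

    def element_size(self) -> int:
        return self.bytes_per_element


@dataclass
class FakeModel:
    """Hypothetical stand-in for torch.nn.Module (illustration only)."""
    params: List[FakeTensor] = field(default_factory=list)
    bufs: List[FakeTensor] = field(default_factory=list)

    def parameters(self) -> Iterable[FakeTensor]:
        return iter(self.params)

    def buffers(self) -> Iterable[FakeTensor]:
        return iter(self.bufs)


def get_model_memory_bytes(model) -> int:
    """Sum element count times element size over parameters and buffers."""
    total_memory = 0
    for param in model.parameters():
        total_memory += param.numel() * param.element_size()
    # Assumption: buffers (for example, normalization statistics) are counted too.
    for buffer in model.buffers():
        total_memory += buffer.numel() * buffer.element_size()
    return total_memory


model = FakeModel(
    params=[FakeTensor(1000, 4), FakeTensor(10, 4)],  # two float32-sized tensors
    bufs=[FakeTensor(10, 2)],                         # one float16-sized buffer
)
total = get_model_memory_bytes(model)
print(f"{total} bytes ({total / 1024**2:.4f} MiB)")
```

Because the helper only relies on those four methods, a real `torch.nn.Module` can be passed in unchanged.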
muellerzr / hide_sidebar.js
Created June 8, 2023 22:42
JavaScript that hides semantically versioned sidebars in Quarto. Designed to be used with nbquarto and referenced from it
/**
 * Enables semantic versioning through careful sidebar menu item selection.
 * Hide sidebar menu items that are not related to the current page that is open.
 * Assumes a directory structure of:
 * - version_1
 *   - page_1
 *   - page_2
 * - version_2
 *   - page_2
 *   - page_3