@Quentin-Anthony
Quentin-Anthony / transformer_mem.py
Last active June 20, 2025 13:09
Transformer Training/Inference Memory
import argparse
import math
# Helper function to pretty-print message sizes
def convert_params(params):
    if params == 0:
        return "0"
    size_name = ("", "K", "M", "B", "T", "P", "E", "Z", "Y")
    i = int(math.floor(math.log(params, 1000)))
    p = math.pow(1000, i)
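
The preview above is cut off before the helper returns anything. As a minimal, self-contained sketch of how a base-1000 pretty-printer of this kind could work, the hypothetical `format_count` below divides repeatedly by 1000 rather than taking a logarithm, so it does not depend on floating-point rounding at exact powers of 1000. The function name, the two-decimal rounding, and the output format are assumptions for illustration and are not taken from the original gist.

```python
def format_count(params: float) -> str:
    """Hypothetical helper: format a count with a base-1000 suffix (K, M, B, ...)."""
    size_name = ("", "K", "M", "B", "T", "P", "E", "Z", "Y")
    if params == 0:
        return "0"
    value = float(params)
    i = 0
    # Step up one suffix per factor of 1000, stopping at the last known suffix.
    while abs(value) >= 1000 and i < len(size_name) - 1:
        value /= 1000
        i += 1
    # Two decimal places is an assumed display precision.
    return f"{round(value, 2)}{size_name[i]}"


if __name__ == "__main__":
    for n in (0, 999, 1000, 1_500_000, 20_000_000_000):
        print(n, "->", format_count(n))
```

Capping the index at the last entry of `size_name` keeps very large inputs from indexing past the end of the tuple.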
Quentin-Anthony / test_gpu.sh
Last active October 16, 2022 17:14
EFA BW test for Stability cluster (adapted from Azure script)
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --job-name=gputest
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 8
#SBATCH --cpus-per-gpu=6
#SBATCH --gres=gpu:8
#SBATCH --nodelist gpu-st-p4d-24xlarge-42
#SBATCH --output=%x_%j.out
#SBATCH --open-mode=append
Quentin-Anthony / megatron-checks.md
Last active June 7, 2022 05:30
Megatron Spreadsheet
ZeRO Stage Data-Parallel MP PP MP+PP MoE MoE+MP
1
2 N/A N/A
3 N/A N/A N/A N/A