Max Idahl maxidl

  • Hanover, Germany
import argparse
import copy
import torch
import datasets as hfds
import transformers
from tqdm.auto import tqdm
import wandb
@maxidl
maxidl / openai_pyarrow_schemas.py
Created June 16, 2025 16:10
PyArrow Schemas for OpenAI Completion and ChatCompletion
"""
OpenAI Python SDK Compatible Schemas.
Compatible with v1.86.0. Subject to change.
See https://github.com/openai/openai-python for the latest version.
"""
import pyarrow as pa
COMPLETION_SCHEMA = pa.schema([
    pa.field('id', pa.string()),
@maxidl
maxidl / fp8_cast_bf16.py
Created July 14, 2025 12:48
A version of https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py that uses CPU memory instead of GPU memory to load and save dequantized weights. Only the dequantization step itself runs on the GPU, with a much smaller memory footprint than the original script. Runtime is longer, but this enables conversion of fp…
import os
import json
from argparse import ArgumentParser
from glob import glob
from tqdm import tqdm
import torch
from safetensors.torch import load_file, save_file
from kernel import weight_dequant
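The dequantization step mentioned in the description is, at its core, a block-wise rescaling. The following is a minimal pure-Python reference of that arithmetic, not the gist's code: it assumes each square tile of the weight matrix shares one scale factor and that the dequantized value is the product of the two. The function name `dequant_blockwise` and the tiny block size are hypothetical, chosen only to keep the example readable.

```python
def dequant_blockwise(weight, scale, block_size):
    """Multiply each (block_size x block_size) tile of `weight` by its scale.

    `weight` is a list of rows; `scale` holds one factor per tile,
    indexed as scale[tile_row][tile_col]. (Assumed layout.)
    """
    out = []
    for i, row in enumerate(weight):
        out_row = []
        for j, value in enumerate(row):
            # Look up the scale of the tile that element (i, j) falls into.
            s = scale[i // block_size][j // block_size]
            out_row.append(value * s)
        out.append(out_row)
    return out


# A 2x4 matrix split into two 2x2 tiles, each with its own scale.
weight = [[1.0, 2.0, 3.0, 4.0],
          [5.0, 6.0, 7.0, 8.0]]
scale = [[0.5, 2.0]]
dequantized = dequant_blockwise(weight, scale, block_size=2)
print(dequantized)
```

Because each tile depends only on its own scale factor, tensors can be processed one at a time, which is what makes it possible to keep only the tensor currently being converted on the GPU.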
@maxidl
maxidl / pytorch-nccl-test.sh
Last active August 5, 2025 07:55
SLURM PyTorch NCCL Multi-Node Test Script: A SLURM batch script that tests PyTorch's NCCL functionality across multiple GPU nodes. The script sets up a distributed PyTorch environment using torchrun and runs a comprehensive test that verifies NCCL initialization, inter-process communication barriers, and proper cleanup. Includes diagnostic outpu…
#!/bin/bash
#SBATCH --job-name=pytorch-nccl-test
#SBATCH --partition=
#SBATCH --account=
#SBATCH --qos=
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --gres=gpu:H100:4
#SBATCH --time 0:05:00
@maxidl
maxidl / shard_parquet_dataset.py
Created August 28, 2025 14:57
Re-shard a Parquet dataset
from pathlib import Path
import pyarrow.parquet as pq
import pyarrow.dataset as pads
from tqdm.auto import tqdm
d_in = Path("path-to-input-dataset")
d_out = Path("path-to-write-sharded-dataset-to")
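The preview above stops before the re-sharding logic, so the following is only a sketch of the general idea under stated assumptions, not the gist's implementation: rows arrive in batches of arbitrary size and are regrouped into shards of a fixed row count. It is written against plain Python lists so that it runs without any files; the function name `reshard` and the parameter `rows_per_shard` are hypothetical, and reading from or writing to Parquet is deliberately left out.

```python
from typing import Iterable, Iterator, List


def reshard(batches: Iterable[List[dict]], rows_per_shard: int) -> Iterator[List[dict]]:
    """Regroup variable-size batches of rows into shards of a fixed row count.

    The final shard may be smaller if the total row count is not a
    multiple of `rows_per_shard`. Row order is preserved.
    """
    if rows_per_shard <= 0:
        raise ValueError("rows_per_shard must be positive")
    buffer: List[dict] = []
    for batch in batches:
        buffer.extend(batch)
        # Emit as many full shards as the buffer currently holds.
        while len(buffer) >= rows_per_shard:
            yield buffer[:rows_per_shard]
            buffer = buffer[rows_per_shard:]
    if buffer:
        # Flush the remaining rows as a last, possibly smaller shard.
        yield buffer


# Eight rows arriving in batches of 3, 4 and 1 rows.
batches = [
    [{"id": i} for i in range(0, 3)],
    [{"id": i} for i in range(3, 7)],
    [{"id": 7}],
]
shards = list(reshard(batches, rows_per_shard=3))
shard_sizes = [len(shard) for shard in shards]
print(shard_sizes)
```

In a real pipeline each yielded shard would be written to its own file under `d_out`; buffering only one shard's worth of rows keeps memory use bounded regardless of the input dataset's size.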