chenyaofo / xui.json
Created September 24, 2024 05:24
X-UI
{
  "api": {
    "services": [
      "HandlerService",
      "LoggerService",
      "StatsService"
    ],
    "tag": "api"
  },
  "inbounds": [
chenyaofo / test.py
Created September 8, 2024 07:26
Differences between matrix and tensor with nn.Linear
import torch
b = 2
s = 4
h = 4
d = 4
device = "cuda"
dtype = torch.bfloat16
hidden_states = torch.rand((b, s, h * d), device=device, dtype=dtype)
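The preview cuts off before the actual comparison. As a minimal sketch of the point the title suggests, nn.Linear contracts only the last dimension, so the 3-D tensor (b, s, h*d) and the flattened 2-D matrix (b*s, h*d) should give the same result; the layer shape and the allclose check below are assumptions, not the gist's code, and they reuse the variables defined above.

linear = torch.nn.Linear(h * d, h * d, device=device, dtype=dtype)

# nn.Linear applies y = x @ W^T + b over the last dimension only, so the
# (b, s, h*d) "tensor" input and the reshaped (b*s, h*d) "matrix" input
# should produce the same values up to floating-point noise.
out_tensor = linear(hidden_states)                        # shape (b, s, h*d)
out_matrix = linear(hidden_states.reshape(b * s, h * d))  # shape (b*s, h*d)

print(torch.allclose(out_tensor.reshape(b * s, h * d), out_matrix))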
chenyaofo / deepspeed-benchmark.md
Last active May 3, 2024 12:53
Throughput benchmark of DeepSpeed-based LLM training code.

We train an LLM with this code and report the training speed under different settings (see the table below). The machine has 8x A800 GPUs, 1 TB of CPU memory, and 2x Intel 8358 CPUs. For software, we use CUDA 12.1, PyTorch 2.2.0, and DeepSpeed 0.14.2.

Table. Benchmark of LLaMA-7B models using DeepSpeed-based training code. The sequence length is 4096.

| Zero Stage | Ckpt.[^1] | Optim. Off.[^2] | Param. Off.[^3] | Zero++[^4] | BS[^5] | CPU Mem.[^6] | GPU Mem.[^7] | Th.put |
|---|---|---|---|---|---|---|---|---|
| 2 | × | × | × | × | 1/64 | 320.1 | 19.4/44.8 | 5.33 |
| 2 |   | × | × | × | 1/64 | 320.0 | 19.4/23.5 | 4.19 |
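The footnotes and remaining rows are not shown in the preview. As a rough, hypothetical illustration of what the columns toggle, a DeepSpeed config for a ZeRO stage 2 row could look like the sketch below; the batch-size split (micro-batch 1, global batch 64 via gradient accumulation across 8 GPUs) and all values are assumptions, not the gist's actual config.

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # "BS" column: 1 per GPU ...
    "gradient_accumulation_steps": 8,     # ... x 8 GPUs x 8 steps = 64 global
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                       # "Zero Stage" column
        # "Optim. Off." column: uncomment to push optimizer states to CPU memory.
        # "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # "Param. Off." (offload_param) additionally requires ZeRO stage 3,
        # and the "Zero++" column enables the ZeRO++ variants.
    },
}
# "Ckpt." refers to activation (gradient) checkpointing, which is switched on
# in the model/training code rather than in this config dict.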
chenyaofo / download.py
Created December 13, 2023 08:28
Main script from kubeedge/sedna-storage-initializer:v0.3.0
#!/usr/bin/env python3
# Copyright 2021 The KubeEdge Authors.
# Copyright 2020 kubeflow.org.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
chenyaofo / serialize_transforms.py
Created September 7, 2023 11:11
Serialize transforms into files.
import torch.package as package
import torch
import torchvision.transforms as T
def get_train_transforms(crop_size, mean, std, is_training):
    pipelines = []
    if is_training:
        pipelines.append(T.RandomResizedCrop(crop_size))
        pipelines.append(T.RandomHorizontalFlip())
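The preview stops inside get_train_transforms. A minimal sketch of the serialization step that the title and the torch.package import point to might look like the following; the archive, package, and resource names are made up for illustration.

train_transforms = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

# Export the transform object into a self-contained archive. extern() tells
# torch.package to resolve torchvision from the loading environment instead of
# bundling its source; intern() could be used to embed the source instead.
with package.PackageExporter("transforms.pt") as exporter:
    exporter.extern("torchvision.**")
    exporter.save_pickle("transforms", "train.pkl", train_transforms)

# Load it back later without calling get_train_transforms again.
restored = package.PackageImporter("transforms.pt").load_pickle("transforms", "train.pkl")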
chenyaofo / config.yaml
Created August 6, 2023 04:05
Cloudflare-Clash-Tunnel
port: 7890
socks-port: 7891
allow-lan: true
mode: Global
log-level: info
external-controller: :9090
profile:
  store-selected: true
  store-fake-ip: true
chenyaofo / llama-pipeline.py
Created June 19, 2023 01:28
LLaMA Pipeline Parallelism
import torch
import torch.nn.functional as F
from transformers.models.llama.modeling_llama import LlamaDecoderLayer, LlamaRMSNorm, LlamaConfig, LlamaForCausalLM
import deepspeed
from deepspeed.pipe import PipelineModule, LayerSpec
class EmbeddingPipe(torch.nn.Embedding):
    def forward(self, args):
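The preview ends inside EmbeddingPipe.forward. A hedged sketch of the pattern the imports suggest, thin *Pipe wrappers that adapt the Hugging Face LLaMA layers to DeepSpeed's tuple-passing pipeline engine and are assembled from LayerSpecs into a PipelineModule, is given below; the wrapper details and stage count are assumptions, and the decoder-layer wrappers are omitted.

class EmbeddingPipe(torch.nn.Embedding):
    def forward(self, args):
        # DeepSpeed's pipeline engine passes a tuple of tensors between stages:
        # embed the token ids and carry the attention mask through unchanged.
        input_ids, attention_mask = args
        return super().forward(input_ids), attention_mask


class RMSNormPipe(LlamaRMSNorm):
    def forward(self, args):
        hidden_states, _attention_mask = args
        return super().forward(hidden_states)


def build_pipeline_module(config: LlamaConfig, num_stages: int = 4) -> PipelineModule:
    # The decoder layers need similar wrappers (omitted here) that unpack the
    # tuple into LlamaDecoderLayer's keyword arguments. Building a
    # PipelineModule requires deepspeed.init_distributed() to have been called.
    layers = [
        LayerSpec(EmbeddingPipe, config.vocab_size, config.hidden_size),
        # ... one LayerSpec per wrapped LlamaDecoderLayer ...
        LayerSpec(RMSNormPipe, config.hidden_size, eps=config.rms_norm_eps),
    ]
    return PipelineModule(layers=layers, num_stages=num_stages)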
chenyaofo / engine.py
Created June 16, 2023 12:00
Deep Learning Engine
import pathlib
import loguru
import dataclasses
import deepspeed
import torch4x
from deepspeed import comm as dist
import pprint
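Only the imports are shown. Purely as a guess at the shape of such an engine (torch4x is the author's own helper library and is not used below), a minimal dataclass-plus-DeepSpeed wrapper might look like this; every name here is hypothetical, not the gist's API.

@dataclasses.dataclass
class EngineConfig:
    output_dir: str = "outputs"
    ds_config: dict = dataclasses.field(default_factory=dict)


class Engine:
    def __init__(self, model, config: EngineConfig):
        self.config = config
        pathlib.Path(config.output_dir).mkdir(parents=True, exist_ok=True)
        loguru.logger.info("engine config:\n{}", pprint.pformat(dataclasses.asdict(config)))
        # deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
        self.module, self.optimizer, _, _ = deepspeed.initialize(
            model=model,
            model_parameters=model.parameters(),
            config=config.ds_config,
        )

    def train_step(self, batch):
        # Assumes a Hugging Face-style model that returns an object with .loss.
        loss = self.module(**batch).loss
        self.module.backward(loss)
        self.module.step()
        if dist.get_rank() == 0:
            loguru.logger.info("loss = {}", loss.item())
        return loss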
chenyaofo / asyncio_read.py
Created May 31, 2023 09:37
Async reading multiple files.
import asyncio
import aiofiles
tar_filenames = [f"/home/chenyaofo/datasets/imagenet-wds/train/{i:06d}.tar" for i in range(256)]
# tar_filenames = [f"/gpfs01/home/chenyaofo/imagenet-wds/train/{i:06d}.tar" for i in range(256)]
count = 0
def async_reading():
    print("asyncio reading based on naive asyncio")
chenyaofo / Dockerfile
Last active April 12, 2023 12:04
PyTorch 2.0 Docker
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8 \
    PATH=/opt/conda/bin:$PATH \
    PYTHON_VERSION=3.10
RUN APT_INSTALL="apt-get install -y --no-install-recommends --no-install-suggests" && \
    GIT_CLONE="git clone --depth 10" && \
    rm -rf /etc/apt/sources.list.d/cuda.list \