Mehdi Cherti (@mehdidc)
@aluo-x
aluo-x / fairscale_demo.py
Created September 6, 2021 22:20
Basic demo of fairscale FSDP & OSS state_dict saving and loading
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# fairscale sharding wrappers: ZeRO-style optimizer state sharding (OSS),
# gradient/optimizer-state sharding (ShardedDDP), and full parameter sharding (FSDP)
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP
from fairscale.optim.oss import OSS
@rwightman
rwightman / bench_by_infer.csv
Created March 6, 2021 06:22
PyTorch Bench (1.8, 1.7.1, NGC 21.02, NGC 20.12)
| model | gpu | env | cl | infer samples/sec | infer step time (s) | infer batch | train samples/sec | train step time (s) | train batch | params (M) | img size (px) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| efficientnet_b0 | rtx3090 | ngc2102 | True | 7179.22 | 0.139 | 512 | 1628.51 | 0.609 | 256 | 5.29 | 224 |
| efficientnet_b0 | rtx3090 | ngc2012 | True | 6527.77 | 0.153 | 512 | 1504.58 | 0.654 | 256 | 5.29 | 224 |
| efficientnet_b0 | v100_32 | ngc2102 | True | 6496.56 | 0.154 | 512 | 1556.66 | 0.638 | 512 | 5.29 | 224 |
| efficientnet_b0 | rtx3090 | 1.7.1cu11.0 | True | 6020.3 | 0.166 | 512 | 1266.03 | 0.785 | 512 | 5.29 | 224 |
| efficientnet_b0 | rtx3090 | 1.8cu11.1 | True | 5979.7 | 0.167 | 512 | 1286.76 | 0.775 | 512 | 5.29 | 224 |
| efficientnet_b0 | v100_32 | ngc2012 | True | 5666.05 | 0.176 | 512 | 1459.05 | 0.676 | 512 | 5.29 | 224 |
| efficientnet_b0 | v100_32 | 1.8cu11.1 | True | 5529.09 | 0.181 | 512 | 1444.02 | 0.688 | 512 | 5.29 | 224 |
| efficientnet_b0 | v100_32 | 1.7.1cu11.0 | True | 5526.07 | 0.181 | 512 | 1425.38 | 0.691 | 512 | 5.29 | 224 |
| efficientnet_b0 | titanrtx | ngc2102 | True | 5118.38 | 0.195 | 512 | 1156.83 | 0.862 | 512 | 5.29 | 224 |

Hello,

When you look at a repo, be sure to run `git log --all` to see the latest commits across all branches. You would see that my work is on the `dec6` branch, not the main branch.
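The point above can be demonstrated in a throwaway repo: commits made on a side branch (a hypothetical `dec6` here) do not show up in a plain `git log` from the default branch, but `git log --all` surfaces them.

```shell
set -e
# Create a throwaway repo with work on two branches
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "main work"
git checkout -q -b dec6
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "dec6 work"
git checkout -q -          # back to the default branch

git log --oneline          # shows only "main work"
git log --all --oneline    # shows both commits, including the one on dec6
```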

In general, it's more reliable to DM me on Twitter than to email me; I'll usually respond within a day, whereas emails are hit-or-miss.

Here is the critical difference:

@Eoin-ONeill-Yokai
Eoin-ONeill-Yokai / ffix-proton-install-and-mods-readme.md
Last active November 6, 2023 03:15
FINAL FANTASY IX on Linux / Steam Deck (Moguri Mod / Alternate Fantasy) Proton Installation Guide and Bug List

Document Revision 1.2 [07/15/22]

Author Notes: Thanks to everyone who has been testing or using this installation process. I've refined the instructions to make the installation as simple and cross-platform as humanly possible. I've also updated the Steam Deck instructions now that I've had mine for a while and thoroughly tested the installation process.

[Screenshot: ffix_manjaro-kde]

Basic Game Installation

Final Fantasy IX should be installed like any standard Steam game through the Steam client. Regarding compatibility layers: it should work with a stable release of Proton 7 (7.0.X recommended) through the Steam client. This also includes controller support if you are using Steam's native controller configurations. If you have any problems with a given Proton release, I would also recommend trying the latest GloriousEggroll Proton builds to see if they resolve the issue.

#!/usr/bin/env bash
###
# NB: You probably don't want this gist any more.
# Instead, use this version from `fastsetup`:
# https://github.com/fastai/fastsetup/blob/master/setup-conda.sh
###
set -e
cd
@yk
yk / .vimrc
Created July 6, 2020 14:26
vimrc july 2020
set nocompatible " be iMproved, required
let g:python3_host_prog = '/usr/local/opt/[email protected]/bin/python3.8'
if empty(glob('~/.vim/autoload/plug.vim'))
silent !curl -fLo ~/.vim/autoload/plug.vim --create-dirs
\ https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim
autocmd VimEnter * PlugInstall --sync | source $MYVIMRC
endif
call plug#begin('~/.vim/plugged')
@TengdaHan
TengdaHan / ddp_notes.md
Last active April 21, 2025 08:06
Multi-node-training on slurm with PyTorch


What's this?

  • A simple note on how to start multi-node training on the slurm scheduler with PyTorch.
  • Especially useful when the scheduler is so busy that you cannot get multiple GPUs allocated on one node, or when you need more than 4 GPUs for a single job.
  • Requirement: you have to use PyTorch DistributedDataParallel (DDP) for this purpose.
  • Warning: you might need to refactor your own code.
  • Warning: you might be quietly resented by your colleagues for using too many GPUs.
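Under slurm, each task launched by `srun` can derive its DDP identity from the environment. A minimal sketch (the helper name `slurm_ddp_env` is hypothetical; the `SLURM_*` variables are the standard ones slurm sets per task):

```python
import os

def slurm_ddp_env(default_port="29500"):
    """Map SLURM environment variables to the values torch.distributed needs.
    Hypothetical helper for illustration; in practice the master address is
    usually resolved from the first node in `scontrol show hostnames`."""
    rank = int(os.environ["SLURM_PROCID"])          # global rank of this task
    world_size = int(os.environ["SLURM_NTASKS"])    # total number of tasks
    local_rank = int(os.environ.get("SLURM_LOCALID", "0"))  # GPU index on this node
    master_addr = os.environ.get("MASTER_ADDR", "localhost")
    master_port = os.environ.get("MASTER_PORT", default_port)
    return {
        "rank": rank,
        "world_size": world_size,
        "local_rank": local_rank,
        "init_method": f"tcp://{master_addr}:{master_port}",
    }

# Each task would then call something like:
# dist.init_process_group("nccl", init_method=env["init_method"],
#                         rank=env["rank"], world_size=env["world_size"])
```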
@mingfeima
mingfeima / pytorch_performance_profiling.md
Last active April 11, 2025 15:38
How to do performance profiling on PyTorch

(Internal Training Material)

Usually the first step in performance optimization is profiling, e.g. identifying the performance hotspots of a workload. This gist covers the basics of performance profiling on PyTorch; you will learn:

  • How do I find the bottleneck operator?
  • How do I trace the source file of a particular operator?
  • How do I identify threading issues (oversubscription)?
  • How do I tell whether a specific operator is running efficiently?

This tutorial takes one of my recent projects - pssp-transformer - as an example to guide you through the path of PyTorch CPU performance optimization. The focus will be on Part 1 & Part 2.
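A minimal sketch of the first question (finding the bottleneck operator), assuming a recent PyTorch with `torch.profiler` available; the toy model here is only for illustration:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy workload: a tiny MLP forward pass on CPU
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
x = torch.randn(32, 64)

# Profile a few forward passes and aggregate per-operator statistics
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    for _ in range(10):
        model(x)

# Operators sorted by total CPU time: the top rows are the bottleneck candidates
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(report)
```

The same `profile` context also accepts `ProfilerActivity.CUDA` for GPU workloads, and `record_shapes=True` lets you group times by input shape.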

@alexjc
alexjc / reading-list.rst
Last active April 26, 2025 07:01
Reading List on Texture Synthesis

Ubuntu 22.04 for Deep Learning

In the name of God

This gist contains the steps to set up Ubuntu 22.04 for deep learning.


Install Ubuntu 22.04