- A simple note on how to start multi-node training with PyTorch on a SLURM scheduler.
- Especially useful when the scheduler is so busy that you cannot get multiple GPUs allocated on one node, or when you need more than 4 GPUs for a single job.
- Requirement: your training code must use PyTorch DistributedDataParallel (DDP); a minimal setup sketch follows this list.
- Warning: you might need to refactor your own code.
- Warning: your colleagues might secretly condemn you for using too many GPUs.
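A minimal sketch of the DDP initialization, assuming the job is launched with srun (so that SLURM_PROCID, SLURM_NTASKS, and SLURM_LOCALID are set for each task) and that MASTER_ADDR and MASTER_PORT were exported in the sbatch script; the toy Linear model is a placeholder for your own:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Map SLURM's per-task environment variables onto the values
# torch.distributed expects; srun sets these for every process.
rank = int(os.environ["SLURM_PROCID"])         # global rank across all nodes
world_size = int(os.environ["SLURM_NTASKS"])   # total number of processes
local_rank = int(os.environ["SLURM_LOCALID"])  # rank within this node

# The default env:// init method reads MASTER_ADDR and MASTER_PORT,
# which we assume point at the first node of the allocation.
dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
torch.cuda.set_device(local_rank)

# Wrap the model in DDP; gradients are all-reduced across processes
# during backward, so the training loop itself stays single-GPU-style.
model = torch.nn.Linear(10, 10).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])
```

Every one of the (nodes × GPUs-per-node) processes runs this same script; only the rank values differ between them.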
Create file /etc/systemd/system/docker-compose@.service
systemd calls binaries using an absolute path; in my case they are prefixed by /usr/local/bin, but you should use the paths specific to your environment.
```ini
[Unit]
Description=%i service with docker compose
PartOf=docker.service
After=docker.service
```
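Only the [Unit] section appears above. A sketch of the remaining sections, assuming each instance's docker-compose.yml lives in a per-instance directory such as /etc/docker/compose/%i (the directory, flags, and binary path are assumptions to adapt):

```ini
[Service]
# Start the containers once and keep the unit "active" while they run;
# %i expands to the instance name after the @ in the unit name.
Type=oneshot
RemainAfterExit=true
WorkingDirectory=/etc/docker/compose/%i
ExecStart=/usr/local/bin/docker-compose up -d
ExecStop=/usr/local/bin/docker-compose down

[Install]
WantedBy=multi-user.target
```

After a `systemctl daemon-reload`, an instance named myapp can then be managed with `systemctl start docker-compose@myapp`.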
A personal diary of DataFrame munging over the years.
Convert a Series datatype to numeric (this will raise an error if the column contains non-numeric values)
(h/t @makmanalp)
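A minimal sketch, assuming the conversion referred to is pd.to_numeric, whose default errors="raise" behavior matches the note:

```python
import pandas as pd

s = pd.Series(["1", "2", "3.5"])
# errors="raise" is the default: any value that cannot be parsed as a
# number raises a ValueError instead of being silently coerced to NaN.
s = pd.to_numeric(s)
print(s.dtype)  # float64
```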