- RDMA Aware Networks Programming User Manual
- On the Impact of Cluster Configuration on RoCE Application Design
- RDMA over Commodity Ethernet at Scale
- Design Guidelines for High Performance RDMA Systems
- FaSST: Fast, Scalable and Simple Distributed Transactions with Two-sided (RDMA) Datagram RPCs
- RDMA [1]: A short history of remote DMA networking
- [Slide] RDMA Tutorial
- Understanding the concepts and mechanisms of RDMA
- [InfiniBand RDMA over PCI Express Networks]
 
  
    
    
  
  
    
#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>
#include <rdma/fi_cm.h>
#include <rdma/fi_errno.h>
#include <rdma/fi_rma.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
The CUDA 12.1.1 toolkit will offer to install Nvidia driver 530 for us. It comes from the New Feature branch, so it's likely to be newer than the default Nvidia driver you would have installed via apt-get (apt would prefer to give you 525, i.e. the Production branch).
If you're confident that you already have a new enough Nvidia driver for CUDA 12.1.1 and you'd like to keep it, feel free to skip this "uninstall driver" step.
But if you're not sure, or you know your driver is too old, let's uninstall it. CUDA will install a new driver for us later.
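To check which Nvidia driver you currently have:
nvidia-smi --query-gpu=driver_version --format=csv,noheader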
Single-process:
python main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/
Multi-process:
python -m torch.distributed.launch  --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/
  
    
    
  
  
    
try:
    import cPickle  # Python 2 only; on Python 3 the C-accelerated pickle is built in
except ImportError:
    cPickle = None
import pickle
import json
import random
from time import time
from hashlib import md5

test_runs = 1000

def float_list():
    # A list of random floats to serialize on each test run.
    return [random.random() for _ in range(1000)]
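The gist is truncated at this point. A minimal timing loop consistent with the imports above (pickle vs. json on the float list, with md5 only there to make sure the serialized output is actually consumed) could look roughly like this; the `bench` helper is an illustration, not the original code:

def bench(name, dumps, data):
    # Serialize `data` test_runs times and report elapsed wall-clock time.
    start = time()
    digest = None
    for _ in range(test_runs):
        blob = dumps(data)
        digest = md5(blob if isinstance(blob, bytes) else blob.encode()).hexdigest()
    print("%s: %.3f s (md5 %s)" % (name, time() - start, digest))

data = float_list()
bench("pickle", pickle.dumps, data)
bench("json", json.dumps, data)
if cPickle is not None:
    bench("cPickle", cPickle.dumps, data)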
  
    
    
  
  
    
# This isn't supposed to run as a bash script; I named it with ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (the command line executable used to create profiles) commands
#
# In your script, write
# torch.cuda.nvtx.range_push("region name")
# ...
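The note is cut off above. As a rough sketch of how those NVTX ranges are typically used from a PyTorch training step (the region names and the nsys flags below are illustrative, not taken from the original gist):

import torch

# Annotate regions of a training step so they show up as named ranges
# on the nsys timeline (requires profiling with `-t cuda,nvtx`).
def training_step(model, optimizer, loss_fn, inputs, targets):
    torch.cuda.nvtx.range_push("forward")
    outputs = model(inputs)
    loss = loss_fn(outputs, targets)
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("backward")
    loss.backward()
    torch.cuda.nvtx.range_pop()

    torch.cuda.nvtx.range_push("optimizer step")
    optimizer.step()
    optimizer.zero_grad()
    torch.cuda.nvtx.range_pop()
    return loss

# Then profile the script with something like:
#   nsys profile -t cuda,nvtx -o my_report python train.py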
  
    
    
  
  
    
import networkx as nx
from itertools import product
"""
Compared with Airflow, the strengths of this code lie in its simplicity, its lightweight nature, and how easily it integrates with existing Python code:

Simplicity: it provides a simple and straightforward way to model and work with DAGs without needing to set up and configure a comprehensive system like Airflow. For smaller teams or projects with less complexity, this can be an advantage.

Lightweight and easy to incorporate: it is a compact, single-file solution that can be dropped into an existing Python project without standing up an entire Airflow environment. When the primary focus is on creating task dependencies with parameter combinations, rather than scheduling and monitoring, it is easier to incorporate.

Focused on task generation: it emphasizes creating a Cartesian product of tasks from each node's parameters. It is geared towards tackling
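The snippet above is cut off, but the idea it describes (expanding each node's parameter grid into concrete tasks and wiring their dependencies with networkx) can be sketched roughly as follows; the node names and parameter keys are invented for illustration:

import networkx as nx
from itertools import product

# Each node carries a parameter grid; one concrete task = one combination.
dag = nx.DiGraph()
dag.add_node("train", params={"lr": [0.1, 0.01], "batch": [32, 64]})
dag.add_node("evaluate", params={"split": ["val", "test"]})
dag.add_edge("train", "evaluate")  # evaluate depends on train

def expand_tasks(graph):
    """Return {node: [task dicts]}, the Cartesian product of each node's params."""
    tasks = {}
    for node, data in graph.nodes(data=True):
        keys = sorted(data["params"])
        tasks[node] = [dict(zip(keys, combo))
                       for combo in product(*(data["params"][k] for k in keys))]
    return tasks

all_tasks = expand_tasks(dag)
for node in nx.topological_sort(dag):   # visit nodes in dependency order
    for task in all_tasks[node]:
        print(node, task)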
  
    
    
  
  
    
// Thank you to the folks at the C++ slack channel,
// along with @lewissbaker for the excellent literature
// (even though it took me a few days to be convinced
// it really was so).
#include <uv.h>
#include <iostream>
#include <experimental/coroutine>
  
    
    
  
  
    
"""
This model integrates the MoE concept within a Transformer architecture. Each token's
representation is processed by a subset of experts, determined by the gating mechanism.
This architecture allows for efficient and specialized handling of different aspects of the
data, aiming for the adaptability and efficiency noted in the Mixtral 8x7B model's design
philosophy. The model activates only a fraction of the available experts for each token,
significantly reducing the computational resources needed compared to activating all experts
for all tokens.
"""
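Only the docstring survives above. A compact sketch of the kind of layer it describes, a feed-forward block where a softmax gate routes each token to its top-k experts, might look like this (the layer sizes, top_k = 2, and the class name are assumptions for illustration, not the original implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Top-k gated mixture-of-experts feed-forward block (sketch)."""
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (batch, seq, d_model)
        scores = self.gate(x)                  # (batch, seq, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)       # renormalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest are skipped.
        for i, expert in enumerate(self.experts):
            mask = (top_idx == i)              # (batch, seq, top_k)
            if mask.any():
                token_mask = mask.any(dim=-1)                      # tokens routed to expert i
                weight = (top_w * mask).sum(dim=-1, keepdim=True)  # gate weight for expert i
                out[token_mask] += weight[token_mask] * expert(x[token_mask])
        return out

# Example: route a batch of 4 sequences of 16 tokens through the block.
if __name__ == "__main__":
    layer = MoEFeedForward()
    y = layer(torch.randn(4, 16, 512))
    print(y.shape)  # torch.Size([4, 16, 512])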