tmux new [-s name] [cmd](:new) - new session
tmux ls(:ls) - list sessions
tmux switch [-t name](:switch) - switch to an existing session
# This isn't supposed to run as a bash script; I named it with ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (the command-line executable used to create profiles) commands
#
# In your script, write
#   torch.cuda.nvtx.range_push("region name")
#   ...
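Every `range_push` needs a matching `range_pop`, or the ranges nest incorrectly on the nsys timeline. A small sketch of one way to keep them paired (the `nvtx_range` helper name is mine, not from the notes above; `torch.cuda.nvtx.range_push`/`range_pop` are the real PyTorch calls):

```python
import contextlib

@contextlib.contextmanager
def nvtx_range(name):
    """Wrap a code region in a named NVTX range for the nsys timeline.

    torch is imported lazily so merely defining this helper does not
    require torch; using it requires a CUDA build of PyTorch.
    """
    import torch
    torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        # Guaranteed to run even if the region raises, keeping ranges balanced.
        torch.cuda.nvtx.range_pop()
```

Usage inside a training loop would look like `with nvtx_range("forward"): out = model(x)`, which shows up as a labeled region in the nsys report.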
Latency Comparison Numbers
--------------------------
L1 cache reference/hit                          1.5 ns        4 cycles
Floating-point add/mult/FMA operation           1.5 ns        4 cycles
L2 cache reference/hit                            5 ns  12 ~ 17 cycles
Branch mispredict                                 6 ns  15 ~ 20 cycles
L3 cache hit (unshared cache line)               16 ns       42 cycles
L3 cache hit (shared line in another core)       25 ns       65 cycles
Mutex lock/unlock                                25 ns
L3 cache hit (modified in another core)          29 ns       75 cycles
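The nanosecond and cycle columns are linked by the CPU clock rate: cycles = latency_ns * clock_GHz. The table's 4 cycles / 1.5 ns implies a clock around 2.7 GHz; that figure is an inference from the table, not stated in it. A minimal sketch of the conversion:

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float = 2.7) -> float:
    """Convert a latency in nanoseconds to CPU clock cycles.

    The 2.7 GHz default is inferred from the table above
    (4 cycles / 1.5 ns); adjust it for your own CPU.
    """
    return latency_ns * clock_ghz

print(round(ns_to_cycles(1.5)))  # L1 hit: 4 cycles
print(round(ns_to_cycles(6)))    # branch mispredict: 16 cycles, within 15 ~ 20
```

The remaining rows can be cross-checked the same way; small mismatches (e.g. 16 ns computing to 43 rather than 42 cycles) just reflect rounding in the source numbers.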
#!/usr/bin/env python
"""
Demo of how to pass GPU memory managed by pycuda to mpi4py.

Notes
-----
This code can be used to perform peer-to-peer communication of data via
NVIDIA's GPUDirect technology if mpi4py has been built against a
CUDA-enabled MPI implementation.
from graphviz import Digraph
import re
import torch
import torch.nn.functional as F
from torch.autograd import Variable
import torchvision.models as models


def make_dot(var):