Skip to content

Instantly share code, notes, and snippets.

@mcarilli
mcarilli / nsight.sh
Last active January 7, 2026 08:00
Favorite nsight systems profiling commands for Pytorch scripts
# This isn't supposed to run as a bash script, i named it with ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (command line executable used to create profiles) commands
#
# In your script, write
# torch.cuda.nvtx.range_push("region name")
# ...
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@mingfeima
mingfeima / pytorch_performance_profiling.md
Last active April 11, 2025 15:38
How to do performance profiling on PyTorch

(Internal Tranining Material)

Usually the first step in performance optimization is to do profiling, e.g. to identify performance hotspots of a workload. This gist tells basic knowledge of performance profiling on PyTorch, you will get:

  • How to find the bottleneck operator?
  • How to trace source file of a particular operator?
  • How do I indentify threading issues? (oversubscription)
  • How do I tell a specific operator is running efficiently or not?

This tutorial takes one of my recent projects - pssp-transformer as an example to guide you through path of PyTorch CPU peformance optimization. Focus will be on Part 1 & Part 2.

@robbat2
robbat2 / CEPH-STATICSITES-HOWTO.md
Last active April 14, 2025 08:57
Ceph staticsites config RGW static website serving & SNI

Ceph StaticSites Configuration, with HAProxy & SNI

An instructional document by Robin H Johnson robin.johnson@dreamhost.com. I wrote much of the staticsites functionality of Ceph-RGW, during during late 2015 and early 2016, based on an early prototype by Yehuda Sadeh (yehudasa). It was written for usage at Dreamhost, but developed in the open for community improvement.

It is fully functional as of Jewel v10.2.3 plus PR11280 (ceph/ceph#11280). Prior to that, neither the non-CNAME nor CNAME-to-service modes will function correctly.

These configuration files represent how to quickly set up RGW+HAProxy for staticsite serving. I've tried to make them more readable, without leaving out too many details. You are strongly recommended to run a seperate RGW instance for staticsites, on a DIFFERENT outward-faciing IP than your normal instance (and in fact, certain functionality is not supported without it).

In place of using HAProxy, you could run the second rgw instance on port 80,

@wangruohui
wangruohui / Install NVIDIA Driver and CUDA.md
Last active January 24, 2026 19:55
Install NVIDIA Driver and CUDA on Ubuntu / CentOS / Fedora Linux OS