Skip to content

Instantly share code, notes, and snippets.

@denguir
denguir / cuda_install.md
Last active April 20, 2025 17:34
Installation procedure for CUDA / cuDNN / TensorRT

How to install CUDA / cuDNN / TensorRT on Ubuntu

Install NVIDIA drivers

Update & upgrade

sudo apt update && sudo apt upgrade

Remove previous NVIDIA installation

@mingfeima
mingfeima / pytorch_performance_profiling.md
Last active April 11, 2025 15:38
How to do performance profiling on PyTorch

(Internal Tranining Material)

Usually the first step in performance optimization is to do profiling, e.g. to identify performance hotspots of a workload. This gist tells basic knowledge of performance profiling on PyTorch, you will get:

  • How to find the bottleneck operator?
  • How to trace source file of a particular operator?
  • How do I indentify threading issues? (oversubscription)
  • How do I tell a specific operator is running efficiently or not?

This tutorial takes one of my recent projects - pssp-transformer as an example to guide you through path of PyTorch CPU peformance optimization. Focus will be on Part 1 & Part 2.