(Internal Training Material)
Profiling is usually the first step in performance optimization: it identifies the hotspots of a workload. This gist covers the basics of performance profiling in PyTorch and answers the following questions:
- How do I find the bottleneck operator?
- How do I trace the source file of a particular operator?
- How do I identify threading issues (oversubscription)?
- How do I tell whether a specific operator is running efficiently?
This tutorial uses one of my recent projects, pssp-transformer, as an example to walk you through PyTorch CPU performance optimization. The focus is on Part 1 and Part 2.