How to profile your PyTorch code

Inside profiler

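import torch
import torchvision.models as models

# Load a pretrained DenseNet-121 and build a dummy ImageNet-sized input.
model = models.densenet121(pretrained=True)
x = torch.randn((1, 3, 224, 224), requires_grad=True)

# Record per-operator timings for a single forward pass.
# use_cuda=True adds the CUDA time columns shown in the output.
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    model(x)
print(prof)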

The output looks something like this:

-----------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                                        CPU time        CUDA time            Calls        CPU total       CUDA total
-----------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------
conv2d                                    9976.544us       9972.736us                1       9976.544us       9972.736us
convolution                               9958.778us       9958.400us                1       9958.778us       9958.400us
_convolution                              9946.712us       9947.136us                1       9946.712us       9947.136us
contiguous                                   6.692us          6.976us                1          6.692us          6.976us
empty                                       11.927us         12.032us                1         11.927us         12.032us
mkldnn_convolution                        9880.452us       9889.792us                1       9880.452us       9889.792us
batch_norm                                1214.791us       1213.440us                1       1214.791us       1213.440us
native_batch_norm                         1190.496us       1193.056us                1       1190.496us       1193.056us
threshold_                                 158.258us        159.584us                1        158.258us        159.584us
max_pool2d_with_indices                  28837.682us      28836.834us                1      28837.682us      28836.834us
max_pool2d_with_indices_forward          28813.804us      28822.530us                1      28813.804us      28822.530us
batch_norm                                1780.373us       1778.690us                1       1780.373us       1778.690us
native_batch_norm                         1756.774us       1759.327us                1       1756.774us       1759.327us
threshold_                                  64.665us         66.368us                1         64.665us         66.368us
conv2d                                    6103.544us       6102.142us                1       6103.544us       6102.142us
convolution                               6089.946us       6089.600us                1       6089.946us       6089.600us
_convolution                              6076.506us       6076.416us                1       6076.506us       6076.416us
contiguous                                   7.306us          7.938us                1          7.306us          7.938us
empty                                        9.037us          8.194us                1          9.037us          8.194us
mkldnn_convolution                        6015.653us       6021.408us                1       6015.653us       6021.408us
batch_norm                                 700.129us        699.394us

You may find more details here
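
The raw listing above prints one row per operator call, so repeated ops such as `conv2d` and `batch_norm` show up many times. As a rough sketch, the profiler's `key_averages()` method can group those rows by operator name and sort them. The small `nn.Sequential` model and the 32x32 input below are stand-ins chosen so the example runs quickly on CPU; they are assumptions for illustration, not the `densenet121` setup used above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; swap in your own network.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.MaxPool2d(2),
).eval()
x = torch.randn(1, 3, 32, 32)

# Profile one forward pass on CPU.
with torch.no_grad(), torch.autograd.profiler.profile() as prof:
    model(x)

# Group events by operator name instead of listing every call.
averages = prof.key_averages()
op_names = [evt.key for evt in averages]

# Render the ten most expensive operators by total CPU time.
summary = averages.table(sort_by="cpu_time_total", row_limit=10)
print(summary)
```

Sorting by `cpu_time_total` puts the operators that dominate the run at the top, which is usually easier to read than the call-by-call listing.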

Inside Bottleneck

link

Python profiler

line_profiler
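
For function-level timing without extra dependencies, a minimal sketch using the standard library's `cProfile` and `pstats` modules is shown here. Note that this is the stdlib profiler rather than `line_profiler`, which reports per-line timings and is installed separately. The functions `slow_sum` and `run` are hypothetical placeholders for your own code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately slow pure-Python loop (placeholder workload).
    total = 0
    for i in range(n):
        total += i * i
    return total


def run():
    # Call the workload several times so it stands out in the report.
    return [slow_sum(50_000) for _ in range(20)]


# Collect statistics only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Write the report into a string buffer, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The same report can be produced for a whole script with `python -m cProfile -s cumulative your_script.py`, where `your_script.py` is a placeholder name.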

XinDongol/profile_pyt.md
