@wanchaol
Created April 10, 2020 22:15
Perf profile before https://github.com/pytorch/pytorch/pull/33157:
-------------------------------------------- --------------- --------------- --------------- --------------- --------------- ---------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls
-------------------------------------------- --------------- --------------- --------------- --------------- --------------- ---------------
add_ 94.03% 2223.890s 94.03% 2223.890s 55.625ms 39980
AddmmBackward 0.00% 79.386ms 2.57% 60.762s 10.127ms 6000
mm 2.56% 60.601s 2.56% 60.601s 5.509ms 11000
EmbeddingBagBackward 0.00% 82.598ms 1.05% 24.881s 3.110ms 8000
_embedding_bag_backward 0.01% 139.080ms 1.05% 24.798s 3.100ms 8000
torch::autograd::AccumulateGrad 0.00% 101.454ms 0.99% 23.330s 1.167ms 20000
addmm 0.87% 20.636s 0.87% 20.636s 3.439ms 6000
_embedding_bag_sparse_backward 0.00% 37.911ms 0.72% 17.141s 2.143ms 8000
index_select 0.71% 16.862s 0.71% 16.862s 2.108ms 8000
embedding_bag 0.00% 30.778ms 0.62% 14.636s 1.830ms 8000
_embedding_bag 0.62% 14.595s 0.62% 14.595s 1.824ms 8000
zero_ 0.26% 6.038s 0.26% 6.038s 201.388us 29980
zeros 0.00% 35.091ms 0.21% 5.054s 505.358us 10000
sum 0.20% 4.639s 0.20% 4.639s 773.195us 6000
ReluBackward0 0.00% 27.057ms 0.18% 4.242s 848.435us 5000
threshold_backward 0.18% 4.215s 0.18% 4.215s 843.023us 5000
relu 0.10% 2.482s 0.10% 2.482s 496.443us 5000
cat 0.10% 2.345s 0.10% 2.345s 1.173ms 2000
bmm 0.08% 1.778s 0.08% 1.778s 592.766us 3000
index_add_ 0.06% 1.494s 0.06% 1.494s 186.718us 8000
BmmBackward 0.00% 11.588ms 0.06% 1.370s 1.370ms 1000
cumsum 0.05% 1.291s 0.05% 1.291s 161.397us 8000
add 0.04% 912.725ms 0.04% 912.725ms 456.363us 2000
IndexBackward 0.00% 10.067ms 0.03% 732.134ms 732.134us 1000
to 0.03% 612.052ms 0.03% 626.497ms 25.060us 25000
SliceBackward 0.01% 228.825ms 0.02% 466.571ms 466.571us 1000
_index_put_impl_ 0.02% 398.139ms 0.02% 398.139ms 398.139us 1000
index 0.01% 309.317ms 0.01% 309.317ms 309.317us 1000
embedding_sparse_backward 0.00% 49.982ms 0.01% 241.226ms 30.153us 8000
unsigned short 0.01% 166.909ms 0.01% 166.909ms 5.755us 29000
ones_like 0.00% 29.499ms 0.00% 114.266ms 14.283us 8000
_sparse_coo_tensor_unsafe 0.00% 22.789ms 0.00% 113.978ms 14.247us 8000
view 0.00% 92.456ms 0.00% 92.456ms 3.852us 24000
_sparse_coo_tensor_with_dims_and_tensors 0.00% 91.238ms 0.00% 91.238ms 11.393us 8008
empty 0.00% 90.342ms 0.00% 90.342ms 5.019us 18000
sub_ 0.00% 83.282ms 0.00% 83.282ms 10.410us 8000
reshape 0.00% 31.194ms 0.00% 83.104ms 4.888us 17000
select 0.00% 77.568ms 0.00% 77.568ms 4.848us 16000
mse_loss 0.00% 62.886ms 0.00% 62.886ms 62.886us 1000
stack 0.00% 57.713ms 0.00% 57.713ms 57.713us 1000
MseLossBackward 0.00% 12.593ms 0.00% 47.925ms 47.925us 1000
empty_like 0.00% 17.190ms 0.00% 46.238ms 5.780us 8000
CatBackward 0.00% 11.530ms 0.00% 44.988ms 22.494us 2000
detach_ 0.00% 43.878ms 0.00% 43.878ms 1.097us 39980
fill_ 0.00% 38.529ms 0.00% 38.529ms 4.816us 8000
mse_loss_backward 0.00% 35.331ms 0.00% 35.331ms 35.331us 1000
slice 0.00% 34.776ms 0.00% 34.776ms 2.675us 13000
narrow 0.00% 10.767ms 0.00% 33.459ms 3.042us 11000
TBackward 0.00% 10.741ms 0.00% 29.931ms 4.988us 6000
sigmoid 0.00% 26.957ms 0.00% 26.957ms 26.957us 1000
SigmoidBackward 0.00% 8.044ms 0.00% 19.202ms 19.202us 1000
transpose 0.00% 17.920ms 0.00% 17.920ms 4.480us 4000
empty_strided 0.00% 14.445ms 0.00% 14.445ms 7.223us 2000
sigmoid_backward 0.00% 11.158ms 0.00% 11.158ms 11.158us 1000
contiguous 0.00% 10.886ms 0.00% 10.886ms 0.680us 16000
ViewBackward 0.00% 2.693ms 0.00% 8.531ms 8.531us 1000
TransposeBackward0 0.00% 1.635ms 0.00% 4.842ms 4.842us 1000
detach 0.00% 3.636ms 0.00% 3.636ms 1.207us 3012
broadcast_tensors 0.00% 2.246ms 0.00% 2.246ms 2.246us 1000
sparse_coo_tensor 0.00% 61.901us 0.00% 2.133ms 266.632us 8
torch::autograd::GraphRoot 0.00% 1.522ms 0.00% 1.522ms 1.522us 1000
min 0.00% 1.333ms 0.00% 1.333ms 166.633us 8
max 0.00% 689.010us 0.00% 689.010us 86.126us 8
_indices 0.00% 21.215us 0.00% 21.215us 0.884us 24
_values 0.00% 13.336us 0.00% 13.336us 0.556us 24
item 0.00% 10.685us 0.00% 13.022us 13.022us 1
random_ 0.00% 10.071us 0.00% 10.071us 10.071us 1
_local_scalar_dense 0.00% 2.337us 0.00% 2.337us 2.337us 1
is_floating_point 0.00% 0.552us 0.00% 0.552us 0.552us 1
is_complex 0.00% 0.369us 0.00% 0.369us 0.369us 1
-------------------------------------------- --------------- --------------- --------------- --------------- --------------- ---------------
Self CPU time total: 2365.057s
Perf profile after https://github.com/pytorch/pytorch/pull/33157:
-------------------------------------------- --------------- --------------- --------------- --------------- --------------- ---------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls
-------------------------------------------- --------------- --------------- --------------- --------------- --------------- ---------------
add_ 96.13% 2241.779s 96.13% 2241.779s 56.073ms 39980
AddmmBackward 0.00% 83.053ms 1.28% 29.880s 4.980ms 6000
mm 1.27% 29.718s 1.27% 29.718s 2.702ms 11000
addmm 0.75% 17.411s 0.75% 17.411s 2.902ms 6000
EmbeddingBagBackward 0.00% 88.773ms 0.71% 16.554s 2.069ms 8000
_embedding_bag_backward 0.01% 151.007ms 0.71% 16.466s 2.058ms 8000
torch::autograd::AccumulateGrad 0.00% 103.184ms 0.65% 15.163s 758.159us 20000
embedding_bag 0.00% 29.610ms 0.52% 12.065s 1.508ms 8000
_embedding_bag 0.52% 12.026s 0.52% 12.026s 1.503ms 8000
_embedding_bag_sparse_backward 0.00% 42.842ms 0.45% 10.478s 1.310ms 8000
index_select 0.44% 10.165s 0.44% 10.165s 1.271ms 8000
zero_ 0.15% 3.461s 0.15% 3.461s 115.430us 29980
cat 0.12% 2.882s 0.12% 2.882s 1.441ms 2000
ReluBackward0 0.00% 27.909ms 0.11% 2.602s 520.317us 5000
threshold_backward 0.11% 2.574s 0.11% 2.574s 514.735us 5000
zeros 0.00% 33.299ms 0.11% 2.543s 254.289us 10000
relu 0.10% 2.323s 0.10% 2.323s 464.657us 5000
index_add_ 0.08% 1.801s 0.08% 1.801s 225.146us 8000
sum 0.07% 1.732s 0.07% 1.732s 288.604us 6000
cumsum 0.06% 1.462s 0.06% 1.462s 182.761us 8000
bmm 0.05% 1.110s 0.05% 1.110s 370.161us 3000
BmmBackward 0.00% 11.765ms 0.03% 697.521ms 697.521us 1000
to 0.02% 580.078ms 0.03% 601.783ms 24.071us 25000
add 0.02% 480.874ms 0.02% 480.874ms 240.437us 2000
IndexBackward 0.00% 11.072ms 0.01% 331.966ms 331.966us 1000
index 0.01% 319.801ms 0.01% 319.801ms 319.801us 1000
embedding_sparse_backward 0.00% 56.627ms 0.01% 269.791ms 33.724us 8000
_index_put_impl_ 0.01% 199.158ms 0.01% 199.158ms 199.158us 1000
SliceBackward 0.00% 86.124ms 0.01% 195.881ms 195.881us 1000
unsigned short 0.01% 168.242ms 0.01% 168.242ms 5.801us 29000
_sparse_coo_tensor_unsafe 0.00% 25.341ms 0.01% 133.572ms 16.696us 8000
ones_like 0.00% 31.091ms 0.01% 121.629ms 15.204us 8000
empty 0.00% 113.119ms 0.00% 113.119ms 6.284us 18000
_sparse_coo_tensor_with_dims_and_tensors 0.00% 108.283ms 0.00% 108.283ms 13.522us 8008
sub_ 0.00% 91.500ms 0.00% 91.500ms 11.438us 8000
view 0.00% 90.028ms 0.00% 90.028ms 3.751us 24000
reshape 0.00% 31.463ms 0.00% 84.887ms 4.993us 17000
mse_loss 0.00% 83.360ms 0.00% 83.360ms 83.360us 1000
select 0.00% 80.995ms 0.00% 80.995ms 5.062us 16000
stack 0.00% 64.852ms 0.00% 64.852ms 64.852us 1000
empty_like 0.00% 19.635ms 0.00% 47.989ms 5.999us 8000
CatBackward 0.00% 12.000ms 0.00% 46.440ms 23.220us 2000
detach_ 0.00% 44.395ms 0.00% 44.395ms 1.110us 39980
fill_ 0.00% 42.549ms 0.00% 42.549ms 5.319us 8000
MseLossBackward 0.00% 8.435ms 0.00% 42.450ms 42.450us 1000
slice 0.00% 35.131ms 0.00% 35.131ms 2.702us 13000
narrow 0.00% 11.908ms 0.00% 34.440ms 3.131us 11000
mse_loss_backward 0.00% 34.015ms 0.00% 34.015ms 34.015us 1000
sigmoid 0.00% 30.751ms 0.00% 30.751ms 30.751us 1000
TBackward 0.00% 10.631ms 0.00% 29.283ms 4.880us 6000
empty_strided 0.00% 21.704ms 0.00% 21.704ms 10.852us 2000
transpose 0.00% 16.880ms 0.00% 16.880ms 4.220us 4000
SigmoidBackward 0.00% 4.448ms 0.00% 14.362ms 14.362us 1000
sigmoid_backward 0.00% 9.914ms 0.00% 9.914ms 9.914us 1000
contiguous 0.00% 9.827ms 0.00% 9.827ms 0.614us 16000
ViewBackward 0.00% 3.176ms 0.00% 8.470ms 8.470us 1000
TransposeBackward0 0.00% 1.621ms 0.00% 4.575ms 4.575us 1000
detach 0.00% 3.480ms 0.00% 3.480ms 1.155us 3012
broadcast_tensors 0.00% 2.333ms 0.00% 2.333ms 2.333us 1000
sparse_coo_tensor 0.00% 60.943us 0.00% 1.827ms 228.399us 8
min 0.00% 995.529us 0.00% 995.529us 124.441us 8
torch::autograd::GraphRoot 0.00% 933.194us 0.00% 933.194us 0.933us 1000
max 0.00% 718.981us 0.00% 718.981us 89.873us 8
_values 0.00% 26.226us 0.00% 26.226us 1.093us 24
_indices 0.00% 18.810us 0.00% 18.810us 0.784us 24
item 0.00% 13.919us 0.00% 16.531us 16.531us 1
random_ 0.00% 11.755us 0.00% 11.755us 11.755us 1
_local_scalar_dense 0.00% 2.612us 0.00% 2.612us 2.612us 1
is_floating_point 0.00% 0.813us 0.00% 0.813us 0.813us 1
is_complex 0.00% 0.389us 0.00% 0.389us 0.389us 1
-------------------------------------------- --------------- --------------- --------------- --------------- --------------- ---------------
Self CPU time total: 2331.962s
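
The two tables can be compared op by op. Below is a minimal sketch, assuming the Self CPU totals (in seconds) are copied by hand from the tables above for a handful of the heaviest ops; the dictionary names and the `ratios` helper are illustrative and are not part of the PR or the profiler.

```python
# Self CPU totals in seconds, copied by hand from the "before" and "after"
# tables above. Only a few of the heaviest ops are included.
before = {
    "add_": 2223.890,
    "mm": 60.601,
    "addmm": 20.636,
    "index_select": 16.862,
    "_embedding_bag": 14.595,
    "zero_": 6.038,
}
after = {
    "add_": 2241.779,
    "mm": 29.718,
    "addmm": 17.411,
    "index_select": 10.165,
    "_embedding_bag": 12.026,
    "zero_": 3.461,
}


def ratios(before, after):
    """Return before/after time ratio per op (values above 1 mean faster after)."""
    return {op: before[op] / after[op] for op in before}


if __name__ == "__main__":
    for op, ratio in ratios(before, after).items():
        print(f"{op:16s} {before[op]:10.3f}s -> {after[op]:10.3f}s  ({ratio:.2f}x)")
```

A ratio above 1 means the op spent less self CPU time after the PR. Since `add_` accounts for well over 90% of the self CPU time in both runs, the overall total changes far less than the per-op ratios of the smaller ops suggest.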