Created April 10, 2020 22:15
Perf profile before https://github.com/pytorch/pytorch/pull/33157:
--------------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                                          Self CPU total % Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls
--------------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
add_                                          94.03%           2223.890s        94.03%           2223.890s        55.625ms         39980
AddmmBackward                                 0.00%            79.386ms         2.57%            60.762s          10.127ms         6000
mm                                            2.56%            60.601s          2.56%            60.601s          5.509ms          11000
EmbeddingBagBackward                          0.00%            82.598ms         1.05%            24.881s          3.110ms          8000
_embedding_bag_backward                       0.01%            139.080ms        1.05%            24.798s          3.100ms          8000
torch::autograd::AccumulateGrad               0.00%            101.454ms        0.99%            23.330s          1.167ms          20000
addmm                                         0.87%            20.636s          0.87%            20.636s          3.439ms          6000
_embedding_bag_sparse_backward                0.00%            37.911ms         0.72%            17.141s          2.143ms          8000
index_select                                  0.71%            16.862s          0.71%            16.862s          2.108ms          8000
embedding_bag                                 0.00%            30.778ms         0.62%            14.636s          1.830ms          8000
_embedding_bag                                0.62%            14.595s          0.62%            14.595s          1.824ms          8000
zero_                                         0.26%            6.038s           0.26%            6.038s           201.388us        29980
zeros                                         0.00%            35.091ms         0.21%            5.054s           505.358us        10000
sum                                           0.20%            4.639s           0.20%            4.639s           773.195us        6000
ReluBackward0                                 0.00%            27.057ms         0.18%            4.242s           848.435us        5000
threshold_backward                            0.18%            4.215s           0.18%            4.215s           843.023us        5000
relu                                          0.10%            2.482s           0.10%            2.482s           496.443us        5000
cat                                           0.10%            2.345s           0.10%            2.345s           1.173ms          2000
bmm                                           0.08%            1.778s           0.08%            1.778s           592.766us        3000
index_add_                                    0.06%            1.494s           0.06%            1.494s           186.718us        8000
BmmBackward                                   0.00%            11.588ms         0.06%            1.370s           1.370ms          1000
cumsum                                        0.05%            1.291s           0.05%            1.291s           161.397us        8000
add                                           0.04%            912.725ms        0.04%            912.725ms        456.363us        2000
IndexBackward                                 0.00%            10.067ms         0.03%            732.134ms        732.134us        1000
to                                            0.03%            612.052ms        0.03%            626.497ms        25.060us         25000
SliceBackward                                 0.01%            228.825ms        0.02%            466.571ms        466.571us        1000
_index_put_impl_                              0.02%            398.139ms        0.02%            398.139ms        398.139us        1000
index                                         0.01%            309.317ms        0.01%            309.317ms        309.317us        1000
embedding_sparse_backward                     0.00%            49.982ms         0.01%            241.226ms        30.153us         8000
unsigned short                                0.01%            166.909ms        0.01%            166.909ms        5.755us          29000
ones_like                                     0.00%            29.499ms         0.00%            114.266ms        14.283us         8000
_sparse_coo_tensor_unsafe                     0.00%            22.789ms         0.00%            113.978ms        14.247us         8000
view                                          0.00%            92.456ms         0.00%            92.456ms         3.852us          24000
_sparse_coo_tensor_with_dims_and_tensors      0.00%            91.238ms         0.00%            91.238ms         11.393us         8008
empty                                         0.00%            90.342ms         0.00%            90.342ms         5.019us          18000
sub_                                          0.00%            83.282ms         0.00%            83.282ms         10.410us         8000
reshape                                       0.00%            31.194ms         0.00%            83.104ms         4.888us          17000
select                                        0.00%            77.568ms         0.00%            77.568ms         4.848us          16000
mse_loss                                      0.00%            62.886ms         0.00%            62.886ms         62.886us         1000
stack                                         0.00%            57.713ms         0.00%            57.713ms         57.713us         1000
MseLossBackward                               0.00%            12.593ms         0.00%            47.925ms         47.925us         1000
empty_like                                    0.00%            17.190ms         0.00%            46.238ms         5.780us          8000
CatBackward                                   0.00%            11.530ms         0.00%            44.988ms         22.494us         2000
detach_                                       0.00%            43.878ms         0.00%            43.878ms         1.097us          39980
fill_                                         0.00%            38.529ms         0.00%            38.529ms         4.816us          8000
mse_loss_backward                             0.00%            35.331ms         0.00%            35.331ms         35.331us         1000
slice                                         0.00%            34.776ms         0.00%            34.776ms         2.675us          13000
narrow                                        0.00%            10.767ms         0.00%            33.459ms         3.042us          11000
TBackward                                     0.00%            10.741ms         0.00%            29.931ms         4.988us          6000
sigmoid                                       0.00%            26.957ms         0.00%            26.957ms         26.957us         1000
SigmoidBackward                               0.00%            8.044ms          0.00%            19.202ms         19.202us         1000
transpose                                     0.00%            17.920ms         0.00%            17.920ms         4.480us          4000
empty_strided                                 0.00%            14.445ms         0.00%            14.445ms         7.223us          2000
sigmoid_backward                              0.00%            11.158ms         0.00%            11.158ms         11.158us         1000
contiguous                                    0.00%            10.886ms         0.00%            10.886ms         0.680us          16000
ViewBackward                                  0.00%            2.693ms          0.00%            8.531ms          8.531us          1000
TransposeBackward0                            0.00%            1.635ms          0.00%            4.842ms          4.842us          1000
detach                                        0.00%            3.636ms          0.00%            3.636ms          1.207us          3012
broadcast_tensors                             0.00%            2.246ms          0.00%            2.246ms          2.246us          1000
sparse_coo_tensor                             0.00%            61.901us         0.00%            2.133ms          266.632us        8
torch::autograd::GraphRoot                    0.00%            1.522ms          0.00%            1.522ms          1.522us          1000
min                                           0.00%            1.333ms          0.00%            1.333ms          166.633us        8
max                                           0.00%            689.010us        0.00%            689.010us        86.126us         8
_indices                                      0.00%            21.215us         0.00%            21.215us         0.884us          24
_values                                       0.00%            13.336us         0.00%            13.336us         0.556us          24
item                                          0.00%            10.685us         0.00%            13.022us         13.022us         1
random_                                       0.00%            10.071us         0.00%            10.071us         10.071us         1
_local_scalar_dense                           0.00%            2.337us          0.00%            2.337us          2.337us          1
is_floating_point                             0.00%            0.552us          0.00%            0.552us          0.552us          1
is_complex                                    0.00%            0.369us          0.00%            0.369us          0.369us          1
--------------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 2365.057s

Perf profile after https://github.com/pytorch/pytorch/pull/33157:
--------------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                                          Self CPU total % Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls
--------------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
add_                                          96.13%           2241.779s        96.13%           2241.779s        56.073ms         39980
AddmmBackward                                 0.00%            83.053ms         1.28%            29.880s          4.980ms          6000
mm                                            1.27%            29.718s          1.27%            29.718s          2.702ms          11000
addmm                                         0.75%            17.411s          0.75%            17.411s          2.902ms          6000
EmbeddingBagBackward                          0.00%            88.773ms         0.71%            16.554s          2.069ms          8000
_embedding_bag_backward                       0.01%            151.007ms        0.71%            16.466s          2.058ms          8000
torch::autograd::AccumulateGrad               0.00%            103.184ms        0.65%            15.163s          758.159us        20000
embedding_bag                                 0.00%            29.610ms         0.52%            12.065s          1.508ms          8000
_embedding_bag                                0.52%            12.026s          0.52%            12.026s          1.503ms          8000
_embedding_bag_sparse_backward                0.00%            42.842ms         0.45%            10.478s          1.310ms          8000
index_select                                  0.44%            10.165s          0.44%            10.165s          1.271ms          8000
zero_                                         0.15%            3.461s           0.15%            3.461s           115.430us        29980
cat                                           0.12%            2.882s           0.12%            2.882s           1.441ms          2000
ReluBackward0                                 0.00%            27.909ms         0.11%            2.602s           520.317us        5000
threshold_backward                            0.11%            2.574s           0.11%            2.574s           514.735us        5000
zeros                                         0.00%            33.299ms         0.11%            2.543s           254.289us        10000
relu                                          0.10%            2.323s           0.10%            2.323s           464.657us        5000
index_add_                                    0.08%            1.801s           0.08%            1.801s           225.146us        8000
sum                                           0.07%            1.732s           0.07%            1.732s           288.604us        6000
cumsum                                        0.06%            1.462s           0.06%            1.462s           182.761us        8000
bmm                                           0.05%            1.110s           0.05%            1.110s           370.161us        3000
BmmBackward                                   0.00%            11.765ms         0.03%            697.521ms        697.521us        1000
to                                            0.02%            580.078ms        0.03%            601.783ms        24.071us         25000
add                                           0.02%            480.874ms        0.02%            480.874ms        240.437us        2000
IndexBackward                                 0.00%            11.072ms         0.01%            331.966ms        331.966us        1000
index                                         0.01%            319.801ms        0.01%            319.801ms        319.801us        1000
embedding_sparse_backward                     0.00%            56.627ms         0.01%            269.791ms        33.724us         8000
_index_put_impl_                              0.01%            199.158ms        0.01%            199.158ms        199.158us        1000
SliceBackward                                 0.00%            86.124ms         0.01%            195.881ms        195.881us        1000
unsigned short                                0.01%            168.242ms        0.01%            168.242ms        5.801us          29000
_sparse_coo_tensor_unsafe                     0.00%            25.341ms         0.01%            133.572ms        16.696us         8000
ones_like                                     0.00%            31.091ms         0.01%            121.629ms        15.204us         8000
empty                                         0.00%            113.119ms        0.00%            113.119ms        6.284us          18000
_sparse_coo_tensor_with_dims_and_tensors      0.00%            108.283ms        0.00%            108.283ms        13.522us         8008
sub_                                          0.00%            91.500ms         0.00%            91.500ms         11.438us         8000
view                                          0.00%            90.028ms         0.00%            90.028ms         3.751us          24000
reshape                                       0.00%            31.463ms         0.00%            84.887ms         4.993us          17000
mse_loss                                      0.00%            83.360ms         0.00%            83.360ms         83.360us         1000
select                                        0.00%            80.995ms         0.00%            80.995ms         5.062us          16000
stack                                         0.00%            64.852ms         0.00%            64.852ms         64.852us         1000
empty_like                                    0.00%            19.635ms         0.00%            47.989ms         5.999us          8000
CatBackward                                   0.00%            12.000ms         0.00%            46.440ms         23.220us         2000
detach_                                       0.00%            44.395ms         0.00%            44.395ms         1.110us          39980
fill_                                         0.00%            42.549ms         0.00%            42.549ms         5.319us          8000
MseLossBackward                               0.00%            8.435ms          0.00%            42.450ms         42.450us         1000
slice                                         0.00%            35.131ms         0.00%            35.131ms         2.702us          13000
narrow                                        0.00%            11.908ms         0.00%            34.440ms         3.131us          11000
mse_loss_backward                             0.00%            34.015ms         0.00%            34.015ms         34.015us         1000
sigmoid                                       0.00%            30.751ms         0.00%            30.751ms         30.751us         1000
TBackward                                     0.00%            10.631ms         0.00%            29.283ms         4.880us          6000
empty_strided                                 0.00%            21.704ms         0.00%            21.704ms         10.852us         2000
transpose                                     0.00%            16.880ms         0.00%            16.880ms         4.220us          4000
SigmoidBackward                               0.00%            4.448ms          0.00%            14.362ms         14.362us         1000
sigmoid_backward                              0.00%            9.914ms          0.00%            9.914ms          9.914us          1000
contiguous                                    0.00%            9.827ms          0.00%            9.827ms          0.614us          16000
ViewBackward                                  0.00%            3.176ms          0.00%            8.470ms          8.470us          1000
TransposeBackward0                            0.00%            1.621ms          0.00%            4.575ms          4.575us          1000
detach                                        0.00%            3.480ms          0.00%            3.480ms          1.155us          3012
broadcast_tensors                             0.00%            2.333ms          0.00%            2.333ms          2.333us          1000
sparse_coo_tensor                             0.00%            60.943us         0.00%            1.827ms          228.399us        8
min                                           0.00%            995.529us        0.00%            995.529us        124.441us        8
torch::autograd::GraphRoot                    0.00%            933.194us        0.00%            933.194us        0.933us          1000
max                                           0.00%            718.981us        0.00%            718.981us        89.873us         8
_values                                       0.00%            26.226us         0.00%            26.226us         1.093us          24
_indices                                      0.00%            18.810us         0.00%            18.810us         0.784us          24
item                                          0.00%            13.919us         0.00%            16.531us         16.531us         1
random_                                       0.00%            11.755us         0.00%            11.755us         11.755us         1
_local_scalar_dense                           0.00%            2.612us          0.00%            2.612us          2.612us          1
is_floating_point                             0.00%            0.813us          0.00%            0.813us          0.813us          1
is_complex                                    0.00%            0.389us          0.00%            0.389us          0.389us          1
--------------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 2331.962s
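
Because add_ accounts for over 94% of self CPU time in both runs, the totals alone hide most of the per-op change. The sketch below is not part of the original profiling run; it is a small stdlib-only helper, under the assumption that comparing "Self CPU total" values op by op is a fair way to read the two profiles. The values are copied by hand from the profiles, and the names (BEFORE, AFTER, to_seconds) are made up for illustration.

```python
# Hand-copied "Self CPU total" values for a few leaf ops from the two profiles.
# This is an illustrative comparison, not output from the profiler itself.
BEFORE = {
    "add_": "2223.890s", "mm": "60.601s", "addmm": "20.636s",
    "index_select": "16.862s", "_embedding_bag": "14.595s",
    "zero_": "6.038s", "sum": "4.639s", "threshold_backward": "4.215s",
}
AFTER = {
    "add_": "2241.779s", "mm": "29.718s", "addmm": "17.411s",
    "index_select": "10.165s", "_embedding_bag": "12.026s",
    "zero_": "3.461s", "sum": "1.732s", "threshold_backward": "2.574s",
}
TOTAL_BEFORE = "2365.057s"  # "Self CPU time total" line, before the PR
TOTAL_AFTER = "2331.962s"   # "Self CPU time total" line, after the PR


def to_seconds(text):
    """Convert a profiler duration such as '79.386ms' to seconds."""
    # Check "ms" and "us" before "s", since both also end in "s".
    for suffix, scale in (("ms", 1e-3), ("us", 1e-6), ("s", 1.0)):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    raise ValueError(f"unrecognized duration: {text!r}")


# Ratio > 1 means the op took less self CPU time after the PR.
speedups = {op: to_seconds(BEFORE[op]) / to_seconds(AFTER[op]) for op in BEFORE}

# Time spent in everything except add_, which dominates both runs.
rest_before = to_seconds(TOTAL_BEFORE) - to_seconds(BEFORE["add_"])
rest_after = to_seconds(TOTAL_AFTER) - to_seconds(AFTER["add_"])

for op, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{op:<20} {ratio:.2f}x")
print(f"everything except add_: {rest_before:.3f}s -> {rest_after:.3f}s")
```

Read this way, mm takes roughly half the time after the PR (60.601s to 29.718s), add_ is essentially unchanged, and the time outside add_ drops from about 141s to about 90s. Both profiles are single runs, so small differences may be noise.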