@Birch-san
Created September 19, 2022 18:38
8 steps, Heun sampler, stable-diffusion, PyTorch nightly 1.13.0.dev20220917
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
job 0.27% 29.274ms 100.00% 10.976s 10.976s 1
iteration 0.00% 135.000us 98.68% 10.831s 10.831s 1
batch 0.92% 101.481ms 98.68% 10.831s 10.831s 1
kdiff_sample 0.00% 281.000us 93.19% 10.229s 10.229s 1
sample_heun::loop 0.00% 395.000us 93.17% 10.226s 10.226s 1
sample_heun::iter 0.01% 1.121ms 93.16% 10.225s 1.278s 8
KCFGDenoiser::forward 0.03% 3.084ms 92.19% 10.119s 674.578ms 15
KCFGDenoiser::self.inner_model+split 0.02% 1.693ms 70.75% 7.766s 517.714ms 15
kdiff_eps::forward 0.01% 1.209ms 70.72% 7.762s 517.500ms 15
kdiff_eps::get_eps*( 0.02% 2.027ms 70.27% 7.713s 514.180ms 15
LatentDiffusion::apply_model::self.model() 0.00% 545.000us 69.93% 7.675s 511.684ms 15
DiffusionWrapper::forward 0.00% 399.000us 69.92% 7.675s 511.645ms 15
DiffusionWrapper::forward::out = self.diffusion_mode... 0.01% 813.000us 69.90% 7.672s 511.456ms 15
UNetModel::forward 0.01% 1.520ms 69.89% 7.671s 511.401ms 15
TimestepEmbedSequential::forward 0.10% 10.778ms 68.16% 7.481s 19.949ms 375
CheckpointFunction 0.51% 55.749ms 61.85% 6.789s 11.910ms 570
TimestepEmbedSequential::forward::SpatialTransformer... 0.06% 6.104ms 53.94% 5.920s 24.667ms 240
SpatialTransformer::forward 0.20% 21.490ms 53.88% 5.914s 24.640ms 240
sample_heun::denoised=model() 0.00% 448.000us 51.77% 5.682s 710.284ms 8
SpatialTransformer::forward::for block in transforme... 0.05% 5.264ms 49.50% 5.433s 22.637ms 240
SpatialTransformer::forward::block() 0.60% 65.724ms 49.45% 5.427s 22.614ms 240
aten::copy_ 41.44% 4.548s 41.75% 4.583s 965.795us 4745
sample_heun::solve -0.00% -157.000us 40.68% 4.465s 558.139ms 8
sample_heun::heun 0.01% 614.000us 40.60% 4.456s 636.607ms 7
sample_heun::heun::denoised_2 = model() 0.01% 765.000us 40.45% 4.440s 634.294ms 7
UNetModel::forward::for module in self.output_blocks... 0.02% 2.649ms 39.18% 4.301s 286.708ms 15
UNetModel::forward::output_block 0.04% 4.505ms 39.15% 4.297s 23.875ms 180
UNetModel::forward::output_block module() 0.04% 4.582ms 38.83% 4.262s 23.676ms 180
CrossAttention::forward 0.54% 59.302ms 28.23% 3.099s 6.456ms 480
UNetModel::forward::for module in self.input_blocks 0.01% 1.492ms 26.09% 2.864s 190.930ms 15
UNetModel::forward::input_block 0.03% 3.321ms 26.08% 2.862s 15.901ms 180
UNetModel::forward::input_block module() 0.03% 2.932ms 26.04% 2.858s 15.876ms 180
aten::to 0.01% 1.319ms 24.17% 2.652s 1.431ms 1854
aten::_to_copy 0.01% 795.000us 24.16% 2.652s 26.260ms 101
KCFGDenoiser::weight_tensor 0.02% 1.693ms 20.84% 2.287s 152.457ms 15
aten::clone 0.58% 63.423ms 17.78% 1.951s 420.181us 4644
TimestepEmbedSequential::forward::TimestepBlock() 0.50% 55.192ms 13.51% 1.482s 4.492ms 330
ResBlock::forward 0.21% 23.445ms 12.84% 1.410s 4.272ms 330
aten::reshape -0.24% -26705.000us 12.83% 1.409s 186.741us 7543
CrossAttention::forward::qkv rearrange 0.53% 58.112ms 10.31% 1.131s 2.357ms 480
aten::mul 9.01% 988.592ms 9.08% 996.272ms 1.021ms 976
CrossAttention::forward::sim einsum 0.21% 23.167ms 8.36% 917.980ms 1.912ms 480
aten::conv2d -0.09% -9588.000us 7.83% 859.879ms 569.456us 1510
aten::convolution 0.39% 42.443ms 7.80% 856.271ms 567.067us 1510
aten::_convolution -0.01% -927.000us 7.76% 851.841ms 564.133us 1510
aten::einsum 0.20% 21.986ms 7.62% 835.988ms 870.821us 960
aten::bmm 6.59% 723.458ms 7.08% 777.342ms 769.646us 1010
aten::_mps_convolution 6.54% 717.311ms 6.55% 718.998ms 476.158us 1510
aten::native_batch_norm 4.88% 535.154ms 4.88% 535.154ms 312.043us 1715
ResBlock::forward::in_layers() 0.25% 27.942ms 4.61% 505.950ms 1.533ms 330
aten::contiguous 0.49% 53.650ms 4.61% 505.785ms 898.375us 563
aten::group_norm 0.03% 3.086ms 4.60% 504.733ms 534.109us 945
aten::gelu 4.59% 504.271ms 4.59% 504.271ms 2.101ms 240
aten::native_group_norm 0.13% 14.134ms 4.57% 501.647ms 530.843us 945
aten::layer_norm 0.02% 2.493ms 4.35% 477.478ms 620.101us 770
aten::native_layer_norm 0.13% 14.512ms 4.33% 474.985ms 616.864us 770
ResBlock::forward::out_layers() 0.22% 24.224ms 4.01% 440.178ms 1.334ms 330
aten::add 3.98% 436.753ms 3.99% 437.779ms 238.832us 1833
aten::linear 3.47% 380.976ms 3.47% 380.976ms 131.190us 2904
UNetModel::forward::self.middle_block(h, emb, contex... 0.00% 289.000us 3.37% 369.651ms 24.643ms 15
CrossAttention::forward::out rearrange 0.21% 22.869ms 3.30% 362.682ms 755.587us 480
aten::addcmul 1.89% 207.166ms 1.89% 207.247ms 120.844us 1715
aten::softmax 0.03% 2.748ms 1.83% 200.833ms 397.689us 505
aten::_softmax 1.82% 199.655ms 1.82% 199.848ms 395.739us 505
softmax 0.05% 5.145ms 1.79% 196.616ms 409.617us 480
SpatialTransformer::forward::proj_out 0.04% 4.534ms 1.52% 167.299ms 697.079us 240
SpatialTransformer::forward::proj_in 0.02% 2.738ms 1.46% 160.579ms 669.079us 240
model.decode_first_stage 0.03% 3.144ms 1.44% 158.256ms 158.256ms 1
aten::rsqrt 1.30% 142.307ms 1.30% 142.310ms 184.818us 770
CrossAttention::forward::to_k 0.09% 9.489ms 1.26% 138.810ms 289.188us 480
uc = model.get_learned_conditioning() 0.03% 3.545ms 1.05% 115.293ms 115.293ms 1
aten::silu 1.02% 111.950ms 1.02% 111.950ms 109.755us 1020
CrossAttention::forward::out einsum 0.10% 10.554ms 0.99% 109.210ms 227.521us 480
UNetModel::forward::t_emb = timestep_embedding() 0.01% 1.054ms 0.92% 100.805ms 6.720ms 15
aten::as_strided 0.92% 100.602ms 0.92% 100.602ms 9.343us 10768
ResBlock::forward::self.skip_connection(x)+h 0.06% 6.202ms 0.75% 82.065ms 248.682us 330
CrossAttention::forward::to_out 0.24% 26.397ms 0.68% 74.738ms 155.704us 480
SpatialTransformer::forward::norm 0.08% 8.494ms 0.64% 70.619ms 294.246us 240
ResBlock::forward::self.emb_layers() 0.12% 12.787ms 0.64% 70.424ms 213.406us 330
aten::permute 0.11% 12.578ms 0.64% 70.081ms 9.728us 7204
TimestepEmbedSequential::forward::unknown() 0.03% 3.680ms 0.60% 66.191ms 630.390us 105
aten::add_ 0.51% 56.190ms 0.51% 56.190ms 78.042us 720
CrossAttention::forward::to_q 0.10% 10.740ms 0.50% 55.261ms 115.127us 480
aten::cat 0.43% 47.464ms 0.43% 47.464ms 208.175us 228
CrossAttention::forward::to_v 0.08% 8.823ms 0.40% 43.717ms 91.077us 480
sample_heun::eps -0.03% -3287.000us 0.38% 41.211ms 5.151ms 8
aten::sub 0.36% 40.060ms 0.37% 40.121ms 742.981us 54
aten::normal_ 0.32% 35.439ms 0.36% 39.993ms 4.444ms 9
aten::randn_like 0.04% 3.870ms 0.36% 39.868ms 4.984ms 8
c = model.get_learned_conditioning() 0.06% 6.300ms 0.33% 36.278ms 36.278ms 1
aten::unsqueeze 0.05% 5.514ms 0.31% 33.710ms 12.139us 2777
UNetModel::forward::output_block cat() 0.03% 3.643ms 0.28% 30.485ms 169.361us 180
kdiff_eps::c_out, c_in 0.02% 1.825ms 0.27% 29.820ms 1.988ms 15
KCFGDenoiser::deltas 0.01% 683.000us 0.24% 25.863ms 1.724ms 15
UNetModel::forward::self.out(h) 0.03% 3.751ms 0.23% 24.806ms 1.654ms 15
aten::zeros 0.17% 18.999ms 0.22% 23.743ms 2.170us 10940
aten::item 0.02% 2.722ms 0.21% 22.760ms 15.410us 1477
aten::_local_scalar_dense 0.19% 21.333ms 0.20% 21.533ms 14.579us 1477
SpatialTransformer::forward::rearrange 2 0.13% 13.961ms 0.19% 21.260ms 88.583us 240
aten::upsample_nearest2d 0.17% 18.720ms 0.17% 18.720ms 390.000us 48
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
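The rows above have lost their column alignment, and many names contain spaces, so each row is easiest to read from the right: the last six whitespace-separated fields are always Self CPU %, Self CPU, CPU total %, CPU total, CPU time avg and # of Calls, and everything before them is the name. Below is a minimal sketch of that approach in Python. The helper names `parse_time` and `parse_row` are hypothetical and are not part of PyTorch or of this gist.

```python
import re

# Seconds per unit for the time suffixes that appear in the table.
_UNITS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}
_TIME = re.compile(r"^(-?\d+(?:\.\d+)?)(us|ms|s)$")


def parse_time(text):
    """Convert a time such as '965.795us' or '10.976s' to seconds."""
    match = _TIME.match(text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def parse_row(line):
    """Split one table row into its name and six numeric columns.

    Splitting from the right keeps names that contain spaces intact.
    """
    name, self_pct, self_cpu, total_pct, total_cpu, avg, calls = line.rsplit(None, 6)
    return {
        "name": name,
        "self_cpu_pct": float(self_pct.rstrip("%")),
        "self_cpu_s": parse_time(self_cpu),
        "cpu_total_pct": float(total_pct.rstrip("%")),
        "cpu_total_s": parse_time(total_cpu),
        "cpu_time_avg_s": parse_time(avg),
        "calls": int(calls),
    }


# Two rows copied from the table above.
rows = [
    parse_row("aten::copy_ 41.44% 4.548s 41.75% 4.583s 965.795us 4745"),
    parse_row("aten::mul 9.01% 988.592ms 9.08% 996.272ms 1.021ms 976"),
]

# Rank by self time instead of the table's total-time ordering.
rows.sort(key=lambda row: row["self_cpu_s"], reverse=True)
for row in rows:
    print(f'{row["name"]}: {row["self_cpu_s"]:.3f}s self, {row["calls"]} calls')
```

The table is ordered by CPU total, which puts wrapper scopes such as `job` and `batch` at the top. Re-sorting parsed rows by self time brings the leaf operators forward instead; in this run `aten::copy_` accounts for 41.44% of self CPU time. The parser also accepts the small negative self times that a few rows report.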