@Birch-san
Created September 19, 2022 18:37
8 steps, Heun sampler, stable-diffusion, PyTorch stable 1.12.1
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
job 0.34% 34.856ms 99.99% 10.156s 10.156s 1
iteration 0.00% 169.000us 98.56% 10.010s 10.010s 1
batch 1.67% 169.994ms 98.56% 10.010s 10.010s 1
kdiff_sample 0.00% 299.000us 91.96% 9.340s 9.340s 1
sample_heun::loop 0.01% 532.000us 91.94% 9.338s 9.338s 1
sample_heun::iter 0.01% 1.378ms 91.93% 9.337s 1.167s 8
KCFGDenoiser::forward 0.02% 1.949ms 90.48% 9.189s 612.612ms 15
KCFGDenoiser::self.inner_model+split 0.01% 854.000us 81.51% 8.279s 551.911ms 15
kdiff_eps::forward 0.01% 557.000us 81.49% 8.276s 551.742ms 15
kdiff_eps::get_eps*( 0.02% 1.754ms 81.09% 8.235s 549.026ms 15
LatentDiffusion::apply_model::self.model() 0.00% 470.000us 80.73% 8.199s 546.617ms 15
DiffusionWrapper::forward 0.00% 350.000us 80.72% 8.199s 546.583ms 15
DiffusionWrapper::forward::out = self.diffusion_mode... 0.01% 550.000us 80.69% 8.195s 546.356ms 15
UNetModel::forward 0.01% 1.093ms 80.69% 8.195s 546.317ms 15
TimestepEmbedSequential::forward 0.08% 8.215ms 79.45% 8.069s 21.517ms 375
CheckpointFunction 0.75% 75.843ms 70.45% 7.155s 12.552ms 570
TimestepEmbedSequential::forward::SpatialTransformer... 0.05% 5.250ms 67.06% 6.811s 28.380ms 240
SpatialTransformer::forward 0.14% 14.411ms 67.01% 6.805s 28.356ms 240
SpatialTransformer::forward::for block in transforme... 0.04% 4.205ms 60.28% 6.122s 25.509ms 240
SpatialTransformer::forward::block() 0.61% 61.958ms 60.23% 6.117s 25.489ms 240
sample_heun::denoised=model() 0.00% 438.000us 52.22% 5.304s 662.951ms 8
UNetModel::forward::for module in self.output_blocks... 0.02% 2.062ms 44.58% 4.528s 301.854ms 15
UNetModel::forward::output_block 0.04% 3.672ms 44.56% 4.525s 25.140ms 180
UNetModel::forward::output_block module() 0.04% 3.584ms 44.16% 4.485s 24.917ms 180
sample_heun::solve 0.00% 504.000us 38.61% 3.921s 490.161ms 8
sample_heun::heun 0.01% 527.000us 38.52% 3.912s 558.871ms 7
sample_heun::heun::denoised_2 = model() 0.00% 339.000us 38.32% 3.892s 556.057ms 7
CrossAttention::forward 0.42% 42.825ms 34.43% 3.496s 7.284ms 480
UNetModel::forward::for module in self.input_blocks 0.01% 1.321ms 31.89% 3.239s 215.958ms 15
UNetModel::forward::input_block 0.03% 3.063ms 31.88% 3.238s 17.988ms 180
UNetModel::forward::input_block module() 0.03% 2.709ms 31.83% 3.233s 17.961ms 180
aten::copy_ 25.87% 2.627s 26.18% 2.659s 996.910us 2667
aten::clone 0.04% 4.411ms 16.94% 1.720s 672.283us 2559
aten::reshape 0.10% 9.776ms 14.86% 1.509s 222.821us 6773
aten::linear 0.03% 2.653ms 13.88% 1.410s 485.523us 2904
aten::_mps_linear 13.86% 1.407s 13.86% 1.407s 484.609us 2904
CrossAttention::forward::qkv rearrange 0.40% 40.768ms 13.69% 1.390s 2.897ms 480
aten::layer_norm 0.03% 2.649ms 13.36% 1.357s 1.762ms 770
aten::native_layer_norm 0.11% 10.760ms 13.33% 1.354s 1.759ms 770
TimestepEmbedSequential::forward::TimestepBlock() 0.52% 53.267ms 11.35% 1.153s 3.493ms 330
aten::add 10.76% 1.093s 10.78% 1.094s 598.702us 1828
ResBlock::forward 0.17% 16.847ms 10.69% 1.085s 3.289ms 330
aten::to 0.01% 652.000us 9.35% 949.398ms 510.704us 1859
aten::_to_copy 0.00% 343.000us 9.34% 948.746ms 8.785ms 108
KCFGDenoiser::weight_tensor 0.01% 1.105ms 8.36% 849.556ms 56.637ms 15
CrossAttention::forward::to_k 0.09% 8.716ms 6.69% 679.089ms 1.415ms 480
aten::addcmul 6.25% 635.265ms 6.26% 635.825ms 370.743us 1715
aten::mul 5.95% 604.809ms 6.07% 616.263ms 631.417us 976
aten::einsum 0.13% 13.710ms 4.85% 493.006ms 513.548us 960
aten::conv2d 0.03% 3.254ms 4.78% 485.469ms 321.503us 1510
aten::convolution 0.04% 3.973ms 4.75% 482.215ms 319.348us 1510
aten::_convolution 0.07% 7.370ms 4.71% 478.242ms 316.717us 1510
aten::group_norm 0.04% 3.829ms 4.53% 460.392ms 487.187us 945
aten::native_group_norm 0.08% 7.814ms 4.50% 456.563ms 483.135us 945
aten::bmm 4.22% 428.243ms 4.41% 447.769ms 443.336us 1010
aten::native_batch_norm 4.21% 427.565ms 4.21% 427.565ms 249.309us 1715
aten::_mps_convolution 3.99% 405.697ms 4.02% 408.608ms 270.601us 1510
aten::rsqrt 3.78% 383.453ms 3.78% 383.612ms 498.197us 770
CrossAttention::forward::sim einsum 0.15% 14.899ms 3.71% 376.530ms 784.438us 480
UNetModel::forward::self.middle_block(h, emb, contex... 0.00% 295.000us 3.53% 358.161ms 23.877ms 15
ResBlock::forward::in_layers() 0.25% 25.276ms 3.36% 340.796ms 1.033ms 330
model.decode_first_stage 0.05% 5.222ms 3.29% 333.945ms 333.945ms 1
aten::add_ 2.40% 243.345ms 2.40% 243.345ms 337.979us 720
CrossAttention::forward::out einsum 0.08% 8.101ms 2.35% 238.709ms 497.310us 480
aten::contiguous 0.01% 783.000us 2.25% 228.643ms 401.128us 570
aten::gelu 2.13% 216.640ms 2.13% 216.640ms 902.667us 240
ResBlock::forward::out_layers() 0.26% 25.977ms 1.96% 199.267ms 603.839us 330
CrossAttention::forward::to_q 0.09% 9.273ms 1.91% 194.296ms 404.783us 480
CrossAttention::forward::to_v 0.08% 8.436ms 1.80% 182.472ms 380.150us 480
aten::silu 1.78% 181.032ms 1.78% 181.032ms 177.482us 1020
CrossAttention::forward::out rearrange 0.15% 14.733ms 1.76% 178.668ms 372.225us 480
ResBlock::forward::self.emb_layers() 0.11% 11.428ms 1.11% 112.718ms 341.570us 330
aten::softmax 0.01% 969.000us 1.11% 112.248ms 222.273us 505
aten::_softmax 1.10% 111.279ms 1.10% 111.279ms 220.354us 505
uc = model.get_learned_conditioning() 0.09% 9.557ms 1.09% 110.818ms 110.818ms 1
softmax 0.05% 4.896ms 1.05% 106.479ms 221.831us 480
SpatialTransformer::forward::norm 0.07% 6.647ms 1.01% 103.038ms 429.325us 240
SpatialTransformer::forward::proj_out 0.07% 7.052ms 0.97% 98.099ms 408.746us 240
CrossAttention::forward::to_out 0.20% 19.863ms 0.95% 96.928ms 201.933us 480
ResBlock::forward::self.skip_connection(x)+h 0.07% 7.453ms 0.95% 96.814ms 293.376us 330
TimestepEmbedSequential::forward::unknown() 0.07% 6.737ms 0.94% 95.058ms 905.314us 105
aten::as_strided 0.92% 92.940ms 0.92% 92.940ms 8.633us 10766
SpatialTransformer::forward::proj_in 0.05% 5.486ms 0.75% 75.928ms 316.367us 240
sample_heun::eps 0.00% 388.000us 0.72% 73.531ms 9.191ms 8
aten::normal_ 0.36% 36.695ms 0.71% 72.344ms 8.038ms 9
aten::randn_like 0.00% 37.000us 0.71% 72.192ms 9.024ms 8
aten::permute 0.10% 10.209ms 0.62% 62.518ms 8.678us 7204
aten::item 0.01% 1.519ms 0.57% 57.704ms 36.522us 1580
aten::_local_scalar_dense 0.55% 55.816ms 0.55% 56.185ms 35.560us 1580
aten::cat 0.55% 55.777ms 0.55% 55.777ms 244.636us 228
c = model.get_learned_conditioning() 0.08% 8.624ms 0.48% 48.806ms 48.806ms 1
aten::sigmoid 0.40% 40.140ms 0.40% 40.140ms 757.358us 53
aten::sub 0.36% 36.376ms 0.39% 39.848ms 737.926us 54
UNetModel::forward::output_block cat() 0.03% 3.061ms 0.35% 35.633ms 197.961us 180
aten::unsqueeze 0.04% 4.109ms 0.31% 31.227ms 11.245us 2777
kdiff_eps::c_out, c_in 0.02% 2.108ms 0.29% 29.484ms 1.966ms 15
UNetModel::forward::t_emb = timestep_embedding() 0.02% 1.587ms 0.29% 29.232ms 1.949ms 15
UNetModel::forward::self.out(h) 0.02% 1.731ms 0.27% 27.464ms 1.831ms 15
aten::zeros 0.15% 15.124ms 0.24% 24.699ms 2.258us 10940
KCFGDenoiser::deltas 0.00% 341.000us 0.21% 21.211ms 1.414ms 15
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
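The rows above can be sanity-checked mechanically. The following is a minimal sketch, assuming each row is a name (which may contain spaces) followed by six whitespace-separated columns in the order shown in the header. The helper names `to_seconds` and `parse_row` are hypothetical and are not part of PyTorch or of the profiled code. It recomputes "CPU time avg" as "CPU total" divided by "# of Calls", and compares the call counts of the two model invocations inside `sample_heun` against `KCFGDenoiser::forward`.

```python
import re

# Seconds per unit for the time strings used in the table (us, ms, s).
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}


def to_seconds(text):
    """Convert a time string such as '1.378ms' to seconds."""
    match = re.fullmatch(r"([0-9.]+)(us|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def parse_row(line):
    """Parse one table row into a dict (hypothetical helper)."""
    # The name may contain spaces, so peel the six numeric columns
    # off the right-hand side instead of splitting from the left.
    name, self_pct, self_cpu, total_pct, cpu_total, avg, calls = line.rsplit(None, 6)
    return {
        "name": name.strip(),
        "self_pct": float(self_pct.rstrip("%")),
        "self_cpu_s": to_seconds(self_cpu),
        "total_pct": float(total_pct.rstrip("%")),
        "cpu_total_s": to_seconds(cpu_total),
        "avg_s": to_seconds(avg),
        "calls": int(calls),
    }


# Four rows copied verbatim from the table above.
SAMPLE = """
sample_heun::iter 0.01% 1.378ms 91.93% 9.337s 1.167s 8
KCFGDenoiser::forward 0.02% 1.949ms 90.48% 9.189s 612.612ms 15
sample_heun::denoised=model() 0.00% 438.000us 52.22% 5.304s 662.951ms 8
sample_heun::heun::denoised_2 = model() 0.00% 339.000us 38.32% 3.892s 556.057ms 7
"""

rows = {}
for line in SAMPLE.strip().splitlines():
    row = parse_row(line)
    rows[row["name"]] = row

for row in rows.values():
    recomputed_avg = row["cpu_total_s"] / row["calls"]
    print(f'{row["name"]}: listed avg {row["avg_s"]:.4f}s, '
          f'total/calls {recomputed_avg:.4f}s')

# Model calls made from inside the sampler: first-stage plus second-stage.
model_calls = (rows["sample_heun::denoised=model()"]["calls"]
               + rows["sample_heun::heun::denoised_2 = model()"]["calls"])
print("model calls from sample_heun:", model_calls)
print("KCFGDenoiser::forward calls:", rows["KCFGDenoiser::forward"]["calls"])
```

Splitting from the right keeps labels such as `sample_heun::heun::denoised_2 = model()` intact. The same `parse_row` can be mapped over every row of the table to sort or filter by any column.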