8 steps, Heun sampler, stable-diffusion, PyTorch stable 1.12.1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Name  Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
job  0.34%  34.856ms  99.99%  10.156s  10.156s  1
iteration  0.00%  169.000us  98.56%  10.010s  10.010s  1
batch  1.67%  169.994ms  98.56%  10.010s  10.010s  1
kdiff_sample  0.00%  299.000us  91.96%  9.340s  9.340s  1
sample_heun::loop  0.01%  532.000us  91.94%  9.338s  9.338s  1
sample_heun::iter  0.01%  1.378ms  91.93%  9.337s  1.167s  8
KCFGDenoiser::forward  0.02%  1.949ms  90.48%  9.189s  612.612ms  15
KCFGDenoiser::self.inner_model+split  0.01%  854.000us  81.51%  8.279s  551.911ms  15
kdiff_eps::forward  0.01%  557.000us  81.49%  8.276s  551.742ms  15
kdiff_eps::get_eps*(  0.02%  1.754ms  81.09%  8.235s  549.026ms  15
LatentDiffusion::apply_model::self.model()  0.00%  470.000us  80.73%  8.199s  546.617ms  15
DiffusionWrapper::forward  0.00%  350.000us  80.72%  8.199s  546.583ms  15
DiffusionWrapper::forward::out = self.diffusion_mode...  0.01%  550.000us  80.69%  8.195s  546.356ms  15
UNetModel::forward  0.01%  1.093ms  80.69%  8.195s  546.317ms  15
TimestepEmbedSequential::forward  0.08%  8.215ms  79.45%  8.069s  21.517ms  375
CheckpointFunction  0.75%  75.843ms  70.45%  7.155s  12.552ms  570
TimestepEmbedSequential::forward::SpatialTransformer...  0.05%  5.250ms  67.06%  6.811s  28.380ms  240
SpatialTransformer::forward  0.14%  14.411ms  67.01%  6.805s  28.356ms  240
SpatialTransformer::forward::for block in transforme...  0.04%  4.205ms  60.28%  6.122s  25.509ms  240
SpatialTransformer::forward::block()  0.61%  61.958ms  60.23%  6.117s  25.489ms  240
sample_heun::denoised=model()  0.00%  438.000us  52.22%  5.304s  662.951ms  8
UNetModel::forward::for module in self.output_blocks...  0.02%  2.062ms  44.58%  4.528s  301.854ms  15
UNetModel::forward::output_block  0.04%  3.672ms  44.56%  4.525s  25.140ms  180
UNetModel::forward::output_block module()  0.04%  3.584ms  44.16%  4.485s  24.917ms  180
sample_heun::solve  0.00%  504.000us  38.61%  3.921s  490.161ms  8
sample_heun::heun  0.01%  527.000us  38.52%  3.912s  558.871ms  7
sample_heun::heun::denoised_2 = model()  0.00%  339.000us  38.32%  3.892s  556.057ms  7
CrossAttention::forward  0.42%  42.825ms  34.43%  3.496s  7.284ms  480
UNetModel::forward::for module in self.input_blocks  0.01%  1.321ms  31.89%  3.239s  215.958ms  15
UNetModel::forward::input_block  0.03%  3.063ms  31.88%  3.238s  17.988ms  180
UNetModel::forward::input_block module()  0.03%  2.709ms  31.83%  3.233s  17.961ms  180
aten::copy_  25.87%  2.627s  26.18%  2.659s  996.910us  2667
aten::clone  0.04%  4.411ms  16.94%  1.720s  672.283us  2559
aten::reshape  0.10%  9.776ms  14.86%  1.509s  222.821us  6773
aten::linear  0.03%  2.653ms  13.88%  1.410s  485.523us  2904
aten::_mps_linear  13.86%  1.407s  13.86%  1.407s  484.609us  2904
CrossAttention::forward::qkv rearrange  0.40%  40.768ms  13.69%  1.390s  2.897ms  480
aten::layer_norm  0.03%  2.649ms  13.36%  1.357s  1.762ms  770
aten::native_layer_norm  0.11%  10.760ms  13.33%  1.354s  1.759ms  770
TimestepEmbedSequential::forward::TimestepBlock()  0.52%  53.267ms  11.35%  1.153s  3.493ms  330
aten::add  10.76%  1.093s  10.78%  1.094s  598.702us  1828
ResBlock::forward  0.17%  16.847ms  10.69%  1.085s  3.289ms  330
aten::to  0.01%  652.000us  9.35%  949.398ms  510.704us  1859
aten::_to_copy  0.00%  343.000us  9.34%  948.746ms  8.785ms  108
KCFGDenoiser::weight_tensor  0.01%  1.105ms  8.36%  849.556ms  56.637ms  15
CrossAttention::forward::to_k  0.09%  8.716ms  6.69%  679.089ms  1.415ms  480
aten::addcmul  6.25%  635.265ms  6.26%  635.825ms  370.743us  1715
aten::mul  5.95%  604.809ms  6.07%  616.263ms  631.417us  976
aten::einsum  0.13%  13.710ms  4.85%  493.006ms  513.548us  960
aten::conv2d  0.03%  3.254ms  4.78%  485.469ms  321.503us  1510
aten::convolution  0.04%  3.973ms  4.75%  482.215ms  319.348us  1510
aten::_convolution  0.07%  7.370ms  4.71%  478.242ms  316.717us  1510
aten::group_norm  0.04%  3.829ms  4.53%  460.392ms  487.187us  945
aten::native_group_norm  0.08%  7.814ms  4.50%  456.563ms  483.135us  945
aten::bmm  4.22%  428.243ms  4.41%  447.769ms  443.336us  1010
aten::native_batch_norm  4.21%  427.565ms  4.21%  427.565ms  249.309us  1715
aten::_mps_convolution  3.99%  405.697ms  4.02%  408.608ms  270.601us  1510
aten::rsqrt  3.78%  383.453ms  3.78%  383.612ms  498.197us  770
CrossAttention::forward::sim einsum  0.15%  14.899ms  3.71%  376.530ms  784.438us  480
UNetModel::forward::self.middle_block(h, emb, contex...  0.00%  295.000us  3.53%  358.161ms  23.877ms  15
ResBlock::forward::in_layers()  0.25%  25.276ms  3.36%  340.796ms  1.033ms  330
model.decode_first_stage  0.05%  5.222ms  3.29%  333.945ms  333.945ms  1
aten::add_  2.40%  243.345ms  2.40%  243.345ms  337.979us  720
CrossAttention::forward::out einsum  0.08%  8.101ms  2.35%  238.709ms  497.310us  480
aten::contiguous  0.01%  783.000us  2.25%  228.643ms  401.128us  570
aten::gelu  2.13%  216.640ms  2.13%  216.640ms  902.667us  240
ResBlock::forward::out_layers()  0.26%  25.977ms  1.96%  199.267ms  603.839us  330
CrossAttention::forward::to_q  0.09%  9.273ms  1.91%  194.296ms  404.783us  480
CrossAttention::forward::to_v  0.08%  8.436ms  1.80%  182.472ms  380.150us  480
aten::silu  1.78%  181.032ms  1.78%  181.032ms  177.482us  1020
CrossAttention::forward::out rearrange  0.15%  14.733ms  1.76%  178.668ms  372.225us  480
ResBlock::forward::self.emb_layers()  0.11%  11.428ms  1.11%  112.718ms  341.570us  330
aten::softmax  0.01%  969.000us  1.11%  112.248ms  222.273us  505
aten::_softmax  1.10%  111.279ms  1.10%  111.279ms  220.354us  505
uc = model.get_learned_conditioning()  0.09%  9.557ms  1.09%  110.818ms  110.818ms  1
softmax  0.05%  4.896ms  1.05%  106.479ms  221.831us  480
SpatialTransformer::forward::norm  0.07%  6.647ms  1.01%  103.038ms  429.325us  240
SpatialTransformer::forward::proj_out  0.07%  7.052ms  0.97%  98.099ms  408.746us  240
CrossAttention::forward::to_out  0.20%  19.863ms  0.95%  96.928ms  201.933us  480
ResBlock::forward::self.skip_connection(x)+h  0.07%  7.453ms  0.95%  96.814ms  293.376us  330
TimestepEmbedSequential::forward::unknown()  0.07%  6.737ms  0.94%  95.058ms  905.314us  105
aten::as_strided  0.92%  92.940ms  0.92%  92.940ms  8.633us  10766
SpatialTransformer::forward::proj_in  0.05%  5.486ms  0.75%  75.928ms  316.367us  240
sample_heun::eps  0.00%  388.000us  0.72%  73.531ms  9.191ms  8
aten::normal_  0.36%  36.695ms  0.71%  72.344ms  8.038ms  9
aten::randn_like  0.00%  37.000us  0.71%  72.192ms  9.024ms  8
aten::permute  0.10%  10.209ms  0.62%  62.518ms  8.678us  7204
aten::item  0.01%  1.519ms  0.57%  57.704ms  36.522us  1580
aten::_local_scalar_dense  0.55%  55.816ms  0.55%  56.185ms  35.560us  1580
aten::cat  0.55%  55.777ms  0.55%  55.777ms  244.636us  228
c = model.get_learned_conditioning()  0.08%  8.624ms  0.48%  48.806ms  48.806ms  1
aten::sigmoid  0.40%  40.140ms  0.40%  40.140ms  757.358us  53
aten::sub  0.36%  36.376ms  0.39%  39.848ms  737.926us  54
UNetModel::forward::output_block cat()  0.03%  3.061ms  0.35%  35.633ms  197.961us  180
aten::unsqueeze  0.04%  4.109ms  0.31%  31.227ms  11.245us  2777
kdiff_eps::c_out, c_in  0.02%  2.108ms  0.29%  29.484ms  1.966ms  15
UNetModel::forward::t_emb = timestep_embedding()  0.02%  1.587ms  0.29%  29.232ms  1.949ms  15
UNetModel::forward::self.out(h)  0.02%  1.731ms  0.27%  27.464ms  1.831ms  15
aten::zeros  0.15%  15.124ms  0.24%  24.699ms  2.258us  10940
KCFGDenoiser::deltas  0.00%  341.000us  0.21%  21.211ms  1.414ms  15
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
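
The labelled rows above (job, sample_heun::loop, KCFGDenoiser::forward, and so on) are consistent with torch.autograd.profiler record_function annotations wrapped around the sampler and model calls. Below is a minimal, self-contained sketch of how such a table can be produced; it is not the script that generated this dump. The denoise stand-in, the sigma values, and the row_limit are hypothetical placeholders; only the profile/record_function usage and the Heun step structure are the point.

# Minimal sketch: capturing a labelled CPU profile with torch.autograd.profiler.
import torch
from torch.autograd import profiler


def denoise(x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the CFG-wrapped UNet call (KCFGDenoiser::forward above).
    with profiler.record_function("KCFGDenoiser::forward"):
        return x / (sigma ** 2 + 1.0) ** 0.5


def sample_heun(x: torch.Tensor, sigmas: torch.Tensor) -> torch.Tensor:
    with profiler.record_function("sample_heun::loop"):
        for i in range(len(sigmas) - 1):
            with profiler.record_function("sample_heun::iter"):
                denoised = denoise(x, sigmas[i])
                d = (x - denoised) / sigmas[i]
                dt = sigmas[i + 1] - sigmas[i]
                if sigmas[i + 1] == 0:
                    # Final step: plain Euler, since the next sigma is zero.
                    x = x + d * dt
                else:
                    # Heun correction: average the derivative at both ends of the step.
                    x_2 = x + d * dt
                    denoised_2 = denoise(x_2, sigmas[i + 1])
                    d_2 = (x_2 - denoised_2) / sigmas[i + 1]
                    x = x + (d + d_2) / 2 * dt
    return x


with profiler.profile() as prof:
    with profiler.record_function("job"):
        latents = torch.randn(1, 4, 64, 64)
        # 8 sampling steps -> 9 sigmas, ending at 0 (values are illustrative only).
        sigmas = torch.tensor([14.6, 7.0, 3.5, 1.8, 0.9, 0.45, 0.2, 0.1, 0.0])
        sample_heun(latents, sigmas)

# Prints a table in the same format as the dump above.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=40))

Sorting by cpu_time_total reproduces the ordering seen above: the coarse wrapper labels sit near the top, while individual aten:: ops such as aten::copy_ and aten::_mps_linear appear further down with high self-CPU shares.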