Created September 19, 2022 18:38
8 steps, Heun sampler, stable-diffusion, PyTorch nightly 1.13.0.dev20220917
| ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ | |
| Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls | |
| ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ | |
| job 0.27% 29.274ms 100.00% 10.976s 10.976s 1 | |
| iteration 0.00% 135.000us 98.68% 10.831s 10.831s 1 | |
| batch 0.92% 101.481ms 98.68% 10.831s 10.831s 1 | |
| kdiff_sample 0.00% 281.000us 93.19% 10.229s 10.229s 1 | |
| sample_heun::loop 0.00% 395.000us 93.17% 10.226s 10.226s 1 | |
| sample_heun::iter 0.01% 1.121ms 93.16% 10.225s 1.278s 8 | |
| KCFGDenoiser::forward 0.03% 3.084ms 92.19% 10.119s 674.578ms 15 | |
| KCFGDenoiser::self.inner_model+split 0.02% 1.693ms 70.75% 7.766s 517.714ms 15 | |
| kdiff_eps::forward 0.01% 1.209ms 70.72% 7.762s 517.500ms 15 | |
| kdiff_eps::get_eps*( 0.02% 2.027ms 70.27% 7.713s 514.180ms 15 | |
| LatentDiffusion::apply_model::self.model() 0.00% 545.000us 69.93% 7.675s 511.684ms 15 | |
| DiffusionWrapper::forward 0.00% 399.000us 69.92% 7.675s 511.645ms 15 | |
| DiffusionWrapper::forward::out = self.diffusion_mode... 0.01% 813.000us 69.90% 7.672s 511.456ms 15 | |
| UNetModel::forward 0.01% 1.520ms 69.89% 7.671s 511.401ms 15 | |
| TimestepEmbedSequential::forward 0.10% 10.778ms 68.16% 7.481s 19.949ms 375 | |
| CheckpointFunction 0.51% 55.749ms 61.85% 6.789s 11.910ms 570 | |
| TimestepEmbedSequential::forward::SpatialTransformer... 0.06% 6.104ms 53.94% 5.920s 24.667ms 240 | |
| SpatialTransformer::forward 0.20% 21.490ms 53.88% 5.914s 24.640ms 240 | |
| sample_heun::denoised=model() 0.00% 448.000us 51.77% 5.682s 710.284ms 8 | |
| SpatialTransformer::forward::for block in transforme... 0.05% 5.264ms 49.50% 5.433s 22.637ms 240 | |
| SpatialTransformer::forward::block() 0.60% 65.724ms 49.45% 5.427s 22.614ms 240 | |
| aten::copy_ 41.44% 4.548s 41.75% 4.583s 965.795us 4745 | |
| sample_heun::solve -0.00% -157.000us 40.68% 4.465s 558.139ms 8 | |
| sample_heun::heun 0.01% 614.000us 40.60% 4.456s 636.607ms 7 | |
| sample_heun::heun::denoised_2 = model() 0.01% 765.000us 40.45% 4.440s 634.294ms 7 | |
| UNetModel::forward::for module in self.output_blocks... 0.02% 2.649ms 39.18% 4.301s 286.708ms 15 | |
| UNetModel::forward::output_block 0.04% 4.505ms 39.15% 4.297s 23.875ms 180 | |
| UNetModel::forward::output_block module() 0.04% 4.582ms 38.83% 4.262s 23.676ms 180 | |
| CrossAttention::forward 0.54% 59.302ms 28.23% 3.099s 6.456ms 480 | |
| UNetModel::forward::for module in self.input_blocks 0.01% 1.492ms 26.09% 2.864s 190.930ms 15 | |
| UNetModel::forward::input_block 0.03% 3.321ms 26.08% 2.862s 15.901ms 180 | |
| UNetModel::forward::input_block module() 0.03% 2.932ms 26.04% 2.858s 15.876ms 180 | |
| aten::to 0.01% 1.319ms 24.17% 2.652s 1.431ms 1854 | |
| aten::_to_copy 0.01% 795.000us 24.16% 2.652s 26.260ms 101 | |
| KCFGDenoiser::weight_tensor 0.02% 1.693ms 20.84% 2.287s 152.457ms 15 | |
| aten::clone 0.58% 63.423ms 17.78% 1.951s 420.181us 4644 | |
| TimestepEmbedSequential::forward::TimestepBlock() 0.50% 55.192ms 13.51% 1.482s 4.492ms 330 | |
| ResBlock::forward 0.21% 23.445ms 12.84% 1.410s 4.272ms 330 | |
| aten::reshape -0.24% -26705.000us 12.83% 1.409s 186.741us 7543 | |
| CrossAttention::forward::qkv rearrange 0.53% 58.112ms 10.31% 1.131s 2.357ms 480 | |
| aten::mul 9.01% 988.592ms 9.08% 996.272ms 1.021ms 976 | |
| CrossAttention::forward::sim einsum 0.21% 23.167ms 8.36% 917.980ms 1.912ms 480 | |
| aten::conv2d -0.09% -9588.000us 7.83% 859.879ms 569.456us 1510 | |
| aten::convolution 0.39% 42.443ms 7.80% 856.271ms 567.067us 1510 | |
| aten::_convolution -0.01% -927.000us 7.76% 851.841ms 564.133us 1510 | |
| aten::einsum 0.20% 21.986ms 7.62% 835.988ms 870.821us 960 | |
| aten::bmm 6.59% 723.458ms 7.08% 777.342ms 769.646us 1010 | |
| aten::_mps_convolution 6.54% 717.311ms 6.55% 718.998ms 476.158us 1510 | |
| aten::native_batch_norm 4.88% 535.154ms 4.88% 535.154ms 312.043us 1715 | |
| ResBlock::forward::in_layers() 0.25% 27.942ms 4.61% 505.950ms 1.533ms 330 | |
| aten::contiguous 0.49% 53.650ms 4.61% 505.785ms 898.375us 563 | |
| aten::group_norm 0.03% 3.086ms 4.60% 504.733ms 534.109us 945 | |
| aten::gelu 4.59% 504.271ms 4.59% 504.271ms 2.101ms 240 | |
| aten::native_group_norm 0.13% 14.134ms 4.57% 501.647ms 530.843us 945 | |
| aten::layer_norm 0.02% 2.493ms 4.35% 477.478ms 620.101us 770 | |
| aten::native_layer_norm 0.13% 14.512ms 4.33% 474.985ms 616.864us 770 | |
| ResBlock::forward::out_layers() 0.22% 24.224ms 4.01% 440.178ms 1.334ms 330 | |
| aten::add 3.98% 436.753ms 3.99% 437.779ms 238.832us 1833 | |
| aten::linear 3.47% 380.976ms 3.47% 380.976ms 131.190us 2904 | |
| UNetModel::forward::self.middle_block(h, emb, contex... 0.00% 289.000us 3.37% 369.651ms 24.643ms 15 | |
| CrossAttention::forward::out rearrange 0.21% 22.869ms 3.30% 362.682ms 755.587us 480 | |
| aten::addcmul 1.89% 207.166ms 1.89% 207.247ms 120.844us 1715 | |
| aten::softmax 0.03% 2.748ms 1.83% 200.833ms 397.689us 505 | |
| aten::_softmax 1.82% 199.655ms 1.82% 199.848ms 395.739us 505 | |
| softmax 0.05% 5.145ms 1.79% 196.616ms 409.617us 480 | |
| SpatialTransformer::forward::proj_out 0.04% 4.534ms 1.52% 167.299ms 697.079us 240 | |
| SpatialTransformer::forward::proj_in 0.02% 2.738ms 1.46% 160.579ms 669.079us 240 | |
| model.decode_first_stage 0.03% 3.144ms 1.44% 158.256ms 158.256ms 1 | |
| aten::rsqrt 1.30% 142.307ms 1.30% 142.310ms 184.818us 770 | |
| CrossAttention::forward::to_k 0.09% 9.489ms 1.26% 138.810ms 289.188us 480 | |
| uc = model.get_learned_conditioning() 0.03% 3.545ms 1.05% 115.293ms 115.293ms 1 | |
| aten::silu 1.02% 111.950ms 1.02% 111.950ms 109.755us 1020 | |
| CrossAttention::forward::out einsum 0.10% 10.554ms 0.99% 109.210ms 227.521us 480 | |
| UNetModel::forward::t_emb = timestep_embedding() 0.01% 1.054ms 0.92% 100.805ms 6.720ms 15 | |
| aten::as_strided 0.92% 100.602ms 0.92% 100.602ms 9.343us 10768 | |
| ResBlock::forward::self.skip_connection(x)+h 0.06% 6.202ms 0.75% 82.065ms 248.682us 330 | |
| CrossAttention::forward::to_out 0.24% 26.397ms 0.68% 74.738ms 155.704us 480 | |
| SpatialTransformer::forward::norm 0.08% 8.494ms 0.64% 70.619ms 294.246us 240 | |
| ResBlock::forward::self.emb_layers() 0.12% 12.787ms 0.64% 70.424ms 213.406us 330 | |
| aten::permute 0.11% 12.578ms 0.64% 70.081ms 9.728us 7204 | |
| TimestepEmbedSequential::forward::unknown() 0.03% 3.680ms 0.60% 66.191ms 630.390us 105 | |
| aten::add_ 0.51% 56.190ms 0.51% 56.190ms 78.042us 720 | |
| CrossAttention::forward::to_q 0.10% 10.740ms 0.50% 55.261ms 115.127us 480 | |
| aten::cat 0.43% 47.464ms 0.43% 47.464ms 208.175us 228 | |
| CrossAttention::forward::to_v 0.08% 8.823ms 0.40% 43.717ms 91.077us 480 | |
| sample_heun::eps -0.03% -3287.000us 0.38% 41.211ms 5.151ms 8 | |
| aten::sub 0.36% 40.060ms 0.37% 40.121ms 742.981us 54 | |
| aten::normal_ 0.32% 35.439ms 0.36% 39.993ms 4.444ms 9 | |
| aten::randn_like 0.04% 3.870ms 0.36% 39.868ms 4.984ms 8 | |
| c = model.get_learned_conditioning() 0.06% 6.300ms 0.33% 36.278ms 36.278ms 1 | |
| aten::unsqueeze 0.05% 5.514ms 0.31% 33.710ms 12.139us 2777 | |
| UNetModel::forward::output_block cat() 0.03% 3.643ms 0.28% 30.485ms 169.361us 180 | |
| kdiff_eps::c_out, c_in 0.02% 1.825ms 0.27% 29.820ms 1.988ms 15 | |
| KCFGDenoiser::deltas 0.01% 683.000us 0.24% 25.863ms 1.724ms 15 | |
| UNetModel::forward::self.out(h) 0.03% 3.751ms 0.23% 24.806ms 1.654ms 15 | |
| aten::zeros 0.17% 18.999ms 0.22% 23.743ms 2.170us 10940 | |
| aten::item 0.02% 2.722ms 0.21% 22.760ms 15.410us 1477 | |
| aten::_local_scalar_dense 0.19% 21.333ms 0.20% 21.533ms 14.579us 1477 | |
| SpatialTransformer::forward::rearrange 2 0.13% 13.961ms 0.19% 21.260ms 88.583us 240 | |
| aten::upsample_nearest2d 0.17% 18.720ms 0.17% 18.720ms 390.000us 48 | |
| ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ |
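The percentage columns appear to be each row's time divided by the `CPU total` of the top-level `job` row (10.976s). Below is a minimal sketch for checking that arithmetic on any row. The helper names `to_seconds` and `share` are hypothetical and not part of PyTorch, and because the printed durations are rounded, recomputed shares may differ from the table in the last digit.

```python
import re

# Unit suffixes that appear in the table, mapped to seconds.
_UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}


def to_seconds(text: str) -> float:
    """Convert a duration such as '4.548s', '29.274ms' or '-157.000us' to seconds."""
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)(s|ms|us)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def share(part: str, whole: str) -> float:
    """Return `part` as a percentage of `whole` (both are duration strings)."""
    return 100.0 * to_seconds(part) / to_seconds(whole)


if __name__ == "__main__":
    job_total = "10.976s"  # CPU total of the `job` row
    # Self CPU of aten::copy_ and aten::mul, copied from the table above.
    for name, self_cpu in [("aten::copy_", "4.548s"), ("aten::mul", "988.592ms")]:
        print(f"{name}: {share(self_cpu, job_total):.2f}% of job")
```

Comparing the results against the `Self CPU %` column is a quick way to confirm which total a percentage is measured against before drawing conclusions from it.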