Created
August 21, 2024 01:38
| High watermark memory allocation limit: 163.20 GB | |
| Low watermark memory allocation limit: 134.40 GB | |
| Initializing private heap allocator on unified device memory of size 96.00 GB | |
| BlitCopySync: CPU:Float[3, 224, 224] --> MPS(buf#1:1):Float[3, 224, 224] (len=588.00 KB, gpu=7.584 ms, cpu=4.728 ms) | |
| BlitCopySync: CPU:Float[64, 3, 7, 7] --> MPS(buf#2:1):Float[64, 3, 7, 7] (len=36.75 KB, gpu=0.271 ms, cpu=0.048 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#3:1):Float[64] (len=256 bytes, gpu=0.270 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#4:1):Float[64] (len=256 bytes, gpu=9.766 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#5:1):Float[64] (len=256 bytes, gpu=0.044 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#6:1):Float[64] (len=256 bytes, gpu=0.583 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#7:1):Long[] (len=8 bytes, gpu=0.430 ms, cpu=0.234 ms) | |
| BlitCopySync: CPU:Float[64, 64, 1, 1] --> MPS(buf#8:1):Float[64, 64, 1, 1] (len=16.00 KB, gpu=0.732 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#9:1):Float[64] (len=256 bytes, gpu=0.713 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#10:1):Float[64] (len=256 bytes, gpu=0.688 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#11:1):Float[64] (len=256 bytes, gpu=9.616 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#12:1):Float[64] (len=256 bytes, gpu=0.536 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#13:1):Long[] (len=8 bytes, gpu=0.586 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[64, 64, 3, 3] --> MPS(buf#14:1):Float[64, 64, 3, 3] (len=144.00 KB, gpu=0.724 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#15:1):Float[64] (len=256 bytes, gpu=0.740 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#16:1):Float[64] (len=256 bytes, gpu=0.742 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#17:1):Float[64] (len=256 bytes, gpu=0.779 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#18:1):Float[64] (len=256 bytes, gpu=7.768 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#19:1):Long[] (len=8 bytes, gpu=9.432 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#20:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=0.541 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#21:1):Float[256] (len=1024 bytes, gpu=0.661 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#22:1):Float[256] (len=1024 bytes, gpu=0.510 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#23:1):Float[256] (len=1024 bytes, gpu=0.408 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#24:1):Float[256] (len=1024 bytes, gpu=0.702 ms, cpu=0.041 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#25:1):Long[] (len=8 bytes, gpu=0.674 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#26:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=7.645 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#27:1):Float[256] (len=1024 bytes, gpu=9.384 ms, cpu=0.034 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#28:1):Float[256] (len=1024 bytes, gpu=0.297 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#29:1):Float[256] (len=1024 bytes, gpu=0.725 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#30:1):Float[256] (len=1024 bytes, gpu=0.722 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#31:1):Long[] (len=8 bytes, gpu=0.511 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[64, 256, 1, 1] --> MPS(buf#32:1):Float[64, 256, 1, 1] (len=64.00 KB, gpu=0.710 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#33:1):Float[64] (len=256 bytes, gpu=0.734 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#34:1):Float[64] (len=256 bytes, gpu=7.710 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#35:1):Float[64] (len=256 bytes, gpu=9.495 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#36:1):Float[64] (len=256 bytes, gpu=0.136 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#37:1):Long[] (len=8 bytes, gpu=0.559 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[64, 64, 3, 3] --> MPS(buf#38:1):Float[64, 64, 3, 3] (len=144.00 KB, gpu=0.546 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#39:1):Float[64] (len=256 bytes, gpu=0.514 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#40:1):Float[64] (len=256 bytes, gpu=0.598 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#41:1):Float[64] (len=256 bytes, gpu=0.460 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#42:1):Float[64] (len=256 bytes, gpu=7.353 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#43:1):Long[] (len=8 bytes, gpu=7.501 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#44:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=6.985 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#45:1):Float[256] (len=1024 bytes, gpu=1.171 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#46:1):Float[256] (len=1024 bytes, gpu=0.058 ms, cpu=0.051 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#47:1):Float[256] (len=1024 bytes, gpu=0.543 ms, cpu=0.045 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#48:1):Float[256] (len=1024 bytes, gpu=0.572 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#49:1):Long[] (len=8 bytes, gpu=0.530 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[64, 256, 1, 1] --> MPS(buf#50:1):Float[64, 256, 1, 1] (len=64.00 KB, gpu=0.584 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#51:1):Float[64] (len=256 bytes, gpu=0.633 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#52:1):Float[64] (len=256 bytes, gpu=9.625 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#53:1):Float[64] (len=256 bytes, gpu=0.364 ms, cpu=0.087 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#54:1):Float[64] (len=256 bytes, gpu=0.611 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#55:1):Long[] (len=8 bytes, gpu=0.583 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[64, 64, 3, 3] --> MPS(buf#56:1):Float[64, 64, 3, 3] (len=144.00 KB, gpu=0.566 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#57:1):Float[64] (len=256 bytes, gpu=0.541 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#58:1):Float[64] (len=256 bytes, gpu=0.596 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#59:1):Float[64] (len=256 bytes, gpu=1.590 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#60:1):Float[64] (len=256 bytes, gpu=0.380 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#61:1):Long[] (len=8 bytes, gpu=0.456 ms, cpu=0.052 ms) | |
| BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#62:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=0.009 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#63:1):Float[256] (len=1024 bytes, gpu=0.421 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#64:1):Float[256] (len=1024 bytes, gpu=0.482 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#65:1):Float[256] (len=1024 bytes, gpu=0.544 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#66:1):Float[256] (len=1024 bytes, gpu=2.552 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#67:1):Long[] (len=8 bytes, gpu=0.356 ms, cpu=0.111 ms) | |
| BlitCopySync: CPU:Float[128, 256, 1, 1] --> MPS(buf#68:1):Float[128, 256, 1, 1] (len=128.00 KB, gpu=0.477 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#69:1):Float[128] (len=512 bytes, gpu=0.536 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#70:1):Float[128] (len=512 bytes, gpu=0.404 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#71:1):Float[128] (len=512 bytes, gpu=0.510 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#72:1):Float[128] (len=512 bytes, gpu=1.487 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#73:1):Long[] (len=8 bytes, gpu=0.370 ms, cpu=0.163 ms) | |
| BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#74:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=0.508 ms, cpu=0.041 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#75:1):Float[128] (len=512 bytes, gpu=0.452 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#76:1):Float[128] (len=512 bytes, gpu=0.507 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#77:1):Float[128] (len=512 bytes, gpu=0.507 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#78:1):Float[128] (len=512 bytes, gpu=0.588 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#79:1):Long[] (len=8 bytes, gpu=6.547 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#80:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.216 ms, cpu=0.044 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#81:1):Float[512] (len=2.00 KB, gpu=7.525 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#82:1):Float[512] (len=2.00 KB, gpu=6.355 ms, cpu=0.081 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#83:1):Float[512] (len=2.00 KB, gpu=0.265 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#84:1):Float[512] (len=2.00 KB, gpu=6.700 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#85:1):Long[] (len=8 bytes, gpu=0.319 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[512, 256, 1, 1] --> MPS(buf#86:1):Float[512, 256, 1, 1] (len=512.00 KB, gpu=2.752 ms, cpu=0.042 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#87:1):Float[512] (len=2.00 KB, gpu=0.863 ms, cpu=0.058 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#88:1):Float[512] (len=2.00 KB, gpu=0.491 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#89:1):Float[512] (len=2.00 KB, gpu=0.580 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#90:1):Float[512] (len=2.00 KB, gpu=0.582 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#91:1):Long[] (len=8 bytes, gpu=6.442 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[128, 512, 1, 1] --> MPS(buf#92:1):Float[128, 512, 1, 1] (len=256.00 KB, gpu=0.300 ms, cpu=0.068 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#93:1):Float[128] (len=512 bytes, gpu=6.501 ms, cpu=0.083 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#94:1):Float[128] (len=512 bytes, gpu=0.231 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#95:1):Float[128] (len=512 bytes, gpu=7.495 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#96:1):Float[128] (len=512 bytes, gpu=6.400 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#97:1):Long[] (len=8 bytes, gpu=0.474 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#98:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=1.376 ms, cpu=0.070 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#99:1):Float[128] (len=512 bytes, gpu=0.374 ms, cpu=0.054 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#100:1):Float[128] (len=512 bytes, gpu=0.537 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#101:1):Float[128] (len=512 bytes, gpu=0.488 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#102:1):Float[128] (len=512 bytes, gpu=0.736 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#103:1):Long[] (len=8 bytes, gpu=0.784 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#104:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.511 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#105:1):Float[512] (len=2.00 KB, gpu=7.668 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#106:1):Float[512] (len=2.00 KB, gpu=7.167 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#107:1):Float[512] (len=2.00 KB, gpu=10.353 ms, cpu=0.040 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#108:1):Float[512] (len=2.00 KB, gpu=0.144 ms, cpu=0.130 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#109:1):Long[] (len=8 bytes, gpu=0.721 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[128, 512, 1, 1] --> MPS(buf#110:1):Float[128, 512, 1, 1] (len=256.00 KB, gpu=0.726 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#111:1):Float[128] (len=512 bytes, gpu=0.388 ms, cpu=0.088 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#112:1):Float[128] (len=512 bytes, gpu=0.737 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#113:1):Float[128] (len=512 bytes, gpu=7.764 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#114:1):Float[128] (len=512 bytes, gpu=7.336 ms, cpu=0.040 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#115:1):Long[] (len=8 bytes, gpu=9.434 ms, cpu=0.061 ms) | |
| BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#116:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=0.373 ms, cpu=0.053 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#117:1):Float[128] (len=512 bytes, gpu=0.706 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#118:1):Float[128] (len=512 bytes, gpu=0.610 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#119:1):Float[128] (len=512 bytes, gpu=0.687 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#120:1):Float[128] (len=512 bytes, gpu=0.725 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#121:1):Long[] (len=8 bytes, gpu=0.576 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#122:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=7.689 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#123:1):Float[512] (len=2.00 KB, gpu=5.340 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#124:1):Float[512] (len=2.00 KB, gpu=0.480 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#125:1):Float[512] (len=2.00 KB, gpu=0.683 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#126:1):Float[512] (len=2.00 KB, gpu=1.668 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#127:1):Long[] (len=8 bytes, gpu=0.650 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[128, 512, 1, 1] --> MPS(buf#128:1):Float[128, 512, 1, 1] (len=256.00 KB, gpu=0.568 ms, cpu=0.050 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#129:1):Float[128] (len=512 bytes, gpu=0.665 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#130:1):Float[128] (len=512 bytes, gpu=0.616 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#131:1):Float[128] (len=512 bytes, gpu=0.692 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#132:1):Float[128] (len=512 bytes, gpu=0.712 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#133:1):Long[] (len=8 bytes, gpu=1.727 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#134:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=0.603 ms, cpu=0.059 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#135:1):Float[128] (len=512 bytes, gpu=0.752 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#136:1):Float[128] (len=512 bytes, gpu=0.543 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#137:1):Float[128] (len=512 bytes, gpu=0.649 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#138:1):Float[128] (len=512 bytes, gpu=0.564 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#139:1):Long[] (len=8 bytes, gpu=0.673 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#140:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=2.707 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#141:1):Float[512] (len=2.00 KB, gpu=0.450 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#142:1):Float[512] (len=2.00 KB, gpu=0.688 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#143:1):Float[512] (len=2.00 KB, gpu=0.542 ms, cpu=0.040 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#144:1):Float[512] (len=2.00 KB, gpu=0.541 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#145:1):Long[] (len=8 bytes, gpu=0.755 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256, 512, 1, 1] --> MPS(buf#146:1):Float[256, 512, 1, 1] (len=512.00 KB, gpu=6.655 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#147:1):Float[256] (len=1024 bytes, gpu=0.297 ms, cpu=0.047 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#148:1):Float[256] (len=1024 bytes, gpu=5.738 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#149:1):Float[256] (len=1024 bytes, gpu=0.272 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#150:1):Float[256] (len=1024 bytes, gpu=0.713 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#151:1):Long[] (len=8 bytes, gpu=9.696 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#152:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.957 ms, cpu=1.334 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#153:1):Float[256] (len=1024 bytes, gpu=0.506 ms, cpu=0.040 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#154:1):Float[256] (len=1024 bytes, gpu=0.522 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#155:1):Float[256] (len=1024 bytes, gpu=0.599 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#156:1):Float[256] (len=1024 bytes, gpu=10.719 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#157:1):Long[] (len=8 bytes, gpu=0.519 ms, cpu=0.040 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#158:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=0.454 ms, cpu=0.103 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#159:1):Float[1024] (len=4.00 KB, gpu=0.558 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#160:1):Float[1024] (len=4.00 KB, gpu=0.497 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#161:1):Float[1024] (len=4.00 KB, gpu=0.702 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#162:1):Float[1024] (len=4.00 KB, gpu=7.608 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#163:1):Long[] (len=8 bytes, gpu=7.110 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[1024, 512, 1, 1] --> MPS(buf#164:1):Float[1024, 512, 1, 1] (len=2.00 MB, gpu=9.082 ms, cpu=0.144 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#165:1):Float[1024] (len=4.00 KB, gpu=0.328 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#166:1):Float[1024] (len=4.00 KB, gpu=0.415 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#167:1):Float[1024] (len=4.00 KB, gpu=0.743 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#168:1):Float[1024] (len=4.00 KB, gpu=0.372 ms, cpu=0.060 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#169:1):Long[] (len=8 bytes, gpu=0.533 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#170:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.276 ms, cpu=0.370 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#171:1):Float[256] (len=1024 bytes, gpu=6.715 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#172:1):Float[256] (len=1024 bytes, gpu=0.636 ms, cpu=0.049 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#173:1):Float[256] (len=1024 bytes, gpu=9.351 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#174:1):Float[256] (len=1024 bytes, gpu=0.250 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#175:1):Long[] (len=8 bytes, gpu=0.735 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#176:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.596 ms, cpu=0.115 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#177:1):Float[256] (len=1024 bytes, gpu=0.590 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#178:1):Float[256] (len=1024 bytes, gpu=0.554 ms, cpu=0.082 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#179:1):Float[256] (len=1024 bytes, gpu=0.742 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#180:1):Float[256] (len=1024 bytes, gpu=7.678 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#181:1):Long[] (len=8 bytes, gpu=9.545 ms, cpu=0.083 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#182:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=0.490 ms, cpu=0.079 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#183:1):Float[1024] (len=4.00 KB, gpu=0.635 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#184:1):Float[1024] (len=4.00 KB, gpu=0.713 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#185:1):Float[1024] (len=4.00 KB, gpu=0.516 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#186:1):Float[1024] (len=4.00 KB, gpu=0.705 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#187:1):Long[] (len=8 bytes, gpu=0.668 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#188:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=7.654 ms, cpu=0.077 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#189:1):Float[256] (len=1024 bytes, gpu=5.342 ms, cpu=0.041 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#190:1):Float[256] (len=1024 bytes, gpu=0.318 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#191:1):Float[256] (len=1024 bytes, gpu=0.685 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#192:1):Float[256] (len=1024 bytes, gpu=7.676 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#193:1):Long[] (len=8 bytes, gpu=1.565 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#194:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.385 ms, cpu=0.128 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#195:1):Float[256] (len=1024 bytes, gpu=0.406 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#196:1):Float[256] (len=1024 bytes, gpu=0.277 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#197:1):Float[256] (len=1024 bytes, gpu=0.705 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#198:1):Float[256] (len=1024 bytes, gpu=0.643 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#199:1):Long[] (len=8 bytes, gpu=0.402 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#200:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=11.569 ms, cpu=0.068 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#201:1):Float[1024] (len=4.00 KB, gpu=0.551 ms, cpu=0.061 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#202:1):Float[1024] (len=4.00 KB, gpu=0.453 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#203:1):Float[1024] (len=4.00 KB, gpu=0.284 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#204:1):Float[1024] (len=4.00 KB, gpu=0.754 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#205:1):Long[] (len=8 bytes, gpu=1.718 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#206:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.568 ms, cpu=0.073 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#207:1):Float[256] (len=1024 bytes, gpu=0.421 ms, cpu=0.045 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#208:1):Float[256] (len=1024 bytes, gpu=0.560 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#209:1):Float[256] (len=1024 bytes, gpu=0.618 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#210:1):Float[256] (len=1024 bytes, gpu=0.741 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#211:1):Long[] (len=8 bytes, gpu=0.693 ms, cpu=0.034 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#212:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=2.471 ms, cpu=0.112 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#213:1):Float[256] (len=1024 bytes, gpu=0.422 ms, cpu=0.090 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#214:1):Float[256] (len=1024 bytes, gpu=0.610 ms, cpu=0.042 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#215:1):Float[256] (len=1024 bytes, gpu=0.511 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#216:1):Float[256] (len=1024 bytes, gpu=0.657 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#217:1):Long[] (len=8 bytes, gpu=0.658 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#218:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=1.553 ms, cpu=0.078 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#219:1):Float[1024] (len=4.00 KB, gpu=0.187 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#220:1):Float[1024] (len=4.00 KB, gpu=0.656 ms, cpu=0.042 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#221:1):Float[1024] (len=4.00 KB, gpu=0.630 ms, cpu=0.041 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#222:1):Float[1024] (len=4.00 KB, gpu=0.528 ms, cpu=0.053 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#223:1):Long[] (len=8 bytes, gpu=0.618 ms, cpu=0.040 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#224:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.646 ms, cpu=0.088 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#225:1):Float[256] (len=1024 bytes, gpu=1.658 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#226:1):Float[256] (len=1024 bytes, gpu=0.544 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#227:1):Float[256] (len=1024 bytes, gpu=0.655 ms, cpu=0.050 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#228:1):Float[256] (len=1024 bytes, gpu=0.627 ms, cpu=0.057 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#229:1):Long[] (len=8 bytes, gpu=0.541 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#230:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.509 ms, cpu=0.136 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#231:1):Float[256] (len=1024 bytes, gpu=0.288 ms, cpu=0.041 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#232:1):Float[256] (len=1024 bytes, gpu=2.692 ms, cpu=0.034 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#233:1):Float[256] (len=1024 bytes, gpu=0.513 ms, cpu=0.101 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#234:1):Float[256] (len=1024 bytes, gpu=0.675 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#235:1):Long[] (len=8 bytes, gpu=0.702 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#236:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=0.292 ms, cpu=0.080 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#237:1):Float[1024] (len=4.00 KB, gpu=0.292 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#238:1):Float[1024] (len=4.00 KB, gpu=1.716 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#239:1):Float[1024] (len=4.00 KB, gpu=0.546 ms, cpu=0.116 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#240:1):Float[1024] (len=4.00 KB, gpu=0.691 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#241:1):Long[] (len=8 bytes, gpu=0.686 ms, cpu=0.050 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#242:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.329 ms, cpu=0.380 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#243:1):Float[256] (len=1024 bytes, gpu=0.538 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#244:1):Float[256] (len=1024 bytes, gpu=0.649 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#245:1):Float[256] (len=1024 bytes, gpu=7.715 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#246:1):Float[256] (len=1024 bytes, gpu=9.446 ms, cpu=0.047 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#247:1):Long[] (len=8 bytes, gpu=0.021 ms, cpu=0.049 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#248:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.613 ms, cpu=0.114 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#249:1):Float[256] (len=1024 bytes, gpu=0.037 ms, cpu=0.113 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#250:1):Float[256] (len=1024 bytes, gpu=0.216 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#251:1):Float[256] (len=1024 bytes, gpu=0.698 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#252:1):Float[256] (len=1024 bytes, gpu=0.714 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#253:1):Long[] (len=8 bytes, gpu=7.670 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#254:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=9.121 ms, cpu=0.093 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#255:1):Float[1024] (len=4.00 KB, gpu=0.637 ms, cpu=0.051 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#256:1):Float[1024] (len=4.00 KB, gpu=0.670 ms, cpu=0.034 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#257:1):Float[1024] (len=4.00 KB, gpu=0.530 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#258:1):Float[1024] (len=4.00 KB, gpu=0.662 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#259:1):Long[] (len=8 bytes, gpu=0.699 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[512, 1024, 1, 1] --> MPS(buf#260:1):Float[512, 1024, 1, 1] (len=2.00 MB, gpu=6.573 ms, cpu=0.155 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#261:1):Float[512] (len=2.00 KB, gpu=0.219 ms, cpu=0.093 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#262:1):Float[512] (len=2.00 KB, gpu=5.658 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#263:1):Float[512] (len=2.00 KB, gpu=0.205 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#264:1):Float[512] (len=2.00 KB, gpu=0.690 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#265:1):Long[] (len=8 bytes, gpu=7.753 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[512, 512, 3, 3] --> MPS(buf#266:1):Float[512, 512, 3, 3] (len=9.00 MB, gpu=4.994 ms, cpu=0.374 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#267:1):Float[512] (len=2.00 KB, gpu=0.379 ms, cpu=0.063 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#268:1):Float[512] (len=2.00 KB, gpu=0.567 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#269:1):Float[512] (len=2.00 KB, gpu=4.521 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#270:1):Float[512] (len=2.00 KB, gpu=0.469 ms, cpu=0.080 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#271:1):Long[] (len=8 bytes, gpu=0.719 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[2048, 512, 1, 1] --> MPS(buf#272:1):Float[2048, 512, 1, 1] (len=4.00 MB, gpu=0.266 ms, cpu=0.190 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#273:1):Float[2048] (len=8.00 KB, gpu=6.423 ms, cpu=0.046 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#274:1):Float[2048] (len=8.00 KB, gpu=0.512 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#275:1):Float[2048] (len=8.00 KB, gpu=1.442 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#276:1):Float[2048] (len=8.00 KB, gpu=0.930 ms, cpu=0.116 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#277:1):Long[] (len=8 bytes, gpu=0.731 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[2048, 1024, 1, 1] --> MPS(buf#278:1):Float[2048, 1024, 1, 1] (len=8.00 MB, gpu=0.398 ms, cpu=1.319 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#279:1):Float[2048] (len=8.00 KB, gpu=0.441 ms, cpu=0.049 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#280:1):Float[2048] (len=8.00 KB, gpu=14.445 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#281:1):Float[2048] (len=8.00 KB, gpu=0.308 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#282:1):Float[2048] (len=8.00 KB, gpu=7.629 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#283:1):Long[] (len=8 bytes, gpu=9.517 ms, cpu=0.040 ms) | |
| BlitCopySync: CPU:Float[512, 2048, 1, 1] --> MPS(buf#284:1):Float[512, 2048, 1, 1] (len=4.00 MB, gpu=0.110 ms, cpu=0.204 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#285:1):Float[512] (len=2.00 KB, gpu=0.514 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#286:1):Float[512] (len=2.00 KB, gpu=0.654 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#287:1):Float[512] (len=2.00 KB, gpu=0.630 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#288:1):Float[512] (len=2.00 KB, gpu=0.726 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#289:1):Long[] (len=8 bytes, gpu=0.693 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[512, 512, 3, 3] --> MPS(buf#290:1):Float[512, 512, 3, 3] (len=9.00 MB, gpu=7.282 ms, cpu=0.426 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#291:1):Float[512] (len=2.00 KB, gpu=3.143 ms, cpu=0.153 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#292:1):Float[512] (len=2.00 KB, gpu=0.484 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#293:1):Float[512] (len=2.00 KB, gpu=0.323 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#294:1):Float[512] (len=2.00 KB, gpu=0.732 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#295:1):Long[] (len=8 bytes, gpu=0.676 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[2048, 512, 1, 1] --> MPS(buf#296:1):Float[2048, 512, 1, 1] (len=4.00 MB, gpu=7.586 ms, cpu=0.193 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#297:1):Float[2048] (len=8.00 KB, gpu=9.155 ms, cpu=0.046 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#298:1):Float[2048] (len=8.00 KB, gpu=0.068 ms, cpu=0.076 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#299:1):Float[2048] (len=8.00 KB, gpu=0.373 ms, cpu=0.048 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#300:1):Float[2048] (len=8.00 KB, gpu=0.712 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#301:1):Long[] (len=8 bytes, gpu=0.612 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[512, 2048, 1, 1] --> MPS(buf#302:1):Float[512, 2048, 1, 1] (len=4.00 MB, gpu=0.576 ms, cpu=0.201 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#303:1):Float[512] (len=2.00 KB, gpu=7.373 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#304:1):Float[512] (len=2.00 KB, gpu=7.398 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#305:1):Float[512] (len=2.00 KB, gpu=9.302 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#306:1):Float[512] (len=2.00 KB, gpu=0.245 ms, cpu=0.099 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#307:1):Long[] (len=8 bytes, gpu=0.577 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[512, 512, 3, 3] --> MPS(buf#308:1):Float[512, 512, 3, 3] (len=9.00 MB, gpu=0.163 ms, cpu=1.537 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#309:1):Float[512] (len=2.00 KB, gpu=0.543 ms, cpu=0.104 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#310:1):Float[512] (len=2.00 KB, gpu=0.557 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#311:1):Float[512] (len=2.00 KB, gpu=7.725 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#312:1):Float[512] (len=2.00 KB, gpu=9.582 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#313:1):Long[] (len=8 bytes, gpu=0.560 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[2048, 512, 1, 1] --> MPS(buf#314:1):Float[2048, 512, 1, 1] (len=4.00 MB, gpu=0.310 ms, cpu=0.241 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#315:1):Float[2048] (len=8.00 KB, gpu=0.238 ms, cpu=0.099 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#316:1):Float[2048] (len=8.00 KB, gpu=0.716 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#317:1):Float[2048] (len=8.00 KB, gpu=0.661 ms, cpu=0.040 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#318:1):Float[2048] (len=8.00 KB, gpu=0.674 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#319:1):Long[] (len=8 bytes, gpu=7.593 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[256, 256, 1, 1] --> MPS(buf#320:1):Float[256, 256, 1, 1] (len=256.00 KB, gpu=2.280 ms, cpu=0.067 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#321:1):Float[256] (len=1024 bytes, gpu=0.250 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#322:1):Float[256] (len=1024 bytes, gpu=0.690 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#323:1):Float[256] (len=1024 bytes, gpu=0.501 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#324:1):Float[256] (len=1024 bytes, gpu=0.571 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#325:1):Long[] (len=8 bytes, gpu=6.555 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[256, 512, 1, 1] --> MPS(buf#326:1):Float[256, 512, 1, 1] (len=512.00 KB, gpu=0.272 ms, cpu=0.053 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#327:1):Float[256] (len=1024 bytes, gpu=7.698 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#328:1):Float[256] (len=1024 bytes, gpu=5.326 ms, cpu=0.042 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#329:1):Float[256] (len=1024 bytes, gpu=0.427 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#330:1):Float[256] (len=1024 bytes, gpu=0.583 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#331:1):Long[] (len=8 bytes, gpu=3.654 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#332:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.488 ms, cpu=0.064 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#333:1):Float[256] (len=1024 bytes, gpu=0.516 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#334:1):Float[256] (len=1024 bytes, gpu=0.576 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#335:1):Float[256] (len=1024 bytes, gpu=0.575 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#336:1):Float[256] (len=1024 bytes, gpu=5.537 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#337:1):Long[] (len=8 bytes, gpu=0.501 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256, 2048, 1, 1] --> MPS(buf#338:1):Float[256, 2048, 1, 1] (len=2.00 MB, gpu=0.433 ms, cpu=0.121 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#339:1):Float[256] (len=1024 bytes, gpu=9.386 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#340:1):Float[256] (len=1024 bytes, gpu=0.312 ms, cpu=0.126 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#341:1):Float[256] (len=1024 bytes, gpu=0.604 ms, cpu=0.049 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#342:1):Float[256] (len=1024 bytes, gpu=0.759 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#343:1):Long[] (len=8 bytes, gpu=0.603 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#344:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.609 ms, cpu=0.147 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#345:1):Float[256] (len=1024 bytes, gpu=0.661 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#346:1):Float[256] (len=1024 bytes, gpu=7.749 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#347:1):Float[256] (len=1024 bytes, gpu=4.507 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#348:1):Float[256] (len=1024 bytes, gpu=0.705 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#349:1):Long[] (len=8 bytes, gpu=0.178 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#350:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.449 ms, cpu=0.125 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#351:1):Float[256] (len=1024 bytes, gpu=7.632 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#352:1):Float[256] (len=1024 bytes, gpu=7.039 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#353:1):Float[256] (len=1024 bytes, gpu=7.419 ms, cpu=0.042 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#354:1):Float[256] (len=1024 bytes, gpu=6.313 ms, cpu=0.081 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#355:1):Long[] (len=8 bytes, gpu=0.493 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#356:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=1.579 ms, cpu=0.125 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#357:1):Float[256] (len=1024 bytes, gpu=0.274 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#358:1):Float[256] (len=1024 bytes, gpu=0.682 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#359:1):Float[256] (len=1024 bytes, gpu=0.688 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#360:1):Float[256] (len=1024 bytes, gpu=0.678 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#361:1):Long[] (len=8 bytes, gpu=0.632 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#362:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.635 ms, cpu=0.127 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#363:1):Float[256] (len=1024 bytes, gpu=7.348 ms, cpu=0.159 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#364:1):Float[256] (len=1024 bytes, gpu=1.364 ms, cpu=0.066 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#365:1):Float[256] (len=1024 bytes, gpu=0.402 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#366:1):Float[256] (len=1024 bytes, gpu=0.665 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#367:1):Long[] (len=8 bytes, gpu=0.732 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#368:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.658 ms, cpu=0.122 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#369:1):Float[256] (len=1024 bytes, gpu=0.510 ms, cpu=0.046 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#370:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=1.616 ms, cpu=0.113 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#371:1):Float[256] (len=1024 bytes, gpu=0.143 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[3, 256, 1, 1] --> MPS(buf#372:1):Float[3, 256, 1, 1] (len=3.00 KB, gpu=0.705 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[3] --> MPS(buf#373:1):Float[3] (len=12 bytes, gpu=0.657 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[12, 256, 1, 1] --> MPS(buf#374:1):Float[12, 256, 1, 1] (len=12.00 KB, gpu=0.669 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[12] --> MPS(buf#375:1):Float[12] (len=48 bytes, gpu=0.516 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#376:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.592 ms, cpu=0.115 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#377:1):Float[256] (len=1024 bytes, gpu=7.406 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#378:1):Float[256] (len=1024 bytes, gpu=7.272 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#379:1):Float[256] (len=1024 bytes, gpu=3.279 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#380:1):Float[256] (len=1024 bytes, gpu=0.586 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#381:1):Long[] (len=8 bytes, gpu=0.695 ms, cpu=0.060 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#382:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.622 ms, cpu=0.108 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#383:1):Float[256] (len=1024 bytes, gpu=0.272 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#384:1):Float[256] (len=1024 bytes, gpu=9.420 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#385:1):Float[256] (len=1024 bytes, gpu=0.444 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#386:1):Float[256] (len=1024 bytes, gpu=0.516 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#387:1):Long[] (len=8 bytes, gpu=0.688 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#388:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.470 ms, cpu=1.279 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#389:1):Float[256] (len=1024 bytes, gpu=0.163 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#390:1):Float[256] (len=1024 bytes, gpu=7.610 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#391:1):Float[256] (len=1024 bytes, gpu=9.255 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#392:1):Float[256] (len=1024 bytes, gpu=0.408 ms, cpu=0.048 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#393:1):Long[] (len=8 bytes, gpu=0.718 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#394:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.420 ms, cpu=0.139 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#395:1):Float[256] (len=1024 bytes, gpu=0.301 ms, cpu=0.055 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#396:1):Float[256] (len=1024 bytes, gpu=0.761 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#397:1):Float[256] (len=1024 bytes, gpu=0.695 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#398:1):Float[256] (len=1024 bytes, gpu=7.563 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#399:1):Long[] (len=8 bytes, gpu=9.397 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[1024, 12544] --> MPS(buf#400:1):Float[1024, 12544] (len=49.00 MB, gpu=1.973 ms, cpu=49.488 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#401:1):Float[1024] (len=4.00 KB, gpu=1.821 ms, cpu=0.189 ms) | |
| BlitCopySync: CPU:Float[91, 1024] --> MPS(buf#402:1):Float[91, 1024] (len=364.00 KB, gpu=0.475 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[91] --> MPS(buf#403:1):Float[91] (len=364 bytes, gpu=0.305 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[364, 1024] --> MPS(buf#404:1):Float[364, 1024] (len=1.42 MB, gpu=0.623 ms, cpu=0.077 ms) | |
| BlitCopySync: CPU:Float[364] --> MPS(buf#405:1):Float[364] (len=1.42 KB, gpu=0.642 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[3] --> MPS(buf#406:1):Float[3] (len=12 bytes, gpu=0.007 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[3] --> MPS(buf#407:1):Float[3] (len=12 bytes, gpu=0.432 ms, cpu=0.029 ms) | |
| aten::sub_out_mps::f32[3,224,224]:f32[3,1,1]:f32[3,224,224] (id=G1, run=1, gpu=2.191 ms, cpu=0.923 ms) | |
| aten::div_out_mps::f32[3,224,224]:f32[3,1,1]:f32[3,224,224] (id=G2, run=1, gpu=2.191 ms, cpu=0.923 ms) | |
| aten::upsample_bilinear:f32[1,3,224,224]:[1.000000,0.000000]:[Undefined] (id=G3, run=1, gpu=7.177 ms, cpu=0.202 ms) | |
| BlitCopy: MPS(buf#410:2):Float[3, 800, 800] --> MPS(buf#411:2):Float[3, 800, 800] (len=7.32 MB, gpu=7.177 ms, cpu=0.202 ms) | |
| aten::mps_convolution:2:2:1:1:3:3:1:Contiguous:f32[1,3,800,800]:f32[64,3,7,7]:0:nobias (id=G4, run=1, gpu=7.177 ms, cpu=0.202 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,400,400::f32[1,64,400,400]:f32[64]:f32[64]:f32[64]:f32[64] (id=G5, run=1, gpu=8.796 ms, cpu=0.073 ms) | |
| aten::relu_:f32[1,64,400,400] (id=G6, run=1, gpu=8.796 ms, cpu=0.073 ms) | |
| aten::max_pool2d:f32[1,64,400,400]:Undefined:Undefined:K[3,3,]:S[2,2,]:P[1,1,]:D[1,1,]:NCHW (id=G7, run=1, gpu=8.796 ms, cpu=0.073 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[64,64,1,1]:0:nobias (id=G8, run=1, gpu=8.796 ms, cpu=0.073 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=11.038 ms, cpu=0.046 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=11.038 ms, cpu=0.046 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,64,200,200]:f32[64,64,3,3]:0:nobias (id=G11, run=3, gpu=11.038 ms, cpu=0.046 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=11.038 ms, cpu=0.046 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=11.038 ms, cpu=0.046 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=15.966 ms, cpu=0.054 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=15.966 ms, cpu=0.054 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=15.966 ms, cpu=0.054 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=8.034 ms, cpu=0.069 ms) | |
| aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=3, gpu=8.034 ms, cpu=0.069 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=3, gpu=8.034 ms, cpu=0.069 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[64,256,1,1]:0:nobias (id=G16, run=2, gpu=8.034 ms, cpu=0.069 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=8.034 ms, cpu=0.069 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=12.917 ms, cpu=0.055 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,64,200,200]:f32[64,64,3,3]:0:nobias (id=G11, run=3, gpu=12.917 ms, cpu=0.055 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=12.917 ms, cpu=0.055 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=12.917 ms, cpu=0.055 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=12.917 ms, cpu=0.055 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=10.049 ms, cpu=0.037 ms) | |
| aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=3, gpu=10.049 ms, cpu=0.037 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=3, gpu=8.950 ms, cpu=0.039 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[64,256,1,1]:0:nobias (id=G16, run=2, gpu=8.950 ms, cpu=0.039 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=8.950 ms, cpu=0.039 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=8.950 ms, cpu=0.039 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,64,200,200]:f32[64,64,3,3]:0:nobias (id=G11, run=3, gpu=8.950 ms, cpu=0.039 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=16.026 ms, cpu=0.033 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=16.026 ms, cpu=0.033 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=16.026 ms, cpu=0.033 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=16.026 ms, cpu=0.033 ms) | |
| aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=3, gpu=7.613 ms, cpu=0.099 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=3, gpu=7.613 ms, cpu=0.099 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[128,256,1,1]:0:nobias (id=G17, run=1, gpu=7.613 ms, cpu=0.099 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,200,200::f32[1,128,200,200]:f32[128]:f32[128]:f32[128]:f32[128] (id=G18, run=1, gpu=7.613 ms, cpu=0.099 ms) | |
| aten::relu_:f32[1,128,200,200] (id=G19, run=1, gpu=7.613 ms, cpu=0.099 ms) | |
| aten::mps_convolution:2:2:1:1:1:1:1:Contiguous:f32[1,128,200,200]:f32[128,128,3,3]:0:nobias (id=G20, run=1, gpu=7.384 ms, cpu=0.029 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=7.384 ms, cpu=0.029 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=7.384 ms, cpu=0.029 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=7.384 ms, cpu=0.029 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=16.453 ms, cpu=0.046 ms) | |
| aten::mps_convolution:2:2:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[512,256,1,1]:0:nobias (id=G25, run=1, gpu=16.453 ms, cpu=0.046 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=16.453 ms, cpu=0.046 ms) | |
| aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=16.453 ms, cpu=0.046 ms) | |
| aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=16.453 ms, cpu=0.046 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[128,512,1,1]:0:nobias (id=G28, run=3, gpu=16.453 ms, cpu=0.046 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=16.453 ms, cpu=0.046 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=16.453 ms, cpu=0.046 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,128,100,100]:f32[128,128,3,3]:0:nobias (id=G29, run=3, gpu=1.516 ms, cpu=0.018 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=1.516 ms, cpu=0.018 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=1.516 ms, cpu=0.018 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=1.516 ms, cpu=0.018 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=15.582 ms, cpu=0.064 ms) | |
| aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=15.582 ms, cpu=0.064 ms) | |
| aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=15.582 ms, cpu=0.064 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[128,512,1,1]:0:nobias (id=G28, run=3, gpu=15.582 ms, cpu=0.064 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=15.582 ms, cpu=0.064 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=15.582 ms, cpu=0.064 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,128,100,100]:f32[128,128,3,3]:0:nobias (id=G29, run=3, gpu=15.582 ms, cpu=0.064 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=15.582 ms, cpu=0.064 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=8.675 ms, cpu=0.042 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=8.675 ms, cpu=0.042 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=8.675 ms, cpu=0.042 ms) | |
| aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=8.675 ms, cpu=0.042 ms) | |
| aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=8.675 ms, cpu=0.042 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[128,512,1,1]:0:nobias (id=G28, run=3, gpu=7.790 ms, cpu=0.043 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=7.790 ms, cpu=0.043 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=7.790 ms, cpu=0.043 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,128,100,100]:f32[128,128,3,3]:0:nobias (id=G29, run=3, gpu=7.790 ms, cpu=0.043 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=14.002 ms, cpu=0.120 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=14.002 ms, cpu=0.120 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=14.002 ms, cpu=0.120 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=14.002 ms, cpu=0.120 ms) | |
| aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=14.002 ms, cpu=0.120 ms) | |
| aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=14.002 ms, cpu=0.120 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[256,512,1,1]:0:nobias (id=G30, run=1, gpu=14.002 ms, cpu=0.120 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,100,100::f32[1,256,100,100]:f32[256]:f32[256]:f32[256]:f32[256] (id=G31, run=1, gpu=14.002 ms, cpu=0.120 ms) | |
| aten::relu_:f32[1,256,100,100] (id=G32, run=1, gpu=2.444 ms, cpu=0.166 ms) | |
| aten::mps_convolution:2:2:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:0:nobias (id=G33, run=1, gpu=2.444 ms, cpu=0.166 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=2.444 ms, cpu=0.166 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=2.444 ms, cpu=0.166 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=2.444 ms, cpu=0.166 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=9.966 ms, cpu=0.023 ms) | |
| aten::mps_convolution:2:2:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[1024,512,1,1]:0:nobias (id=G38, run=1, gpu=9.966 ms, cpu=0.023 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=9.966 ms, cpu=0.023 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=9.966 ms, cpu=0.023 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=9.966 ms, cpu=0.023 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=9.966 ms, cpu=0.023 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=9.966 ms, cpu=0.023 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=3.536 ms, cpu=0.018 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=3.536 ms, cpu=0.018 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=3.536 ms, cpu=0.018 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=3.536 ms, cpu=0.018 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=3.536 ms, cpu=0.018 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=12.340 ms, cpu=0.070 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=12.340 ms, cpu=0.070 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=12.340 ms, cpu=0.070 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=12.340 ms, cpu=0.070 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=12.340 ms, cpu=0.070 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=12.340 ms, cpu=0.070 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=12.340 ms, cpu=0.070 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=10.720 ms, cpu=0.046 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=10.720 ms, cpu=0.046 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=10.720 ms, cpu=0.046 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=10.720 ms, cpu=0.046 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=10.720 ms, cpu=0.046 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=2.959 ms, cpu=0.056 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=2.959 ms, cpu=0.056 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=2.959 ms, cpu=0.056 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=2.959 ms, cpu=0.056 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=2.959 ms, cpu=0.056 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=14.013 ms, cpu=0.049 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=14.013 ms, cpu=0.049 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=14.013 ms, cpu=0.049 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=14.013 ms, cpu=0.049 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=14.013 ms, cpu=0.049 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=14.013 ms, cpu=0.049 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=14.013 ms, cpu=0.049 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=8.097 ms, cpu=0.041 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=8.097 ms, cpu=0.041 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=8.097 ms, cpu=0.041 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=8.097 ms, cpu=0.041 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=8.097 ms, cpu=0.041 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=7.875 ms, cpu=0.050 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=7.875 ms, cpu=0.050 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=7.875 ms, cpu=0.050 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=7.875 ms, cpu=0.050 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=7.875 ms, cpu=0.050 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=17.015 ms, cpu=0.057 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=17.015 ms, cpu=0.057 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=17.015 ms, cpu=0.057 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=17.015 ms, cpu=0.057 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=17.015 ms, cpu=0.057 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=17.015 ms, cpu=0.057 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=17.015 ms, cpu=0.057 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=9.246 ms, cpu=0.437 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=9.246 ms, cpu=0.437 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[512,1024,1,1]:0:nobias (id=G43, run=1, gpu=9.246 ms, cpu=0.437 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,50,50::f32[1,512,50,50]:f32[512]:f32[512]:f32[512]:f32[512] (id=G44, run=1, gpu=9.246 ms, cpu=0.437 ms) | |
| aten::relu_:f32[1,512,50,50] (id=G45, run=1, gpu=9.246 ms, cpu=0.437 ms) | |
| aten::mps_convolution:2:2:1:1:1:1:1:Contiguous:f32[1,512,50,50]:f32[512,512,3,3]:0:nobias (id=G46, run=1, gpu=4.729 ms, cpu=0.033 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=4.729 ms, cpu=0.033 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=4.729 ms, cpu=0.033 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,25,25]:f32[2048,512,1,1]:0:nobias (id=G49, run=3, gpu=4.729 ms, cpu=0.033 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=3.839 ms, cpu=0.049 ms) | |
| aten::mps_convolution:2:2:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[2048,1024,1,1]:0:nobias (id=G51, run=1, gpu=3.839 ms, cpu=0.049 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=3.839 ms, cpu=0.049 ms) | |
| aten::add_out_mps::f32[1,2048,25,25]:f32[1,2048,25,25]:f32[1,2048,25,25] (id=G52, run=3, gpu=3.839 ms, cpu=0.049 ms) | |
| aten::relu_:f32[1,2048,25,25] (id=G53, run=3, gpu=3.839 ms, cpu=0.049 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,2048,25,25]:f32[512,2048,1,1]:0:nobias (id=G54, run=2, gpu=3.839 ms, cpu=0.049 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=3.839 ms, cpu=0.049 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=3.839 ms, cpu=0.049 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,512,25,25]:f32[512,512,3,3]:0:nobias (id=G55, run=2, gpu=13.171 ms, cpu=0.025 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=13.171 ms, cpu=0.025 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=13.171 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,25,25]:f32[2048,512,1,1]:0:nobias (id=G49, run=3, gpu=13.171 ms, cpu=0.025 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=8.027 ms, cpu=0.052 ms) | |
| aten::add_out_mps::f32[1,2048,25,25]:f32[1,2048,25,25]:f32[1,2048,25,25] (id=G52, run=3, gpu=8.027 ms, cpu=0.052 ms) | |
| aten::relu_:f32[1,2048,25,25] (id=G53, run=3, gpu=8.027 ms, cpu=0.052 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,2048,25,25]:f32[512,2048,1,1]:0:nobias (id=G54, run=2, gpu=8.027 ms, cpu=0.052 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=8.027 ms, cpu=0.052 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=8.027 ms, cpu=0.052 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,512,25,25]:f32[512,512,3,3]:0:nobias (id=G55, run=2, gpu=8.027 ms, cpu=0.052 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=8.027 ms, cpu=0.052 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=1.109 ms, cpu=0.102 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,25,25]:f32[2048,512,1,1]:0:nobias (id=G49, run=3, gpu=1.109 ms, cpu=0.102 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=1.109 ms, cpu=0.102 ms) | |
| aten::add_out_mps::f32[1,2048,25,25]:f32[1,2048,25,25]:f32[1,2048,25,25] (id=G52, run=3, gpu=1.109 ms, cpu=0.102 ms) | |
| aten::relu_:f32[1,2048,25,25] (id=G53, run=3, gpu=1.109 ms, cpu=0.102 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,2048,25,25]:f32[256,2048,1,1]:0:nobias (id=G56, run=1, gpu=5.865 ms, cpu=0.056 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,25,25::f32[1,256,25,25]:f32[256]:f32[256]:f32[256]:f32[256] (id=G57, run=2, gpu=5.865 ms, cpu=0.056 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,25,25]:f32[256,256,3,3]:0:nobias (id=G58, run=1, gpu=5.865 ms, cpu=0.056 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,25,25::f32[1,256,25,25]:f32[256]:f32[256]:f32[256]:f32[256] (id=G57, run=2, gpu=1.035 ms, cpu=0.023 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=1.035 ms, cpu=0.023 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=1.035 ms, cpu=0.023 ms) | |
| aten::upsample_nearest:f32[1,256,25,25]:[1.000000,0.000000]:[Undefined] (id=G59, run=1, gpu=7.687 ms, cpu=0.026 ms) | |
| aten::add_out_mps::f32[1,256,50,50]:f32[1,256,50,50]:f32[1,256,50,50] (id=G60, run=1, gpu=1.933 ms, cpu=0.012 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=6, gpu=1.933 ms, cpu=0.012 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=13, gpu=1.933 ms, cpu=0.012 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[256,512,1,1]:0:nobias (id=G30, run=2, gpu=1.933 ms, cpu=0.012 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,100,100::f32[1,256,100,100]:f32[256]:f32[256]:f32[256]:f32[256] (id=G31, run=2, gpu=1.024 ms, cpu=0.023 ms) | |
| aten::upsample_nearest:f32[1,256,50,50]:[1.000000,0.000000]:[Undefined] (id=G61, run=1, gpu=30.087 ms, cpu=0.024 ms) | |
| aten::add_out_mps::f32[1,256,100,100]:f32[1,256,100,100]:f32[1,256,100,100] (id=G62, run=1, gpu=30.087 ms, cpu=0.024 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:0:nobias (id=G63, run=1, gpu=30.087 ms, cpu=0.024 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,100,100::f32[1,256,100,100]:f32[256]:f32[256]:f32[256]:f32[256] (id=G31, run=3, gpu=30.087 ms, cpu=0.024 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[256,256,1,1]:0:nobias (id=G64, run=1, gpu=30.087 ms, cpu=0.024 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=5, gpu=0.030 ms, cpu=0.014 ms) | |
| aten::upsample_nearest:f32[1,256,100,100]:[1.000000,0.000000]:[Undefined] (id=G65, run=1, gpu=27.666 ms, cpu=0.025 ms) | |
| aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=4, gpu=27.666 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,200,200]:f32[256,256,3,3]:0:nobias (id=G66, run=1, gpu=22.945 ms, cpu=0.247 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=6, gpu=22.945 ms, cpu=0.247 ms) | |
| aten::max_pool2d:f32[1,256,25,25]:Undefined:Undefined:K[1,1,]:S[2,2,]:P[0,0,]:D[1,1,]:NCHW (id=G67, run=1, gpu=22.945 ms, cpu=0.247 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,200,200]:f32[256,256,3,3]:1:256 (id=G68, run=2, gpu=22.945 ms, cpu=0.247 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=5, gpu=41.649 ms, cpu=0.023 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,200,200]:f32[256,256,3,3]:1:256 (id=G68, run=2, gpu=41.649 ms, cpu=0.023 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=5, gpu=41.649 ms, cpu=0.023 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[3,256,1,1]:1:3 (id=G69, run=1, gpu=41.649 ms, cpu=0.023 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[12,256,1,1]:1:12 (id=G70, run=1, gpu=41.649 ms, cpu=0.023 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:1:256 (id=G71, run=2, gpu=41.649 ms, cpu=0.023 ms) | |
| aten::relu_:f32[1,256,100,100] (id=G32, run=3, gpu=41.649 ms, cpu=0.023 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:1:256 (id=G71, run=2, gpu=8.155 ms, cpu=0.054 ms) | |
| aten::relu_:f32[1,256,100,100] (id=G32, run=3, gpu=8.155 ms, cpu=0.054 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,100,100]:f32[3,256,1,1]:1:3 (id=G72, run=1, gpu=8.155 ms, cpu=0.054 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,100,100]:f32[12,256,1,1]:1:12 (id=G73, run=1, gpu=8.155 ms, cpu=0.054 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:1:256 (id=G74, run=2, gpu=8.155 ms, cpu=0.054 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=13, gpu=8.155 ms, cpu=0.054 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:1:256 (id=G74, run=2, gpu=8.155 ms, cpu=0.054 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=13, gpu=7.033 ms, cpu=0.024 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[3,256,1,1]:1:3 (id=G75, run=1, gpu=7.033 ms, cpu=0.024 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[12,256,1,1]:1:12 (id=G76, run=1, gpu=7.033 ms, cpu=0.024 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,25,25]:f32[256,256,3,3]:1:256 (id=G77, run=2, gpu=7.033 ms, cpu=0.024 ms) | |
| aten::relu_:f32[1,256,25,25] (id=G78, run=2, gpu=7.033 ms, cpu=0.024 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,25,25]:f32[256,256,3,3]:1:256 (id=G77, run=2, gpu=7.033 ms, cpu=0.024 ms) | |
| aten::relu_:f32[1,256,25,25] (id=G78, run=2, gpu=7.033 ms, cpu=0.024 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,25,25]:f32[3,256,1,1]:1:3 (id=G79, run=1, gpu=6.006 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,25,25]:f32[12,256,1,1]:1:12 (id=G80, run=1, gpu=6.006 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,13,13]:f32[256,256,3,3]:1:256 (id=G81, run=2, gpu=6.006 ms, cpu=0.025 ms) | |
| aten::relu_:f32[1,256,13,13] (id=G82, run=2, gpu=6.006 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,13,13]:f32[256,256,3,3]:1:256 (id=G81, run=2, gpu=6.006 ms, cpu=0.025 ms) | |
| aten::relu_:f32[1,256,13,13] (id=G82, run=2, gpu=6.006 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,13,13]:f32[3,256,1,1]:1:3 (id=G83, run=1, gpu=6.006 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,13,13]:f32[12,256,1,1]:1:12 (id=G84, run=1, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:4.000000 (id=G85, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:4.000000 (id=G85, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:8.000000 (id=G86, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:8.000000 (id=G86, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:16.000000 (id=G87, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:16.000000 (id=G87, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:32.000000 (id=G88, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:32.000000 (id=G88, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:61.000000 (id=G89, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:61.000000 (id=G89, run=2, gpu=52.044 ms, cpu=0.546 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#407:1):Float[3, 4] (len=48 bytes, gpu=52.044 ms, cpu=0.546 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#406:1):Float[3, 4] (len=48 bytes, gpu=0.342 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#648:1):Float[3, 4] (len=48 bytes, gpu=0.599 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#649:1):Float[3, 4] (len=48 bytes, gpu=0.389 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#650:1):Float[3, 4] (len=48 bytes, gpu=0.465 ms, cpu=0.015 ms) | |
| aten::arange_mps_out:i32[200]:200 (id=G90, run=2, gpu=6.865 ms, cpu=0.053 ms) | |
| aten::mul:i32[200]:i64[Scalar]:i32[200] (id=G91, run=2, gpu=6.865 ms, cpu=0.053 ms) | |
| aten::arange_mps_out:i32[200]:200 (id=G90, run=2, gpu=6.865 ms, cpu=0.053 ms) | |
| aten::mul:i32[200]:i64[Scalar]:i32[200] (id=G91, run=2, gpu=6.865 ms, cpu=0.053 ms) | |
| aten::gather_kernel_2:MPS(buf#652:3):Int[100, 100]:MPS(buf#655:2):Int[100, 100] (id=K1, run=4, gpu=6.865 ms, cpu=0.053 ms) | |
| aten::gather_kernel_2:MPS(buf#652:3):Int[100, 100]:MPS(buf#655:2):Int[100, 100] (id=K1, run=4, gpu=6.865 ms, cpu=0.053 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=2, gpu=6.865 ms, cpu=0.053 ms) | |
| aten::add_out_mps::i32[40000,1,4]:f32[1,3,4]:f32[40000,3,4] (id=G93, run=1, gpu=6.865 ms, cpu=0.053 ms) | |
| aten::arange_mps_out:i32[100]:100 (id=G94, run=2, gpu=0.781 ms, cpu=0.036 ms) | |
| aten::mul:i32[100]:i64[Scalar]:i32[100] (id=G95, run=2, gpu=0.781 ms, cpu=0.036 ms) | |
| aten::arange_mps_out:i32[100]:100 (id=G94, run=2, gpu=0.781 ms, cpu=0.036 ms) | |
| aten::mul:i32[100]:i64[Scalar]:i32[100] (id=G95, run=2, gpu=0.781 ms, cpu=0.036 ms) | |
| aten::gather_kernel_2:MPS(buf#651:3):Int[50, 50]:MPS(buf#655:2):Int[50, 50] (id=K1, run=6, gpu=0.781 ms, cpu=0.036 ms) | |
| aten::gather_kernel_2:MPS(buf#651:3):Int[50, 50]:MPS(buf#655:2):Int[50, 50] (id=K1, run=6, gpu=0.781 ms, cpu=0.036 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=3, gpu=0.781 ms, cpu=0.036 ms) | |
| aten::add_out_mps::i32[10000,1,4]:f32[1,3,4]:f32[10000,3,4] (id=G96, run=1, gpu=0.781 ms, cpu=0.036 ms) | |
| aten::arange_mps_out:i32[50]:50 (id=G97, run=2, gpu=0.781 ms, cpu=0.036 ms) | |
| aten::mul:i32[50]:i64[Scalar]:i32[50] (id=G98, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::arange_mps_out:i32[50]:50 (id=G97, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::mul:i32[50]:i64[Scalar]:i32[50] (id=G98, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::gather_kernel_2:MPS(buf#652:2):Int[13, 13]:MPS(buf#654:2):Int[13, 13] (id=K1, run=10, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::gather_kernel_2:MPS(buf#652:2):Int[13, 13]:MPS(buf#654:2):Int[13, 13] (id=K1, run=10, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=5, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::add_out_mps::i32[2500,1,4]:f32[1,3,4]:f32[2500,3,4] (id=G99, run=1, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::arange_mps_out:i32[25]:25 (id=G100, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::mul:i32[25]:i64[Scalar]:i32[25] (id=G101, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::arange_mps_out:i32[25]:25 (id=G100, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::mul:i32[25]:i64[Scalar]:i32[25] (id=G101, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::gather_kernel_2:MPS(buf#652:2):Int[13, 13]:MPS(buf#654:2):Int[13, 13] (id=K1, run=10, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::gather_kernel_2:MPS(buf#652:2):Int[13, 13]:MPS(buf#654:2):Int[13, 13] (id=K1, run=10, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=5, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::add_out_mps::i32[625,1,4]:f32[1,3,4]:f32[625,3,4] (id=G102, run=1, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::arange_mps_out:i32[13]:13 (id=G103, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::mul:i32[13]:i64[Scalar]:i32[13] (id=G104, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::arange_mps_out:i32[13]:13 (id=G103, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::mul:i32[13]:i64[Scalar]:i32[13] (id=G104, run=2, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::gather_kernel_2:MPS(buf#652:2):Int[13, 13]:MPS(buf#654:2):Int[13, 13] (id=K1, run=10, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::gather_kernel_2:MPS(buf#652:2):Int[13, 13]:MPS(buf#654:2):Int[13, 13] (id=K1, run=10, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=5, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::add_out_mps::i32[169,1,4]:f32[1,3,4]:f32[169,3,4] (id=G105, run=1, gpu=7.081 ms, cpu=0.046 ms) | |
| aten::cat_out_mps:0:NCHW:f32:5 (id=G106, run=1, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::cat_out_mps:1:NCHW:f32:5 (id=G107, run=2, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::cat_out_mps:1:NCHW:f32:5 (id=G107, run=2, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=1, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::sub_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G109, run=2, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::sub_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G109, run=2, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::mul:f32[159882]:f32[Scalar]:f32[159882] (id=G110, run=2, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::add_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G111, run=2, gpu=13.525 ms, cpu=0.052 ms) | |
| aten::mul:f32[159882]:f32[Scalar]:f32[159882] (id=G110, run=2, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::add_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G111, run=2, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::clamp_out_mps_max:4.135167_scalar::f32[159882,1] (id=G113, run=2, gpu=0.342 ms, cpu=0.280 ms) | |
| aten::clamp_out_mps_max:4.135167_scalar::f32[159882,1] (id=G113, run=2, gpu=0.989 ms, cpu=0.051 ms) | |
| aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=0.989 ms, cpu=0.051 ms) | |
| aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=2, gpu=0.989 ms, cpu=0.051 ms) | |
| aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=0.989 ms, cpu=0.051 ms) | |
| aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=2, gpu=0.989 ms, cpu=0.051 ms) | |
| aten::exp:MPS(buf#673:2):Float[159882, 1] (id=K5, run=2, gpu=0.989 ms, cpu=0.051 ms) | |
| aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=0.989 ms, cpu=0.051 ms) | |
| aten::exp:MPS(buf#673:2):Float[159882, 1] (id=K5, run=2, gpu=0.989 ms, cpu=0.051 ms) | |
| aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=0.989 ms, cpu=0.051 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#628:1):Float[] (len=4 bytes, gpu=0.989 ms, cpu=0.051 ms) | |
| aten::mul:f32[Scalar]:f32[159882,1]:f32[159882,1] (id=G116, run=1, gpu=7.283 ms, cpu=0.053 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#628:2):Float[] (len=4 bytes, gpu=7.283 ms, cpu=0.053 ms) | |
| aten::mul:f32[Scalar]:f32[159882,1]:f32[159882,1] (id=G116, run=2, gpu=3.783 ms, cpu=0.369 ms) | |
| aten::sub_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G117, run=2, gpu=3.783 ms, cpu=0.369 ms) | |
| aten::sub_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G117, run=2, gpu=3.783 ms, cpu=0.369 ms) | |
| aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=4, gpu=3.783 ms, cpu=0.369 ms) | |
| aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=4, gpu=3.783 ms, cpu=0.369 ms) | |
| aten::cat_out_mps:2:NCHW:f32:4 (id=G118, run=1, gpu=3.783 ms, cpu=0.369 ms) | |
| aten::fill_scalar_mps_impl:i64[30000]:1.000000 (id=G119, run=1, gpu=21.132 ms, cpu=0.192 ms) | |
| aten::fill_scalar_mps_impl:i64[7500]:2.000000 (id=G120, run=1, gpu=21.132 ms, cpu=0.192 ms) | |
| aten::fill_scalar_mps_impl:i64[1875]:3.000000 (id=G121, run=1, gpu=21.132 ms, cpu=0.192 ms) | |
| aten::fill_scalar_mps_impl:i64[507]:4.000000 (id=G122, run=1, gpu=21.132 ms, cpu=0.192 ms) | |
| aten::cat_out_mps:0:NCHW:i64:5 (id=G123, run=1, gpu=21.132 ms, cpu=0.192 ms) | |
| aten::topk:1,120000:Float32:k1000:dim1:largest1 (id=G124, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_2:MPS(buf#654:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::topk:1,30000:Float32:k1000:dim1:largest1 (id=G126, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_2:MPS(buf#654:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::topk:1,7500:Float32:k1000:dim1:largest1 (id=G127, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_2:MPS(buf#654:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::topk:1,1875:Float32:k1000:dim1:largest1 (id=G128, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_2:MPS(buf#654:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::topk:1,507:Float32:k507:dim1:largest1 (id=G129, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::add_out_mps::i64[1,507]:i64[Scalar]:i64[1,507] (id=G130, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::cat_out_mps:1:NCHW:i64:5 (id=G131, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::arange_mps_out:i64[1]:1 (id=G132, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_2:MPS(buf#628:2):Long[1, 4507]:MPS(buf#619:2):Long[1, 4507] (id=K6, run=2, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#685:1):Float[1, 4507, 4] (id=K7, run=2, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_2:MPS(buf#628:2):Long[1, 4507]:MPS(buf#619:2):Long[1, 4507] (id=K6, run=2, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#657:1):Long[1, 4507] (id=K8, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_3:MPS(buf#628:2):Long[1, 4507, 1]:MPS(buf#619:2):Long[1, 4507, 1] (id=K9, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#685:1):Float[1, 4507, 4] (id=K7, run=2, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::sigmoid_out_mps:f32[1,4507]:f32[1,4507] (id=G133, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_2:MPS(buf#654:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[4507,2] (id=G134, run=2, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_2:MPS(buf#654:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[4507,2] (id=G134, run=2, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::cat_out_mps:2:NCHW:f32:2 (id=G135, run=1, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::sub_out_mps::f32[4507]:f32[4507]:f32[4507] (id=G136, run=2, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::sub_out_mps::f32[4507]:f32[4507]:f32[4507] (id=G136, run=2, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::greaterThanOrEqualTo:f32[4507]:f32[Scalar]:b8[4507] (id=G137, run=2, gpu=61.493 ms, cpu=0.042 ms) | |
| aten::greaterThanOrEqualTo:f32[4507]:f32[Scalar]:b8[4507] (id=G137, run=2, gpu=0.008 ms, cpu=0.052 ms) | |
| aten::bitwise_and_tensor:MPS(buf#622:2):Bool[4507]:MPS(buf#627:2):Bool[4507] (id=K10, run=1, gpu=0.008 ms, cpu=0.052 ms) | |
| aten::count_nonzero_mps:0::b8[4507]:0:7::i64[Scalar]:Bool (id=G138, run=1, gpu=4.699 ms, cpu=0.053 ms) | |
| BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=4.699 ms, cpu=0.053 ms) | |
| aten::nonzero_out_native_mps:b8[4507] (id=G139, run=1, gpu=9.368 ms, cpu=0.061 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#618:1):Float[4507] (id=K7, run=4, gpu=9.368 ms, cpu=0.061 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#618:1):Float[4507] (id=K7, run=4, gpu=9.368 ms, cpu=0.061 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#615:1):Long[4507] (id=K8, run=2, gpu=9.368 ms, cpu=0.061 ms) | |
| aten::greaterThanOrEqualTo:f32[4507]:f32[Scalar]:b8[4507] (id=G137, run=3, gpu=9.368 ms, cpu=0.061 ms) | |
| aten::count_nonzero_mps:0::b8[4507]:0:7::i64[Scalar]:Bool (id=G138, run=2, gpu=0.826 ms, cpu=0.060 ms) | |
| BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.826 ms, cpu=0.060 ms) | |
| aten::nonzero_out_native_mps:b8[4507] (id=G139, run=2, gpu=11.543 ms, cpu=0.074 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#655:1):Float[4507] (id=K7, run=6, gpu=11.543 ms, cpu=0.074 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#655:1):Float[4507] (id=K7, run=6, gpu=11.543 ms, cpu=0.074 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#616:1):Long[4507] (id=K8, run=3, gpu=11.543 ms, cpu=0.074 ms) | |
| aten::max_mps:f32[4507,4] (id=G140, run=1, gpu=11.543 ms, cpu=0.074 ms) | |
| aten::copy_cast_mps:i64[[-1]]:f32[[-1]]:0 (id=G141, run=1, gpu=11.543 ms, cpu=0.074 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#642:1):Float[] (len=4 bytes, gpu=11.543 ms, cpu=0.074 ms) | |
| 2024-08-21 03:36:51.754 python[98536:16421626] In index_select_mps | |
| aten::add_out_mps::f32[Scalar]:f32[Scalar]:f32[Scalar] (id=G142, run=1, gpu=33.613 ms, cpu=0.159 ms) | |
| aten::mul:f32[4507]:f32[Scalar]:f32[4507] (id=G143, run=1, gpu=33.613 ms, cpu=0.159 ms) | |
| aten::add_out_mps::f32[4507,4]:f32[4507,1]:f32[4507,4] (id=G144, run=1, gpu=33.613 ms, cpu=0.159 ms) | |
| BlitCopy: MPS(buf#663:1):Float[4507] --> MPS(buf#616:1):Float[4507] (len=17.61 KB, gpu=33.613 ms, cpu=0.159 ms) | |
| aten::sort:4507:Float32:dim0:descending1 (id=G145, run=1, gpu=33.613 ms, cpu=0.159 ms) | |
| aten::index_select_out_mps:f32[4507,4]:i64[4507]:0 (id=G146, run=1, gpu=33.613 ms, cpu=0.159 ms) | |
| aten::nms_float:MPS(buf#619:2):Float[4507, 4]:MPS(buf#663:2):Float[4507] (id=K11, run=1, gpu=33.613 ms, cpu=0.159 ms) | |
| BlitCopySync: MPS(buf#584:2):Long[319997] --> CPU:Long[319997] (len=2.44 MB, gpu=33.613 ms, cpu=0.159 ms) | |
| BlitCopySync: CPU:Long[2004] --> MPS(buf#669:1):Long[2004] (len=15.66 KB, gpu=8.834 ms, cpu=0.048 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#668:1):Long[2004] (id=K8, run=4, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#663:1):Float[1000] (id=K7, run=8, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#663:1):Float[1000] (id=K7, run=8, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=4, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=4, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::cat_out_mps:1:NCHW:f32:2 (id=G147, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::gather_kernel_1:MPS(buf#619:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::gather_kernel_1:MPS(buf#619:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=2, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::gather_kernel_1:MPS(buf#619:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::gather_kernel_1:MPS(buf#619:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=2, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::mul:f32[1000]:f32[1000]:f32[1000] (id=G149, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=4, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::sqrt_out_mps:f32[1000]:f32[1000] (id=G150, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::div_out_mps::f32[1000]:i64[Scalar]:f32[1000] (id=G151, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::log2_out_mps:f32[1000]:f32[1000] (id=G152, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::add_out_mps::f32[1000]:i64[Scalar]:f32[1000] (id=G153, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::add_out_mps::f32[1000]:f32[Scalar]:f32[1000] (id=G154, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::floor_out_mps:f32[1000]:f32[1000] (id=G155, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::clamp_out_mps_min:2.000000_max:5.000000_scalar::f32[1000] (id=G156, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::copy_cast_mps:f32[[-1]]:i64[[-1]]:0 (id=G157, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::sub_out_mps::i64[1000]:i64[Scalar]:i64[1000] (id=G158, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=1, gpu=49.727 ms, cpu=42.667 ms) | |
| aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=1, gpu=12.332 ms, cpu=0.059 ms) | |
| BlitCopySync: MPS(buf#628:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=12.332 ms, cpu=0.059 ms) | |
| Error: command buffer exited with error status. | |
| The Metal Performance Shaders operations encoded on it may not have completed. | |
| Error: | |
| (null) | |
| Internal Error (0000000e:Internal Error) | |
| <AGXG15XFamilyCommandBuffer: 0x3b52830d0> | |
| label = <none> | |
| device = <AGXG15CDevice: 0x112220600> | |
| name = Apple M3 Max | |
| commandQueue = <AGXG15XFamilyCommandQueue: 0x11052e000> | |
| label = <none> | |
| device = <AGXG15CDevice: 0x112220600> | |
| name = Apple M3 Max | |
| retainedReferences = 1 | |
| aten::nonzero_out_native_mps:b8[1000] (id=G161, run=1, gpu=51428.741 ms, cpu=0.063 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#618:1):Float[959, 5] (id=K7, run=9, gpu=51428.741 ms, cpu=0.063 ms) | |
| aten::roi_align_float:MPS(buf#610:1):Float[1, 256, 200, 200]:MPS(buf#660:2):Float[959, 5] (id=K12, run=1, gpu=51428.741 ms, cpu=0.063 ms) | |
| aten::index_put_32bit_idx32:MPS(buf#688:2):Float[959, 256, 7, 7] (id=K13, run=1, gpu=51428.741 ms, cpu=0.063 ms) | |
| aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=2, gpu=51428.741 ms, cpu=0.063 ms) | |
| aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=2, gpu=4.767 ms, cpu=0.275 ms) | |
| BlitCopySync: MPS(buf#628:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=4.767 ms, cpu=0.275 ms) | |
| aten::nonzero_out_native_mps:b8[1000] (id=G161, run=2, gpu=34762.899 ms, cpu=0.032 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#618:1):Float[959, 5] (id=K7, run=10, gpu=34762.899 ms, cpu=0.032 ms) | |
| aten::roi_align_float:MPS(buf#600:1):Float[1, 256, 100, 100]:MPS(buf#615:2):Float[959, 5] (id=K12, run=2, gpu=34762.899 ms, cpu=0.032 ms) | |
| aten::index_put_32bit_idx32:MPS(buf#690:2):Float[959, 256, 7, 7] (id=K13, run=2, gpu=34762.899 ms, cpu=0.032 ms) | |
| aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=3, gpu=34762.899 ms, cpu=0.032 ms) | |
| aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=3, gpu=0.037 ms, cpu=0.285 ms) | |
| BlitCopySync: MPS(buf#628:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.037 ms, cpu=0.285 ms) | |
| aten::nonzero_out_native_mps:b8[1000] (id=G161, run=3, gpu=1.197 ms, cpu=0.266 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#618:1):Float[3, 5] (id=K7, run=11, gpu=1.197 ms, cpu=0.266 ms) | |
| aten::roi_align_float:MPS(buf#590:1):Float[1, 256, 50, 50]:MPS(buf#651:2):Float[3, 5] (id=K12, run=3, gpu=1.197 ms, cpu=0.266 ms) | |
| aten::index_put_32bit_idx32:MPS(buf#654:2):Float[3, 256, 7, 7] (id=K13, run=3, gpu=1.197 ms, cpu=0.266 ms) | |
| aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=4, gpu=1.197 ms, cpu=0.266 ms) | |
| aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=4, gpu=0.035 ms, cpu=0.048 ms) | |
| BlitCopySync: MPS(buf#628:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.035 ms, cpu=0.048 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.307 ms, cpu=0.849 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.307 ms, cpu=0.849 ms) | |
| aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=5.315 ms, cpu=0.041 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.315 ms, cpu=0.041 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.315 ms, cpu=0.041 ms) | |
| aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=5.312 ms, cpu=0.761 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.312 ms, cpu=0.761 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.312 ms, cpu=0.761 ms) | |
| aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=5.322 ms, cpu=0.026 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.322 ms, cpu=0.026 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.322 ms, cpu=0.026 ms) | |
| aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::mps_linear:f32[1000,12544]:f32[1024,12544]:f32[1024] (id=G165, run=1, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::relu_:f32[1000,1024] (id=G166, run=1, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::mps_linear:f32[1000,1024]:f32[91,1024]:f32[91] (id=G167, run=1, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::mps_linear:f32[1000,1024]:f32[364,1024]:f32[364] (id=G168, run=1, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=5, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=4, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=4, gpu=2.939 ms, cpu=0.894 ms) | |
| aten::mul:f32[1000]:f32[Scalar]:f32[1000] (id=G169, run=2, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::add_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G170, run=2, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::mul:f32[1000]:f32[Scalar]:f32[1000] (id=G169, run=2, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::add_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G170, run=2, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.037 ms, cpu=0.027 ms) | |
| aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.026 ms, cpu=0.033 ms) | |
| aten::clamp_out_mps_max:4.135167_scalar::f32[1000,91] (id=G172, run=2, gpu=0.026 ms, cpu=0.033 ms) | |
| aten::clamp_out_mps_max:4.135167_scalar::f32[1000,91] (id=G172, run=2, gpu=0.026 ms, cpu=0.033 ms) | |
| aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.026 ms, cpu=0.033 ms) | |
| aten::add_out_mps::f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G174, run=2, gpu=0.026 ms, cpu=0.033 ms) | |
| aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.026 ms, cpu=0.033 ms) | |
| aten::add_out_mps::f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G174, run=2, gpu=0.026 ms, cpu=0.033 ms) | |
| aten::exp:MPS(buf#670:2):Float[1000, 91] (id=K5, run=4, gpu=0.026 ms, cpu=0.033 ms) | |
| aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.006 ms, cpu=0.027 ms) | |
| aten::exp:MPS(buf#670:2):Float[1000, 91] (id=K5, run=4, gpu=0.006 ms, cpu=0.027 ms) | |
| aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.006 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#628:1):Float[] (len=4 bytes, gpu=0.006 ms, cpu=0.027 ms) | |
| aten::mul:f32[Scalar]:f32[1000,91]:f32[1000,91] (id=G175, run=1, gpu=0.014 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#644:1):Float[] (len=4 bytes, gpu=0.014 ms, cpu=0.032 ms) | |
| aten::mul:f32[Scalar]:f32[1000,91]:f32[1000,91] (id=G175, run=2, gpu=0.043 ms, cpu=0.049 ms) | |
| aten::sub_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G176, run=2, gpu=0.043 ms, cpu=0.049 ms) | |
| aten::sub_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G176, run=2, gpu=0.043 ms, cpu=0.049 ms) | |
| aten::add_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G177, run=2, gpu=0.043 ms, cpu=0.049 ms) | |
| aten::add_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G177, run=2, gpu=0.043 ms, cpu=0.049 ms) | |
| aten::cat_out_mps:2:NCHW:f32:4 (id=G118, run=2, gpu=0.043 ms, cpu=0.049 ms) | |
| aten::softmax_mps_out:f32[[-1]]:Contiguous:1 (id=G178, run=1, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::gather_kernel_3:MPS(buf#667:2):Float[1000, 90, 4]:MPS(buf#685:1):Float[1000, 90, 4] (id=K14, run=3, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[1000,91,2] (id=G179, run=2, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::gather_kernel_3:MPS(buf#667:2):Float[1000, 90, 4]:MPS(buf#685:1):Float[1000, 90, 4] (id=K14, run=3, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[1000,91,2] (id=G179, run=2, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::cat_out_mps:3:NCHW:f32:2 (id=G180, run=1, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::arange_mps_out:i64[91]:91 (id=G181, run=1, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::gather_kernel_3:MPS(buf#667:2):Float[1000, 90, 4]:MPS(buf#685:1):Float[1000, 90, 4] (id=K14, run=3, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::gather_kernel_2:MPS(buf#616:2):Float[1000, 90]:MPS(buf#663:1):Float[1000, 90] (id=K4, run=15, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::gather_kernel_2:MPS(buf#654:2):Long[1000, 90]:MPS(buf#700:2):Long[1000, 90] (id=K6, run=3, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::greaterThan:f32[90000]:f32[Scalar]:b8[90000] (id=G182, run=1, gpu=0.093 ms, cpu=0.266 ms) | |
| aten::count_nonzero_mps:0::b8[90000]:0:7::i64[Scalar]:Bool (id=G183, run=1, gpu=0.024 ms, cpu=0.036 ms) | |
| BlitCopySync: MPS(buf#642:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.024 ms, cpu=0.036 ms) | |
| aten::nonzero_out_native_mps:b8[90000] (id=G184, run=1, gpu=0.045 ms, cpu=0.055 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#663:1):Float[262] (id=K7, run=13, gpu=0.045 ms, cpu=0.055 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#663:1):Float[262] (id=K7, run=13, gpu=0.045 ms, cpu=0.055 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#700:1):Long[262] (id=K8, run=5, gpu=0.045 ms, cpu=0.055 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[262]:MPS(buf#680:1):Float[262] (id=K3, run=24, gpu=0.045 ms, cpu=0.055 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[262]:MPS(buf#680:1):Float[262] (id=K3, run=24, gpu=0.045 ms, cpu=0.055 ms) | |
| aten::sub_out_mps::f32[262]:f32[262]:f32[262] (id=G185, run=2, gpu=0.015 ms, cpu=0.015 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[262]:MPS(buf#680:1):Float[262] (id=K3, run=24, gpu=0.015 ms, cpu=0.015 ms) | |
| aten::gather_kernel_1:MPS(buf#654:3):Float[262]:MPS(buf#680:1):Float[262] (id=K3, run=24, gpu=0.015 ms, cpu=0.015 ms) | |
| aten::sub_out_mps::f32[262]:f32[262]:f32[262] (id=G185, run=2, gpu=0.015 ms, cpu=0.015 ms) | |
| aten::greaterThanOrEqualTo:f32[262]:f32[Scalar]:b8[262] (id=G186, run=2, gpu=0.015 ms, cpu=0.015 ms) | |
| aten::greaterThanOrEqualTo:f32[262]:f32[Scalar]:b8[262] (id=G186, run=2, gpu=0.015 ms, cpu=0.015 ms) | |
| aten::bitwise_and_tensor:MPS(buf#670:2):Bool[262]:MPS(buf#680:2):Bool[262] (id=K10, run=2, gpu=0.015 ms, cpu=0.015 ms) | |
| aten::count_nonzero_mps:0::b8[262]:0:7::i64[Scalar]:Bool (id=G187, run=1, gpu=0.042 ms, cpu=0.028 ms) | |
| BlitCopySync: MPS(buf#642:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.042 ms, cpu=0.028 ms) | |
| aten::nonzero_out_native_mps:b8[262] (id=G188, run=1, gpu=0.069 ms, cpu=0.034 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#624:1):Float[262] (id=K7, run=15, gpu=0.069 ms, cpu=0.034 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#624:1):Float[262] (id=K7, run=15, gpu=0.069 ms, cpu=0.034 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#679:1):Long[262] (id=K8, run=6, gpu=0.069 ms, cpu=0.034 ms) | |
| aten::max_mps:f32[262,4] (id=G189, run=1, gpu=0.069 ms, cpu=0.034 ms) | |
| aten::copy_cast_mps:i64[[-1]]:f32[[-1]]:0 (id=G141, run=2, gpu=0.069 ms, cpu=0.034 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#634:1):Float[] (len=4 bytes, gpu=0.069 ms, cpu=0.034 ms) | |
| 2024-08-21 03:38:18.239 python[98536:16421626] In index_select_mps | |
| aten::add_out_mps::f32[Scalar]:f32[Scalar]:f32[Scalar] (id=G142, run=2, gpu=0.597 ms, cpu=0.059 ms) | |
| aten::mul:f32[262]:f32[Scalar]:f32[262] (id=G190, run=1, gpu=0.597 ms, cpu=0.059 ms) | |
| aten::add_out_mps::f32[262,4]:f32[262,1]:f32[262,4] (id=G191, run=1, gpu=0.597 ms, cpu=0.059 ms) | |
| BlitCopy: MPS(buf#669:1):Float[262] --> MPS(buf#678:1):Float[262] (len=1.02 KB, gpu=0.597 ms, cpu=0.059 ms) | |
| aten::sort:262:Float32:dim0:descending1 (id=G192, run=1, gpu=0.597 ms, cpu=0.059 ms) | |
| aten::index_select_out_mps:f32[262,4]:i64[262]:0 (id=G193, run=1, gpu=0.597 ms, cpu=0.059 ms) | |
| aten::nms_float:MPS(buf#679:2):Float[262, 4]:MPS(buf#669:2):Float[262] (id=K11, run=2, gpu=0.597 ms, cpu=0.059 ms) | |
| BlitCopySync: MPS(buf#673:2):Long[1310] --> CPU:Long[1310] (len=10.23 KB, gpu=0.597 ms, cpu=0.059 ms) | |
| BlitCopySync: CPU:Long[158] --> MPS(buf#674:1):Long[158] (len=1.23 KB, gpu=0.003 ms, cpu=0.025 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#680:1):Long[100] (id=K8, run=8, gpu=0.032 ms, cpu=0.038 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#669:1):Float[100] (id=K7, run=17, gpu=0.032 ms, cpu=0.038 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#669:1):Float[100] (id=K7, run=17, gpu=0.032 ms, cpu=0.038 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#680:1):Long[100] (id=K8, run=8, gpu=0.032 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#642:1):Float[] (len=4 bytes, gpu=0.032 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#634:1):Float[] (len=4 bytes, gpu=0.002 ms, cpu=0.027 ms) | |
| aten::div_out_mps::f32[Scalar]:f32[Scalar]:f32[Scalar] (id=G194, run=1, gpu=0.017 ms, cpu=0.046 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#642:2):Float[] (len=4 bytes, gpu=0.017 ms, cpu=0.046 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#634:1):Float[] (len=4 bytes, gpu=0.003 ms, cpu=0.038 ms) | |
| -------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ -------------------------------------------------------------------------------- | |
| Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls Input Shapes | |
| -------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ -------------------------------------------------------------------------------- | |
| model_inference 0.01% 9.563ms 100.00% 87.372s 87.372s 1 [] | |
| aten::where 0.00% 9.667us 98.76% 86.284s 21.571s 4 [[1000]] | |
| aten::nonzero_numpy 0.00% 24.749us 98.76% 86.284s 21.571s 4 [[1000]] | |
| aten::nonzero 98.73% 86.261s 98.76% 86.284s 21.571s 4 [[1000]] | |
| aten::upsample_nearest2d 0.21% 179.560ms 0.21% 179.560ms 179.560ms 1 [[1, 256, 25, 25], [], []] | |
| aten::where 0.00% 6.709us 0.11% 98.708ms 49.354ms 2 [[4507]] | |
| aten::nonzero_numpy 0.00% 11.875us 0.11% 98.702ms 49.351ms 2 [[4507]] | |
| aten::nonzero 0.10% 86.700ms 0.11% 98.658ms 49.329ms 2 [[4507]] | |
| aten::to 0.00% 7.333us 0.07% 56.852ms 11.370ms 5 [[3, 4], [], [], [], [], []] | |
| aten::_to_copy 0.00% 26.624us 0.07% 56.844ms 11.369ms 5 [[3, 4], [], [], [], [], [], []] | |
| -------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ -------------------------------------------------------------------------------- | |
| Self CPU time total: 87.372s |