Created August 21, 2024 01:38
| High watermark memory allocation limit: 163.20 GB | |
| Low watermark memory allocation limit: 134.40 GB | |
| Initializing private heap allocator on unified device memory of size 96.00 GB | |
| BlitCopySync: CPU:Float[3, 224, 224] --> MPS(buf#1:1):Float[3, 224, 224] (len=588.00 KB, gpu=9.644 ms, cpu=4.767 ms) | |
| BlitCopySync: CPU:Float[64, 3, 7, 7] --> MPS(buf#2:1):Float[64, 3, 7, 7] (len=36.75 KB, gpu=1.555 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#3:1):Float[64] (len=256 bytes, gpu=1.491 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#4:1):Float[64] (len=256 bytes, gpu=0.586 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#5:1):Float[64] (len=256 bytes, gpu=0.564 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#6:1):Float[64] (len=256 bytes, gpu=0.518 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#7:1):Long[] (len=8 bytes, gpu=0.409 ms, cpu=0.232 ms) | |
| BlitCopySync: CPU:Float[64, 64, 1, 1] --> MPS(buf#8:1):Float[64, 64, 1, 1] (len=16.00 KB, gpu=0.708 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#9:1):Float[64] (len=256 bytes, gpu=0.707 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#10:1):Float[64] (len=256 bytes, gpu=2.733 ms, cpu=0.015 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#11:1):Float[64] (len=256 bytes, gpu=0.589 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#12:1):Float[64] (len=256 bytes, gpu=0.783 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#13:1):Long[] (len=8 bytes, gpu=0.694 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[64, 64, 3, 3] --> MPS(buf#14:1):Float[64, 64, 3, 3] (len=144.00 KB, gpu=0.642 ms, cpu=0.034 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#15:1):Float[64] (len=256 bytes, gpu=0.588 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#16:1):Float[64] (len=256 bytes, gpu=7.791 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#17:1):Float[64] (len=256 bytes, gpu=7.531 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#18:1):Float[64] (len=256 bytes, gpu=9.208 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#19:1):Long[] (len=8 bytes, gpu=0.557 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#20:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=0.542 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#21:1):Float[256] (len=1024 bytes, gpu=0.579 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#22:1):Float[256] (len=1024 bytes, gpu=0.683 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#23:1):Float[256] (len=1024 bytes, gpu=0.683 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#24:1):Float[256] (len=1024 bytes, gpu=0.701 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#25:1):Long[] (len=8 bytes, gpu=7.410 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#26:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=9.523 ms, cpu=0.062 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#27:1):Float[256] (len=1024 bytes, gpu=0.515 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#28:1):Float[256] (len=1024 bytes, gpu=0.569 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#29:1):Float[256] (len=1024 bytes, gpu=0.575 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#30:1):Float[256] (len=1024 bytes, gpu=0.710 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#31:1):Long[] (len=8 bytes, gpu=0.740 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[64, 256, 1, 1] --> MPS(buf#32:1):Float[64, 256, 1, 1] (len=64.00 KB, gpu=0.661 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#33:1):Float[64] (len=256 bytes, gpu=7.733 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#34:1):Float[64] (len=256 bytes, gpu=9.479 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#35:1):Float[64] (len=256 bytes, gpu=0.483 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#36:1):Float[64] (len=256 bytes, gpu=0.762 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#37:1):Long[] (len=8 bytes, gpu=0.759 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[64, 64, 3, 3] --> MPS(buf#38:1):Float[64, 64, 3, 3] (len=144.00 KB, gpu=0.510 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#39:1):Float[64] (len=256 bytes, gpu=0.758 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#40:1):Float[64] (len=256 bytes, gpu=0.747 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#41:1):Float[64] (len=256 bytes, gpu=7.758 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#42:1):Float[64] (len=256 bytes, gpu=9.569 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#43:1):Long[] (len=8 bytes, gpu=0.325 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#44:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=0.699 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#45:1):Float[256] (len=1024 bytes, gpu=0.691 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#46:1):Float[256] (len=1024 bytes, gpu=0.720 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#47:1):Float[256] (len=1024 bytes, gpu=0.675 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#48:1):Float[256] (len=1024 bytes, gpu=0.586 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#49:1):Long[] (len=8 bytes, gpu=1.811 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[64, 256, 1, 1] --> MPS(buf#50:1):Float[64, 256, 1, 1] (len=64.00 KB, gpu=0.495 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#51:1):Float[64] (len=256 bytes, gpu=0.657 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#52:1):Float[64] (len=256 bytes, gpu=0.733 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#53:1):Float[64] (len=256 bytes, gpu=0.703 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#54:1):Float[64] (len=256 bytes, gpu=0.731 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#55:1):Long[] (len=8 bytes, gpu=0.597 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[64, 64, 3, 3] --> MPS(buf#56:1):Float[64, 64, 3, 3] (len=144.00 KB, gpu=9.403 ms, cpu=0.084 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#57:1):Float[64] (len=256 bytes, gpu=0.335 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#58:1):Float[64] (len=256 bytes, gpu=0.525 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#59:1):Float[64] (len=256 bytes, gpu=0.689 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[64] --> MPS(buf#60:1):Float[64] (len=256 bytes, gpu=0.701 ms, cpu=0.015 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#61:1):Long[] (len=8 bytes, gpu=0.745 ms, cpu=0.015 ms) | |
| BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#62:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=0.694 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#63:1):Float[256] (len=1024 bytes, gpu=1.721 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#64:1):Float[256] (len=1024 bytes, gpu=0.381 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#65:1):Float[256] (len=1024 bytes, gpu=0.774 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#66:1):Float[256] (len=1024 bytes, gpu=0.657 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#67:1):Long[] (len=8 bytes, gpu=0.704 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[128, 256, 1, 1] --> MPS(buf#68:1):Float[128, 256, 1, 1] (len=128.00 KB, gpu=0.696 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#69:1):Float[128] (len=512 bytes, gpu=0.732 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#70:1):Float[128] (len=512 bytes, gpu=2.612 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#71:1):Float[128] (len=512 bytes, gpu=0.543 ms, cpu=0.014 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#72:1):Float[128] (len=512 bytes, gpu=0.737 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#73:1):Long[] (len=8 bytes, gpu=0.722 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#74:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=0.664 ms, cpu=0.046 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#75:1):Float[128] (len=512 bytes, gpu=0.609 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#76:1):Float[128] (len=512 bytes, gpu=7.534 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#77:1):Float[128] (len=512 bytes, gpu=7.568 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#78:1):Float[128] (len=512 bytes, gpu=2.644 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#79:1):Long[] (len=8 bytes, gpu=0.559 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#80:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.738 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#81:1):Float[512] (len=2.00 KB, gpu=0.720 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#82:1):Float[512] (len=2.00 KB, gpu=0.706 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#83:1):Float[512] (len=2.00 KB, gpu=0.748 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#84:1):Float[512] (len=2.00 KB, gpu=7.563 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#85:1):Long[] (len=8 bytes, gpu=1.499 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[512, 256, 1, 1] --> MPS(buf#86:1):Float[512, 256, 1, 1] (len=512.00 KB, gpu=0.536 ms, cpu=0.050 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#87:1):Float[512] (len=2.00 KB, gpu=0.456 ms, cpu=0.101 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#88:1):Float[512] (len=2.00 KB, gpu=0.618 ms, cpu=0.046 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#89:1):Float[512] (len=2.00 KB, gpu=0.678 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#90:1):Float[512] (len=2.00 KB, gpu=0.654 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#91:1):Long[] (len=8 bytes, gpu=0.725 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[128, 512, 1, 1] --> MPS(buf#92:1):Float[128, 512, 1, 1] (len=256.00 KB, gpu=9.616 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#93:1):Float[128] (len=512 bytes, gpu=0.320 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#94:1):Float[128] (len=512 bytes, gpu=0.592 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#95:1):Float[128] (len=512 bytes, gpu=0.765 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#96:1):Float[128] (len=512 bytes, gpu=0.700 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#97:1):Long[] (len=8 bytes, gpu=0.755 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#98:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=0.681 ms, cpu=0.045 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#99:1):Float[128] (len=512 bytes, gpu=6.747 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#100:1):Float[128] (len=512 bytes, gpu=0.062 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#101:1):Float[128] (len=512 bytes, gpu=9.700 ms, cpu=0.041 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#102:1):Float[128] (len=512 bytes, gpu=0.142 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#103:1):Long[] (len=8 bytes, gpu=0.483 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#104:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.464 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#105:1):Float[512] (len=2.00 KB, gpu=0.622 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#106:1):Float[512] (len=2.00 KB, gpu=0.586 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#107:1):Float[512] (len=2.00 KB, gpu=0.560 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#108:1):Float[512] (len=2.00 KB, gpu=7.473 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#109:1):Long[] (len=8 bytes, gpu=4.507 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[128, 512, 1, 1] --> MPS(buf#110:1):Float[128, 512, 1, 1] (len=256.00 KB, gpu=0.550 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#111:1):Float[128] (len=512 bytes, gpu=0.766 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#112:1):Float[128] (len=512 bytes, gpu=0.667 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#113:1):Float[128] (len=512 bytes, gpu=1.545 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#114:1):Float[128] (len=512 bytes, gpu=0.583 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#115:1):Long[] (len=8 bytes, gpu=0.711 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#116:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=0.658 ms, cpu=0.049 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#117:1):Float[128] (len=512 bytes, gpu=0.684 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#118:1):Float[128] (len=512 bytes, gpu=0.748 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#119:1):Float[128] (len=512 bytes, gpu=0.742 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#120:1):Float[128] (len=512 bytes, gpu=7.722 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#121:1):Long[] (len=8 bytes, gpu=9.662 ms, cpu=0.055 ms) | |
| BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#122:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.424 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#123:1):Float[512] (len=2.00 KB, gpu=0.586 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#124:1):Float[512] (len=2.00 KB, gpu=0.567 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#125:1):Float[512] (len=2.00 KB, gpu=0.743 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#126:1):Float[512] (len=2.00 KB, gpu=0.742 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#127:1):Long[] (len=8 bytes, gpu=0.765 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[128, 512, 1, 1] --> MPS(buf#128:1):Float[128, 512, 1, 1] (len=256.00 KB, gpu=7.393 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#129:1):Float[128] (len=512 bytes, gpu=5.499 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#130:1):Float[128] (len=512 bytes, gpu=0.450 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#131:1):Float[128] (len=512 bytes, gpu=0.715 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#132:1):Float[128] (len=512 bytes, gpu=7.564 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#133:1):Long[] (len=8 bytes, gpu=7.502 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#134:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=9.713 ms, cpu=0.045 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#135:1):Float[128] (len=512 bytes, gpu=0.501 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#136:1):Float[128] (len=512 bytes, gpu=0.689 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#137:1):Float[128] (len=512 bytes, gpu=0.633 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[128] --> MPS(buf#138:1):Float[128] (len=512 bytes, gpu=0.738 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#139:1):Long[] (len=8 bytes, gpu=0.768 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#140:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.710 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#141:1):Float[512] (len=2.00 KB, gpu=6.751 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#142:1):Float[512] (len=2.00 KB, gpu=0.502 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#143:1):Float[512] (len=2.00 KB, gpu=2.772 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#144:1):Float[512] (len=2.00 KB, gpu=0.556 ms, cpu=0.044 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#145:1):Long[] (len=8 bytes, gpu=0.456 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256, 512, 1, 1] --> MPS(buf#146:1):Float[256, 512, 1, 1] (len=512.00 KB, gpu=0.697 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#147:1):Float[256] (len=1024 bytes, gpu=0.760 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#148:1):Float[256] (len=1024 bytes, gpu=0.635 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#149:1):Float[256] (len=1024 bytes, gpu=7.723 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#150:1):Float[256] (len=1024 bytes, gpu=7.444 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#151:1):Long[] (len=8 bytes, gpu=9.495 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#152:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.544 ms, cpu=0.864 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#153:1):Float[256] (len=1024 bytes, gpu=0.701 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#154:1):Float[256] (len=1024 bytes, gpu=0.324 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#155:1):Float[256] (len=1024 bytes, gpu=0.684 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#156:1):Float[256] (len=1024 bytes, gpu=0.713 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#157:1):Long[] (len=8 bytes, gpu=7.661 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#158:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=2.548 ms, cpu=0.044 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#159:1):Float[1024] (len=4.00 KB, gpu=0.540 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#160:1):Float[1024] (len=4.00 KB, gpu=0.678 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#161:1):Float[1024] (len=4.00 KB, gpu=0.573 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#162:1):Float[1024] (len=4.00 KB, gpu=0.661 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#163:1):Long[] (len=8 bytes, gpu=0.691 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[1024, 512, 1, 1] --> MPS(buf#164:1):Float[1024, 512, 1, 1] (len=2.00 MB, gpu=6.689 ms, cpu=0.130 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#165:1):Float[1024] (len=4.00 KB, gpu=0.535 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#166:1):Float[1024] (len=4.00 KB, gpu=5.766 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#167:1):Float[1024] (len=4.00 KB, gpu=0.481 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#168:1):Float[1024] (len=4.00 KB, gpu=0.764 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#169:1):Long[] (len=8 bytes, gpu=9.743 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#170:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.207 ms, cpu=0.295 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#171:1):Float[256] (len=1024 bytes, gpu=0.537 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#172:1):Float[256] (len=1024 bytes, gpu=0.776 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#173:1):Float[256] (len=1024 bytes, gpu=0.717 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#174:1):Float[256] (len=1024 bytes, gpu=0.395 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#175:1):Long[] (len=8 bytes, gpu=0.535 ms, cpu=0.066 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#176:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=7.597 ms, cpu=0.097 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#177:1):Float[256] (len=1024 bytes, gpu=3.321 ms, cpu=0.072 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#178:1):Float[256] (len=1024 bytes, gpu=0.619 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#179:1):Float[256] (len=1024 bytes, gpu=0.271 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#180:1):Float[256] (len=1024 bytes, gpu=0.704 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#181:1):Long[] (len=8 bytes, gpu=0.642 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#182:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=7.340 ms, cpu=0.054 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#183:1):Float[1024] (len=4.00 KB, gpu=5.340 ms, cpu=0.061 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#184:1):Float[1024] (len=4.00 KB, gpu=0.406 ms, cpu=0.041 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#185:1):Float[1024] (len=4.00 KB, gpu=0.727 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#186:1):Float[1024] (len=4.00 KB, gpu=9.531 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#187:1):Long[] (len=8 bytes, gpu=0.735 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#188:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.593 ms, cpu=0.068 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#189:1):Float[256] (len=1024 bytes, gpu=0.364 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#190:1):Float[256] (len=1024 bytes, gpu=0.701 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#191:1):Float[256] (len=1024 bytes, gpu=0.734 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#192:1):Float[256] (len=1024 bytes, gpu=0.750 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#193:1):Long[] (len=8 bytes, gpu=7.568 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#194:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=9.503 ms, cpu=0.085 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#195:1):Float[256] (len=1024 bytes, gpu=0.300 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#196:1):Float[256] (len=1024 bytes, gpu=0.749 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#197:1):Float[256] (len=1024 bytes, gpu=0.721 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#198:1):Float[256] (len=1024 bytes, gpu=0.705 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#199:1):Long[] (len=8 bytes, gpu=0.731 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#200:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=0.710 ms, cpu=0.054 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#201:1):Float[1024] (len=4.00 KB, gpu=6.519 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#202:1):Float[1024] (len=4.00 KB, gpu=0.375 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#203:1):Float[1024] (len=4.00 KB, gpu=9.774 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#204:1):Float[1024] (len=4.00 KB, gpu=0.420 ms, cpu=0.052 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#205:1):Long[] (len=8 bytes, gpu=0.363 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#206:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.563 ms, cpu=0.043 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#207:1):Float[256] (len=1024 bytes, gpu=0.660 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#208:1):Float[256] (len=1024 bytes, gpu=0.738 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#209:1):Float[256] (len=1024 bytes, gpu=0.684 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#210:1):Float[256] (len=1024 bytes, gpu=7.733 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#211:1):Long[] (len=8 bytes, gpu=9.378 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#212:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.389 ms, cpu=0.092 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#213:1):Float[256] (len=1024 bytes, gpu=0.430 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#214:1):Float[256] (len=1024 bytes, gpu=0.364 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#215:1):Float[256] (len=1024 bytes, gpu=0.795 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#216:1):Float[256] (len=1024 bytes, gpu=0.727 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#217:1):Long[] (len=8 bytes, gpu=0.546 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#218:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=7.690 ms, cpu=0.046 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#219:1):Float[1024] (len=4.00 KB, gpu=5.386 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#220:1):Float[1024] (len=4.00 KB, gpu=0.582 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#221:1):Float[1024] (len=4.00 KB, gpu=0.674 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#222:1):Float[1024] (len=4.00 KB, gpu=7.686 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#223:1):Long[] (len=8 bytes, gpu=7.559 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#224:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=9.520 ms, cpu=0.047 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#225:1):Float[256] (len=1024 bytes, gpu=0.362 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#226:1):Float[256] (len=1024 bytes, gpu=0.779 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#227:1):Float[256] (len=1024 bytes, gpu=0.746 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#228:1):Float[256] (len=1024 bytes, gpu=0.723 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#229:1):Long[] (len=8 bytes, gpu=0.681 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#230:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.419 ms, cpu=0.082 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#231:1):Float[256] (len=1024 bytes, gpu=6.672 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#232:1):Float[256] (len=1024 bytes, gpu=0.484 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#233:1):Float[256] (len=1024 bytes, gpu=9.711 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#234:1):Float[256] (len=1024 bytes, gpu=0.636 ms, cpu=0.062 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#235:1):Long[] (len=8 bytes, gpu=0.536 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#236:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=0.659 ms, cpu=0.058 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#237:1):Float[1024] (len=4.00 KB, gpu=0.685 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#238:1):Float[1024] (len=4.00 KB, gpu=0.611 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#239:1):Float[1024] (len=4.00 KB, gpu=0.653 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#240:1):Float[1024] (len=4.00 KB, gpu=7.686 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#241:1):Long[] (len=8 bytes, gpu=2.623 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#242:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.174 ms, cpu=0.284 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#243:1):Float[256] (len=1024 bytes, gpu=0.414 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#244:1):Float[256] (len=1024 bytes, gpu=0.574 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#245:1):Float[256] (len=1024 bytes, gpu=0.569 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#246:1):Float[256] (len=1024 bytes, gpu=0.556 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#247:1):Long[] (len=8 bytes, gpu=6.564 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#248:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.445 ms, cpu=0.102 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#249:1):Float[256] (len=1024 bytes, gpu=7.385 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#250:1):Float[256] (len=1024 bytes, gpu=2.252 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#251:1):Float[256] (len=1024 bytes, gpu=0.703 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#252:1):Float[256] (len=1024 bytes, gpu=0.736 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#253:1):Long[] (len=8 bytes, gpu=0.702 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#254:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=0.726 ms, cpu=0.049 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#255:1):Float[1024] (len=4.00 KB, gpu=0.440 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#256:1):Float[1024] (len=4.00 KB, gpu=7.702 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#257:1):Float[1024] (len=4.00 KB, gpu=7.617 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#258:1):Float[1024] (len=4.00 KB, gpu=9.519 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#259:1):Long[] (len=8 bytes, gpu=0.256 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[512, 1024, 1, 1] --> MPS(buf#260:1):Float[512, 1024, 1, 1] (len=2.00 MB, gpu=0.566 ms, cpu=0.083 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#261:1):Float[512] (len=2.00 KB, gpu=0.360 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#262:1):Float[512] (len=2.00 KB, gpu=0.760 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#263:1):Float[512] (len=2.00 KB, gpu=0.744 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#264:1):Float[512] (len=2.00 KB, gpu=0.787 ms, cpu=0.015 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#265:1):Long[] (len=8 bytes, gpu=1.759 ms, cpu=0.014 ms) | |
| BlitCopySync: CPU:Float[512, 512, 3, 3] --> MPS(buf#266:1):Float[512, 512, 3, 3] (len=9.00 MB, gpu=0.131 ms, cpu=0.245 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#267:1):Float[512] (len=2.00 KB, gpu=0.685 ms, cpu=0.042 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#268:1):Float[512] (len=2.00 KB, gpu=0.706 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#269:1):Float[512] (len=2.00 KB, gpu=0.760 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#270:1):Float[512] (len=2.00 KB, gpu=0.687 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#271:1):Long[] (len=8 bytes, gpu=0.722 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[2048, 512, 1, 1] --> MPS(buf#272:1):Float[2048, 512, 1, 1] (len=4.00 MB, gpu=2.417 ms, cpu=0.131 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#273:1):Float[2048] (len=8.00 KB, gpu=0.383 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#274:1):Float[2048] (len=8.00 KB, gpu=0.715 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#275:1):Float[2048] (len=8.00 KB, gpu=0.698 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#276:1):Float[2048] (len=8.00 KB, gpu=0.684 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#277:1):Long[] (len=8 bytes, gpu=0.746 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[2048, 1024, 1, 1] --> MPS(buf#278:1):Float[2048, 1024, 1, 1] (len=8.00 MB, gpu=8.735 ms, cpu=1.013 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#279:1):Float[2048] (len=8.00 KB, gpu=0.440 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#280:1):Float[2048] (len=8.00 KB, gpu=0.414 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#281:1):Float[2048] (len=8.00 KB, gpu=0.685 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#282:1):Float[2048] (len=8.00 KB, gpu=0.744 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#283:1):Long[] (len=8 bytes, gpu=0.709 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[512, 2048, 1, 1] --> MPS(buf#284:1):Float[512, 2048, 1, 1] (len=4.00 MB, gpu=0.679 ms, cpu=0.114 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#285:1):Float[512] (len=2.00 KB, gpu=7.509 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#286:1):Float[512] (len=2.00 KB, gpu=7.514 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#287:1):Float[512] (len=2.00 KB, gpu=7.542 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#288:1):Float[512] (len=2.00 KB, gpu=9.232 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#289:1):Long[] (len=8 bytes, gpu=0.522 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[512, 512, 3, 3] --> MPS(buf#290:1):Float[512, 512, 3, 3] (len=9.00 MB, gpu=0.372 ms, cpu=0.233 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#291:1):Float[512] (len=2.00 KB, gpu=0.693 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#292:1):Float[512] (len=2.00 KB, gpu=0.479 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#293:1):Float[512] (len=2.00 KB, gpu=0.793 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#294:1):Float[512] (len=2.00 KB, gpu=0.700 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#295:1):Long[] (len=8 bytes, gpu=7.773 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[2048, 512, 1, 1] --> MPS(buf#296:1):Float[2048, 512, 1, 1] (len=4.00 MB, gpu=2.450 ms, cpu=0.125 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#297:1):Float[2048] (len=8.00 KB, gpu=0.186 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#298:1):Float[2048] (len=8.00 KB, gpu=0.576 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#299:1):Float[2048] (len=8.00 KB, gpu=0.792 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#300:1):Float[2048] (len=8.00 KB, gpu=0.740 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#301:1):Long[] (len=8 bytes, gpu=0.696 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Float[512, 2048, 1, 1] --> MPS(buf#302:1):Float[512, 2048, 1, 1] (len=4.00 MB, gpu=6.677 ms, cpu=0.112 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#303:1):Float[512] (len=2.00 KB, gpu=0.359 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#304:1):Float[512] (len=2.00 KB, gpu=1.798 ms, cpu=0.015 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#305:1):Float[512] (len=2.00 KB, gpu=0.400 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#306:1):Float[512] (len=2.00 KB, gpu=0.694 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#307:1):Long[] (len=8 bytes, gpu=0.735 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[512, 512, 3, 3] --> MPS(buf#308:1):Float[512, 512, 3, 3] (len=9.00 MB, gpu=0.604 ms, cpu=1.043 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#309:1):Float[512] (len=2.00 KB, gpu=0.678 ms, cpu=0.059 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#310:1):Float[512] (len=2.00 KB, gpu=9.797 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#311:1):Float[512] (len=2.00 KB, gpu=0.537 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Float[512] --> MPS(buf#312:1):Float[512] (len=2.00 KB, gpu=0.791 ms, cpu=0.017 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#313:1):Long[] (len=8 bytes, gpu=0.695 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Float[2048, 512, 1, 1] --> MPS(buf#314:1):Float[2048, 512, 1, 1] (len=4.00 MB, gpu=0.497 ms, cpu=0.102 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#315:1):Float[2048] (len=8.00 KB, gpu=0.424 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#316:1):Float[2048] (len=8.00 KB, gpu=0.806 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#317:1):Float[2048] (len=8.00 KB, gpu=7.788 ms, cpu=0.016 ms) | |
| BlitCopySync: CPU:Float[2048] --> MPS(buf#318:1):Float[2048] (len=8.00 KB, gpu=9.570 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#319:1):Long[] (len=8 bytes, gpu=0.403 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256, 256, 1, 1] --> MPS(buf#320:1):Float[256, 256, 1, 1] (len=256.00 KB, gpu=0.676 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#321:1):Float[256] (len=1024 bytes, gpu=0.463 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#322:1):Float[256] (len=1024 bytes, gpu=0.571 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#323:1):Float[256] (len=1024 bytes, gpu=0.568 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#324:1):Float[256] (len=1024 bytes, gpu=0.697 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#325:1):Long[] (len=8 bytes, gpu=6.462 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[256, 512, 1, 1] --> MPS(buf#326:1):Float[256, 512, 1, 1] (len=512.00 KB, gpu=0.495 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#327:1):Float[256] (len=1024 bytes, gpu=4.092 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#328:1):Float[256] (len=1024 bytes, gpu=0.593 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#329:1):Float[256] (len=1024 bytes, gpu=0.636 ms, cpu=0.033 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#330:1):Float[256] (len=1024 bytes, gpu=0.651 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#331:1):Long[] (len=8 bytes, gpu=7.767 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#332:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=7.486 ms, cpu=0.065 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#333:1):Float[256] (len=1024 bytes, gpu=2.360 ms, cpu=0.044 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#334:1):Float[256] (len=1024 bytes, gpu=0.492 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#335:1):Float[256] (len=1024 bytes, gpu=0.777 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#336:1):Float[256] (len=1024 bytes, gpu=0.742 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#337:1):Long[] (len=8 bytes, gpu=0.749 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Float[256, 2048, 1, 1] --> MPS(buf#338:1):Float[256, 2048, 1, 1] (len=2.00 MB, gpu=0.429 ms, cpu=0.098 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#339:1):Float[256] (len=1024 bytes, gpu=7.706 ms, cpu=0.034 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#340:1):Float[256] (len=1024 bytes, gpu=7.543 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#341:1):Float[256] (len=1024 bytes, gpu=9.501 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#342:1):Float[256] (len=1024 bytes, gpu=0.381 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#343:1):Long[] (len=8 bytes, gpu=0.682 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#344:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.638 ms, cpu=0.110 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#345:1):Float[256] (len=1024 bytes, gpu=0.588 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#346:1):Float[256] (len=1024 bytes, gpu=0.774 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#347:1):Float[256] (len=1024 bytes, gpu=0.741 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#348:1):Float[256] (len=1024 bytes, gpu=1.766 ms, cpu=0.029 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#349:1):Long[] (len=8 bytes, gpu=0.545 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#350:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.685 ms, cpu=0.111 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#351:1):Float[256] (len=1024 bytes, gpu=0.621 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#352:1):Float[256] (len=1024 bytes, gpu=0.666 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#353:1):Float[256] (len=1024 bytes, gpu=0.839 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#354:1):Float[256] (len=1024 bytes, gpu=0.469 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#355:1):Long[] (len=8 bytes, gpu=2.760 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#356:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.387 ms, cpu=0.096 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#357:1):Float[256] (len=1024 bytes, gpu=0.433 ms, cpu=0.083 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#358:1):Float[256] (len=1024 bytes, gpu=0.712 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#359:1):Float[256] (len=1024 bytes, gpu=0.662 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#360:1):Float[256] (len=1024 bytes, gpu=0.774 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#361:1):Long[] (len=8 bytes, gpu=1.598 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#362:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.399 ms, cpu=0.131 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#363:1):Float[256] (len=1024 bytes, gpu=0.397 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#364:1):Float[256] (len=1024 bytes, gpu=0.757 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#365:1):Float[256] (len=1024 bytes, gpu=0.730 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#366:1):Float[256] (len=1024 bytes, gpu=0.702 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#367:1):Long[] (len=8 bytes, gpu=0.715 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#368:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=1.639 ms, cpu=0.123 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#369:1):Float[256] (len=1024 bytes, gpu=0.225 ms, cpu=0.023 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#370:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.664 ms, cpu=0.110 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#371:1):Float[256] (len=1024 bytes, gpu=0.629 ms, cpu=0.037 ms) | |
| BlitCopySync: CPU:Float[3, 256, 1, 1] --> MPS(buf#372:1):Float[3, 256, 1, 1] (len=3.00 KB, gpu=0.738 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[3] --> MPS(buf#373:1):Float[3] (len=12 bytes, gpu=0.644 ms, cpu=0.018 ms) | |
| BlitCopySync: CPU:Float[12, 256, 1, 1] --> MPS(buf#374:1):Float[12, 256, 1, 1] (len=12.00 KB, gpu=0.657 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[12] --> MPS(buf#375:1):Float[12] (len=48 bytes, gpu=7.596 ms, cpu=0.027 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#376:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=7.316 ms, cpu=0.125 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#377:1):Float[256] (len=1024 bytes, gpu=5.053 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#378:1):Float[256] (len=1024 bytes, gpu=0.536 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#379:1):Float[256] (len=1024 bytes, gpu=0.690 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#380:1):Float[256] (len=1024 bytes, gpu=9.658 ms, cpu=0.021 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#381:1):Long[] (len=8 bytes, gpu=0.481 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#382:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.463 ms, cpu=0.121 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#383:1):Float[256] (len=1024 bytes, gpu=0.466 ms, cpu=0.022 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#384:1):Float[256] (len=1024 bytes, gpu=0.750 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#385:1):Float[256] (len=1024 bytes, gpu=0.744 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#386:1):Float[256] (len=1024 bytes, gpu=0.776 ms, cpu=0.024 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#387:1):Long[] (len=8 bytes, gpu=7.653 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#388:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=8.365 ms, cpu=0.967 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#389:1):Float[256] (len=1024 bytes, gpu=0.820 ms, cpu=0.045 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#390:1):Float[256] (len=1024 bytes, gpu=0.457 ms, cpu=0.031 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#391:1):Float[256] (len=1024 bytes, gpu=0.750 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#392:1):Float[256] (len=1024 bytes, gpu=0.725 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#393:1):Long[] (len=8 bytes, gpu=0.639 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#394:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=6.672 ms, cpu=0.120 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#395:1):Float[256] (len=1024 bytes, gpu=0.069 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#396:1):Float[256] (len=1024 bytes, gpu=5.768 ms, cpu=0.025 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#397:1):Float[256] (len=1024 bytes, gpu=0.433 ms, cpu=0.143 ms) | |
| BlitCopySync: CPU:Float[256] --> MPS(buf#398:1):Float[256] (len=1024 bytes, gpu=0.655 ms, cpu=0.019 ms) | |
| BlitCopySync: CPU:Long[] --> MPS(buf#399:1):Long[] (len=8 bytes, gpu=7.619 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[1024, 12544] --> MPS(buf#400:1):Float[1024, 12544] (len=49.00 MB, gpu=1.695 ms, cpu=41.862 ms) | |
| BlitCopySync: CPU:Float[1024] --> MPS(buf#401:1):Float[1024] (len=4.00 KB, gpu=0.063 ms, cpu=0.175 ms) | |
| BlitCopySync: CPU:Float[91, 1024] --> MPS(buf#402:1):Float[91, 1024] (len=364.00 KB, gpu=8.687 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[91] --> MPS(buf#403:1):Float[91] (len=364 bytes, gpu=5.159 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[364, 1024] --> MPS(buf#404:1):Float[364, 1024] (len=1.42 MB, gpu=0.518 ms, cpu=0.083 ms) | |
| BlitCopySync: CPU:Float[364] --> MPS(buf#405:1):Float[364] (len=1.42 KB, gpu=0.572 ms, cpu=0.026 ms) | |
| BlitCopySync: CPU:Float[3] --> MPS(buf#406:1):Float[3] (len=12 bytes, gpu=8.286 ms, cpu=0.020 ms) | |
| BlitCopySync: CPU:Float[3] --> MPS(buf#407:1):Float[3] (len=12 bytes, gpu=0.410 ms, cpu=0.034 ms) | |
| aten::sub_out_mps::f32[3,224,224]:f32[3,1,1]:f32[3,224,224] (id=G1, run=1, gpu=6.218 ms, cpu=0.960 ms) | |
| aten::div_out_mps::f32[3,224,224]:f32[3,1,1]:f32[3,224,224] (id=G2, run=1, gpu=6.218 ms, cpu=0.960 ms) | |
| aten::upsample_bilinear:f32[1,3,224,224]:[1.000000,0.000000]:[Undefined] (id=G3, run=1, gpu=12.159 ms, cpu=0.074 ms) | |
| BlitCopy: MPS(buf#410:2):Float[3, 800, 800] --> MPS(buf#411:2):Float[3, 800, 800] (len=7.32 MB, gpu=12.159 ms, cpu=0.074 ms) | |
| aten::mps_convolution:2:2:1:1:3:3:1:Contiguous:f32[1,3,800,800]:f32[64,3,7,7]:0:nobias (id=G4, run=1, gpu=12.159 ms, cpu=0.074 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,400,400::f32[1,64,400,400]:f32[64]:f32[64]:f32[64]:f32[64] (id=G5, run=1, gpu=24.210 ms, cpu=0.056 ms) | |
| aten::relu_:f32[1,64,400,400] (id=G6, run=1, gpu=24.210 ms, cpu=0.056 ms) | |
| aten::max_pool2d:f32[1,64,400,400]:Undefined:Undefined:K[3,3,]:S[2,2,]:P[1,1,]:D[1,1,]:NCHW (id=G7, run=1, gpu=24.210 ms, cpu=0.056 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[64,64,1,1]:0:nobias (id=G8, run=1, gpu=24.210 ms, cpu=0.056 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=29.987 ms, cpu=0.021 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=29.987 ms, cpu=0.021 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,64,200,200]:f32[64,64,3,3]:0:nobias (id=G11, run=3, gpu=29.987 ms, cpu=0.021 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=29.987 ms, cpu=0.021 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=29.987 ms, cpu=0.021 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=16.006 ms, cpu=0.057 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=16.006 ms, cpu=0.057 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=16.006 ms, cpu=0.057 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=15.083 ms, cpu=0.054 ms) | |
| aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=3, gpu=15.083 ms, cpu=0.054 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=3, gpu=15.083 ms, cpu=0.054 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[64,256,1,1]:0:nobias (id=G16, run=2, gpu=15.083 ms, cpu=0.054 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=15.083 ms, cpu=0.054 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=3.389 ms, cpu=0.032 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,64,200,200]:f32[64,64,3,3]:0:nobias (id=G11, run=3, gpu=3.389 ms, cpu=0.032 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=3.389 ms, cpu=0.032 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=3.389 ms, cpu=0.032 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=3.389 ms, cpu=0.032 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=22.587 ms, cpu=0.031 ms) | |
| aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=3, gpu=22.587 ms, cpu=0.031 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=3, gpu=14.885 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[64,256,1,1]:0:nobias (id=G16, run=2, gpu=14.885 ms, cpu=0.025 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=14.885 ms, cpu=0.025 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=14.885 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,64,200,200]:f32[64,64,3,3]:0:nobias (id=G11, run=3, gpu=14.885 ms, cpu=0.025 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=15.096 ms, cpu=0.032 ms) | |
| aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=15.096 ms, cpu=0.032 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=15.096 ms, cpu=0.032 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=15.096 ms, cpu=0.032 ms) | |
| aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=3, gpu=17.229 ms, cpu=0.111 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=3, gpu=17.229 ms, cpu=0.111 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[128,256,1,1]:0:nobias (id=G17, run=1, gpu=17.229 ms, cpu=0.111 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,200,200::f32[1,128,200,200]:f32[128]:f32[128]:f32[128]:f32[128] (id=G18, run=1, gpu=17.229 ms, cpu=0.111 ms) | |
| aten::relu_:f32[1,128,200,200] (id=G19, run=1, gpu=17.229 ms, cpu=0.111 ms) | |
| aten::mps_convolution:2:2:1:1:1:1:1:Contiguous:f32[1,128,200,200]:f32[128,128,3,3]:0:nobias (id=G20, run=1, gpu=1.688 ms, cpu=0.019 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=1.688 ms, cpu=0.019 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=1.688 ms, cpu=0.019 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=1.688 ms, cpu=0.019 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=29.196 ms, cpu=0.103 ms) | |
| aten::mps_convolution:2:2:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[512,256,1,1]:0:nobias (id=G25, run=1, gpu=29.196 ms, cpu=0.103 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=29.196 ms, cpu=0.103 ms) | |
| aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=29.196 ms, cpu=0.103 ms) | |
| aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=29.196 ms, cpu=0.103 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[128,512,1,1]:0:nobias (id=G28, run=3, gpu=29.196 ms, cpu=0.103 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=29.196 ms, cpu=0.103 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=29.196 ms, cpu=0.103 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,128,100,100]:f32[128,128,3,3]:0:nobias (id=G29, run=3, gpu=1.820 ms, cpu=0.035 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=1.820 ms, cpu=0.035 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=1.820 ms, cpu=0.035 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=1.820 ms, cpu=0.035 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=23.996 ms, cpu=0.038 ms) | |
| aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=23.996 ms, cpu=0.038 ms) | |
| aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=23.996 ms, cpu=0.038 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[128,512,1,1]:0:nobias (id=G28, run=3, gpu=23.996 ms, cpu=0.038 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=23.996 ms, cpu=0.038 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=23.996 ms, cpu=0.038 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,128,100,100]:f32[128,128,3,3]:0:nobias (id=G29, run=3, gpu=23.996 ms, cpu=0.038 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=23.996 ms, cpu=0.038 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=9.144 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=9.144 ms, cpu=0.025 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=9.144 ms, cpu=0.025 ms) | |
| aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=9.144 ms, cpu=0.025 ms) | |
| aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=9.144 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[128,512,1,1]:0:nobias (id=G28, run=3, gpu=6.830 ms, cpu=0.028 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=6.830 ms, cpu=0.028 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=6.830 ms, cpu=0.028 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,128,100,100]:f32[128,128,3,3]:0:nobias (id=G29, run=3, gpu=6.830 ms, cpu=0.028 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=16.042 ms, cpu=0.065 ms) | |
| aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=16.042 ms, cpu=0.065 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=16.042 ms, cpu=0.065 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=16.042 ms, cpu=0.065 ms) | |
| aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=16.042 ms, cpu=0.065 ms) | |
| aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=16.042 ms, cpu=0.065 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[256,512,1,1]:0:nobias (id=G30, run=1, gpu=16.042 ms, cpu=0.065 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,100,100::f32[1,256,100,100]:f32[256]:f32[256]:f32[256]:f32[256] (id=G31, run=1, gpu=16.042 ms, cpu=0.065 ms) | |
| aten::relu_:f32[1,256,100,100] (id=G32, run=1, gpu=6.952 ms, cpu=0.221 ms) | |
| aten::mps_convolution:2:2:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:0:nobias (id=G33, run=1, gpu=6.952 ms, cpu=0.221 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=6.952 ms, cpu=0.221 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=6.952 ms, cpu=0.221 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=6.952 ms, cpu=0.221 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=9.026 ms, cpu=0.035 ms) | |
| aten::mps_convolution:2:2:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[1024,512,1,1]:0:nobias (id=G38, run=1, gpu=9.026 ms, cpu=0.035 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=9.026 ms, cpu=0.035 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=9.026 ms, cpu=0.035 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=9.026 ms, cpu=0.035 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=9.026 ms, cpu=0.035 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=9.026 ms, cpu=0.035 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=7.971 ms, cpu=0.042 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=7.971 ms, cpu=0.042 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=7.971 ms, cpu=0.042 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=7.971 ms, cpu=0.042 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=7.971 ms, cpu=0.042 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=9.026 ms, cpu=0.037 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=9.026 ms, cpu=0.037 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=9.026 ms, cpu=0.037 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=9.026 ms, cpu=0.037 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=9.026 ms, cpu=0.037 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=9.026 ms, cpu=0.037 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=9.026 ms, cpu=0.037 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=8.120 ms, cpu=0.028 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=8.120 ms, cpu=0.028 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=8.120 ms, cpu=0.028 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=8.120 ms, cpu=0.028 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=8.120 ms, cpu=0.028 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=6.851 ms, cpu=0.028 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=6.851 ms, cpu=0.028 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=6.851 ms, cpu=0.028 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=6.851 ms, cpu=0.028 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=6.851 ms, cpu=0.028 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=8.015 ms, cpu=0.033 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=8.015 ms, cpu=0.033 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=8.015 ms, cpu=0.033 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=8.015 ms, cpu=0.033 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=8.015 ms, cpu=0.033 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=8.015 ms, cpu=0.033 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=8.015 ms, cpu=0.033 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=4.187 ms, cpu=0.028 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=4.187 ms, cpu=0.028 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=4.187 ms, cpu=0.028 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=4.187 ms, cpu=0.028 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=4.187 ms, cpu=0.028 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=3.803 ms, cpu=0.038 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=3.803 ms, cpu=0.038 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=3.803 ms, cpu=0.038 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=3.803 ms, cpu=0.038 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=3.803 ms, cpu=0.038 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=5.019 ms, cpu=0.024 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=5.019 ms, cpu=0.024 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=5.019 ms, cpu=0.024 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=5.019 ms, cpu=0.024 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=5.019 ms, cpu=0.024 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=5.019 ms, cpu=0.024 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=5.019 ms, cpu=0.024 ms) | |
| aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=3.451 ms, cpu=0.431 ms) | |
| aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=3.451 ms, cpu=0.431 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[512,1024,1,1]:0:nobias (id=G43, run=1, gpu=3.451 ms, cpu=0.431 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,50,50::f32[1,512,50,50]:f32[512]:f32[512]:f32[512]:f32[512] (id=G44, run=1, gpu=3.451 ms, cpu=0.431 ms) | |
| aten::relu_:f32[1,512,50,50] (id=G45, run=1, gpu=3.451 ms, cpu=0.431 ms) | |
| aten::mps_convolution:2:2:1:1:1:1:1:Contiguous:f32[1,512,50,50]:f32[512,512,3,3]:0:nobias (id=G46, run=1, gpu=6.531 ms, cpu=0.052 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=6.531 ms, cpu=0.052 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=6.531 ms, cpu=0.052 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,25,25]:f32[2048,512,1,1]:0:nobias (id=G49, run=3, gpu=6.531 ms, cpu=0.052 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=2.361 ms, cpu=0.048 ms) | |
| aten::mps_convolution:2:2:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[2048,1024,1,1]:0:nobias (id=G51, run=1, gpu=2.361 ms, cpu=0.048 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=2.361 ms, cpu=0.048 ms) | |
| aten::add_out_mps::f32[1,2048,25,25]:f32[1,2048,25,25]:f32[1,2048,25,25] (id=G52, run=3, gpu=2.361 ms, cpu=0.048 ms) | |
| aten::relu_:f32[1,2048,25,25] (id=G53, run=3, gpu=2.361 ms, cpu=0.048 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,2048,25,25]:f32[512,2048,1,1]:0:nobias (id=G54, run=2, gpu=2.361 ms, cpu=0.048 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=2.361 ms, cpu=0.048 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=2.361 ms, cpu=0.048 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,512,25,25]:f32[512,512,3,3]:0:nobias (id=G55, run=2, gpu=5.060 ms, cpu=0.051 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=5.060 ms, cpu=0.051 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=5.060 ms, cpu=0.051 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,25,25]:f32[2048,512,1,1]:0:nobias (id=G49, run=3, gpu=5.060 ms, cpu=0.051 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=2.594 ms, cpu=0.036 ms) | |
| aten::add_out_mps::f32[1,2048,25,25]:f32[1,2048,25,25]:f32[1,2048,25,25] (id=G52, run=3, gpu=2.594 ms, cpu=0.036 ms) | |
| aten::relu_:f32[1,2048,25,25] (id=G53, run=3, gpu=2.594 ms, cpu=0.036 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,2048,25,25]:f32[512,2048,1,1]:0:nobias (id=G54, run=2, gpu=2.594 ms, cpu=0.036 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=2.594 ms, cpu=0.036 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=2.594 ms, cpu=0.036 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,512,25,25]:f32[512,512,3,3]:0:nobias (id=G55, run=2, gpu=2.594 ms, cpu=0.036 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=2.594 ms, cpu=0.036 ms) | |
| aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=8.103 ms, cpu=0.061 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,25,25]:f32[2048,512,1,1]:0:nobias (id=G49, run=3, gpu=8.103 ms, cpu=0.061 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=8.103 ms, cpu=0.061 ms) | |
| aten::add_out_mps::f32[1,2048,25,25]:f32[1,2048,25,25]:f32[1,2048,25,25] (id=G52, run=3, gpu=8.103 ms, cpu=0.061 ms) | |
| aten::relu_:f32[1,2048,25,25] (id=G53, run=3, gpu=8.103 ms, cpu=0.061 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,2048,25,25]:f32[256,2048,1,1]:0:nobias (id=G56, run=1, gpu=5.217 ms, cpu=0.082 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,25,25::f32[1,256,25,25]:f32[256]:f32[256]:f32[256]:f32[256] (id=G57, run=2, gpu=5.217 ms, cpu=0.082 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,25,25]:f32[256,256,3,3]:0:nobias (id=G58, run=1, gpu=5.217 ms, cpu=0.082 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,25,25::f32[1,256,25,25]:f32[256]:f32[256]:f32[256]:f32[256] (id=G57, run=2, gpu=1.692 ms, cpu=0.019 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=1.692 ms, cpu=0.019 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=1.692 ms, cpu=0.019 ms) | |
| aten::upsample_nearest:f32[1,256,25,25]:[1.000000,0.000000]:[Undefined] (id=G59, run=1, gpu=5.205 ms, cpu=0.060 ms) | |
| aten::add_out_mps::f32[1,256,50,50]:f32[1,256,50,50]:f32[1,256,50,50] (id=G60, run=1, gpu=2.894 ms, cpu=0.022 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=6, gpu=2.894 ms, cpu=0.022 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=13, gpu=2.894 ms, cpu=0.022 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[256,512,1,1]:0:nobias (id=G30, run=2, gpu=2.894 ms, cpu=0.022 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,100,100::f32[1,256,100,100]:f32[256]:f32[256]:f32[256]:f32[256] (id=G31, run=2, gpu=0.072 ms, cpu=0.028 ms) | |
| aten::upsample_nearest:f32[1,256,50,50]:[1.000000,0.000000]:[Undefined] (id=G61, run=1, gpu=22.749 ms, cpu=0.025 ms) | |
| aten::add_out_mps::f32[1,256,100,100]:f32[1,256,100,100]:f32[1,256,100,100] (id=G62, run=1, gpu=22.749 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:0:nobias (id=G63, run=1, gpu=22.749 ms, cpu=0.025 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,100,100::f32[1,256,100,100]:f32[256]:f32[256]:f32[256]:f32[256] (id=G31, run=3, gpu=22.749 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[256,256,1,1]:0:nobias (id=G64, run=1, gpu=22.749 ms, cpu=0.025 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=5, gpu=0.029 ms, cpu=0.019 ms) | |
| aten::upsample_nearest:f32[1,256,100,100]:[1.000000,0.000000]:[Undefined] (id=G65, run=1, gpu=33.052 ms, cpu=0.027 ms) | |
| aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=4, gpu=33.052 ms, cpu=0.027 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,200,200]:f32[256,256,3,3]:0:nobias (id=G66, run=1, gpu=36.281 ms, cpu=0.253 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=6, gpu=36.281 ms, cpu=0.253 ms) | |
| aten::max_pool2d:f32[1,256,25,25]:Undefined:Undefined:K[1,1,]:S[2,2,]:P[0,0,]:D[1,1,]:NCHW (id=G67, run=1, gpu=36.281 ms, cpu=0.253 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,200,200]:f32[256,256,3,3]:1:256 (id=G68, run=2, gpu=36.281 ms, cpu=0.253 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=5, gpu=41.522 ms, cpu=0.021 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,200,200]:f32[256,256,3,3]:1:256 (id=G68, run=2, gpu=41.522 ms, cpu=0.021 ms) | |
| aten::relu_:f32[1,256,200,200] (id=G15, run=5, gpu=41.522 ms, cpu=0.021 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[3,256,1,1]:1:3 (id=G69, run=1, gpu=41.522 ms, cpu=0.021 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[12,256,1,1]:1:12 (id=G70, run=1, gpu=41.522 ms, cpu=0.021 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:1:256 (id=G71, run=2, gpu=41.522 ms, cpu=0.021 ms) | |
| aten::relu_:f32[1,256,100,100] (id=G32, run=3, gpu=41.522 ms, cpu=0.021 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:1:256 (id=G71, run=2, gpu=8.524 ms, cpu=0.031 ms) | |
| aten::relu_:f32[1,256,100,100] (id=G32, run=3, gpu=8.524 ms, cpu=0.031 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,100,100]:f32[3,256,1,1]:1:3 (id=G72, run=1, gpu=8.524 ms, cpu=0.031 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,100,100]:f32[12,256,1,1]:1:12 (id=G73, run=1, gpu=8.524 ms, cpu=0.031 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:1:256 (id=G74, run=2, gpu=8.524 ms, cpu=0.031 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=13, gpu=8.524 ms, cpu=0.031 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:1:256 (id=G74, run=2, gpu=8.524 ms, cpu=0.031 ms) | |
| aten::relu_:f32[1,256,50,50] (id=G35, run=13, gpu=5.209 ms, cpu=0.051 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[3,256,1,1]:1:3 (id=G75, run=1, gpu=5.209 ms, cpu=0.051 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[12,256,1,1]:1:12 (id=G76, run=1, gpu=5.209 ms, cpu=0.051 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,25,25]:f32[256,256,3,3]:1:256 (id=G77, run=2, gpu=5.209 ms, cpu=0.051 ms) | |
| aten::relu_:f32[1,256,25,25] (id=G78, run=2, gpu=5.209 ms, cpu=0.051 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,25,25]:f32[256,256,3,3]:1:256 (id=G77, run=2, gpu=5.209 ms, cpu=0.051 ms) | |
| aten::relu_:f32[1,256,25,25] (id=G78, run=2, gpu=5.209 ms, cpu=0.051 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,25,25]:f32[3,256,1,1]:1:3 (id=G79, run=1, gpu=2.647 ms, cpu=0.022 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,25,25]:f32[12,256,1,1]:1:12 (id=G80, run=1, gpu=2.647 ms, cpu=0.022 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,13,13]:f32[256,256,3,3]:1:256 (id=G81, run=2, gpu=2.647 ms, cpu=0.022 ms) | |
| aten::relu_:f32[1,256,13,13] (id=G82, run=2, gpu=2.647 ms, cpu=0.022 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,13,13]:f32[256,256,3,3]:1:256 (id=G81, run=2, gpu=2.647 ms, cpu=0.022 ms) | |
| aten::relu_:f32[1,256,13,13] (id=G82, run=2, gpu=2.647 ms, cpu=0.022 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,13,13]:f32[3,256,1,1]:1:3 (id=G83, run=1, gpu=2.647 ms, cpu=0.022 ms) | |
| aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,13,13]:f32[12,256,1,1]:1:12 (id=G84, run=1, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:4.000000 (id=G85, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:4.000000 (id=G85, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:8.000000 (id=G86, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:8.000000 (id=G86, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:16.000000 (id=G87, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:16.000000 (id=G87, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:32.000000 (id=G88, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:32.000000 (id=G88, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:61.000000 (id=G89, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| aten::fill_scalar_mps_impl:i64[Scalar]:61.000000 (id=G89, run=2, gpu=63.031 ms, cpu=0.520 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#407:1):Float[3, 4] (len=48 bytes, gpu=63.031 ms, cpu=0.520 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#406:1):Float[3, 4] (len=48 bytes, gpu=0.439 ms, cpu=0.038 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#648:1):Float[3, 4] (len=48 bytes, gpu=1.716 ms, cpu=0.028 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#649:1):Float[3, 4] (len=48 bytes, gpu=0.464 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[3, 4] --> MPS(buf#650:1):Float[3, 4] (len=48 bytes, gpu=0.686 ms, cpu=0.025 ms) | |
| aten::arange_mps_out:i32[200]:200 (id=G90, run=2, gpu=1.004 ms, cpu=0.075 ms) | |
| aten::mul:i32[200]:i64[Scalar]:i32[200] (id=G91, run=2, gpu=1.004 ms, cpu=0.075 ms) | |
| aten::arange_mps_out:i32[200]:200 (id=G90, run=2, gpu=1.004 ms, cpu=0.075 ms) | |
| aten::mul:i32[200]:i64[Scalar]:i32[200] (id=G91, run=2, gpu=1.004 ms, cpu=0.075 ms) | |
| aten::gather_kernel_2:MPS(buf#653:2):Int[200, 200]:MPS(buf#655:1):Int[200, 200] (id=K1, run=2, gpu=1.004 ms, cpu=0.075 ms) | |
| aten::gather_kernel_2:MPS(buf#653:2):Int[200, 200]:MPS(buf#655:1):Int[200, 200] (id=K1, run=2, gpu=1.004 ms, cpu=0.075 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=1, gpu=1.004 ms, cpu=0.075 ms) | |
| aten::add_out_mps::i32[40000,1,4]:f32[1,3,4]:f32[40000,3,4] (id=G93, run=1, gpu=1.004 ms, cpu=0.075 ms) | |
| aten::arange_mps_out:i32[100]:100 (id=G94, run=2, gpu=1.668 ms, cpu=0.032 ms) | |
| aten::mul:i32[100]:i64[Scalar]:i32[100] (id=G95, run=2, gpu=1.668 ms, cpu=0.032 ms) | |
| aten::arange_mps_out:i32[100]:100 (id=G94, run=2, gpu=1.668 ms, cpu=0.032 ms) | |
| aten::mul:i32[100]:i64[Scalar]:i32[100] (id=G95, run=2, gpu=1.668 ms, cpu=0.032 ms) | |
| aten::gather_kernel_2:MPS(buf#652:3):Int[50, 50]:MPS(buf#654:2):Int[50, 50] (id=K1, run=6, gpu=1.668 ms, cpu=0.032 ms) | |
| aten::gather_kernel_2:MPS(buf#652:3):Int[50, 50]:MPS(buf#654:2):Int[50, 50] (id=K1, run=6, gpu=1.668 ms, cpu=0.032 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=3, gpu=1.668 ms, cpu=0.032 ms) | |
| aten::add_out_mps::i32[10000,1,4]:f32[1,3,4]:f32[10000,3,4] (id=G96, run=1, gpu=1.668 ms, cpu=0.032 ms) | |
| aten::arange_mps_out:i32[50]:50 (id=G97, run=2, gpu=1.668 ms, cpu=0.032 ms) | |
| aten::mul:i32[50]:i64[Scalar]:i32[50] (id=G98, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::arange_mps_out:i32[50]:50 (id=G97, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::mul:i32[50]:i64[Scalar]:i32[50] (id=G98, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=5, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::add_out_mps::i32[2500,1,4]:f32[1,3,4]:f32[2500,3,4] (id=G99, run=1, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::arange_mps_out:i32[25]:25 (id=G100, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::mul:i32[25]:i64[Scalar]:i32[25] (id=G101, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::arange_mps_out:i32[25]:25 (id=G100, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::mul:i32[25]:i64[Scalar]:i32[25] (id=G101, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=5, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::add_out_mps::i32[625,1,4]:f32[1,3,4]:f32[625,3,4] (id=G102, run=1, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::arange_mps_out:i32[13]:13 (id=G103, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::mul:i32[13]:i64[Scalar]:i32[13] (id=G104, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::arange_mps_out:i32[13]:13 (id=G103, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::mul:i32[13]:i64[Scalar]:i32[13] (id=G104, run=2, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=5, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::add_out_mps::i32[169,1,4]:f32[1,3,4]:f32[169,3,4] (id=G105, run=1, gpu=10.395 ms, cpu=0.052 ms) | |
| aten::cat_out_mps:0:NCHW:f32:5 (id=G106, run=1, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::cat_out_mps:1:NCHW:f32:5 (id=G107, run=2, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::cat_out_mps:1:NCHW:f32:5 (id=G107, run=2, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=1, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::sub_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G109, run=2, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::sub_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G109, run=2, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::mul:f32[159882]:f32[Scalar]:f32[159882] (id=G110, run=2, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::add_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G111, run=2, gpu=14.793 ms, cpu=0.037 ms) | |
| aten::mul:f32[159882]:f32[Scalar]:f32[159882] (id=G110, run=2, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::add_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G111, run=2, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::clamp_out_mps_max:4.135167_scalar::f32[159882,1] (id=G113, run=2, gpu=1.011 ms, cpu=0.242 ms) | |
| aten::clamp_out_mps_max:4.135167_scalar::f32[159882,1] (id=G113, run=2, gpu=9.022 ms, cpu=0.032 ms) | |
| aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=9.022 ms, cpu=0.032 ms) | |
| aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=2, gpu=9.022 ms, cpu=0.032 ms) | |
| aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=9.022 ms, cpu=0.032 ms) | |
| aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=2, gpu=9.022 ms, cpu=0.032 ms) | |
| aten::exp:MPS(buf#673:2):Float[159882, 1] (id=K5, run=2, gpu=9.022 ms, cpu=0.032 ms) | |
| aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=9.022 ms, cpu=0.032 ms) | |
| aten::exp:MPS(buf#673:2):Float[159882, 1] (id=K5, run=2, gpu=9.022 ms, cpu=0.032 ms) | |
| aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=9.022 ms, cpu=0.032 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#644:1):Float[] (len=4 bytes, gpu=9.022 ms, cpu=0.032 ms) | |
| aten::mul:f32[Scalar]:f32[159882,1]:f32[159882,1] (id=G116, run=1, gpu=9.417 ms, cpu=0.030 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#644:2):Float[] (len=4 bytes, gpu=9.417 ms, cpu=0.030 ms) | |
| aten::mul:f32[Scalar]:f32[159882,1]:f32[159882,1] (id=G116, run=2, gpu=7.972 ms, cpu=0.335 ms) | |
| aten::sub_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G117, run=2, gpu=7.972 ms, cpu=0.335 ms) | |
| aten::sub_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G117, run=2, gpu=7.972 ms, cpu=0.335 ms) | |
| aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=4, gpu=7.972 ms, cpu=0.335 ms) | |
| aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=4, gpu=7.972 ms, cpu=0.335 ms) | |
| aten::cat_out_mps:2:NCHW:f32:4 (id=G118, run=1, gpu=7.972 ms, cpu=0.335 ms) | |
| aten::fill_scalar_mps_impl:i64[30000]:1.000000 (id=G119, run=1, gpu=24.877 ms, cpu=0.240 ms) | |
| aten::fill_scalar_mps_impl:i64[7500]:2.000000 (id=G120, run=1, gpu=24.877 ms, cpu=0.240 ms) | |
| aten::fill_scalar_mps_impl:i64[1875]:3.000000 (id=G121, run=1, gpu=24.877 ms, cpu=0.240 ms) | |
| aten::fill_scalar_mps_impl:i64[507]:4.000000 (id=G122, run=1, gpu=24.877 ms, cpu=0.240 ms) | |
| aten::cat_out_mps:0:NCHW:i64:5 (id=G123, run=1, gpu=24.877 ms, cpu=0.240 ms) | |
| aten::topk:1,120000:Float32:k1000:dim1:largest1 (id=G124, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::topk:1,30000:Float32:k1000:dim1:largest1 (id=G126, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::topk:1,7500:Float32:k1000:dim1:largest1 (id=G127, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::topk:1,1875:Float32:k1000:dim1:largest1 (id=G128, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::topk:1,507:Float32:k507:dim1:largest1 (id=G129, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::add_out_mps::i64[1,507]:i64[Scalar]:i64[1,507] (id=G130, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::cat_out_mps:1:NCHW:i64:5 (id=G131, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::arange_mps_out:i64[1]:1 (id=G132, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_2:MPS(buf#644:2):Long[1, 4507]:MPS(buf#615:2):Long[1, 4507] (id=K6, run=2, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#685:1):Float[1, 4507, 4] (id=K7, run=2, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_2:MPS(buf#644:2):Long[1, 4507]:MPS(buf#615:2):Long[1, 4507] (id=K6, run=2, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#657:1):Long[1, 4507] (id=K8, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_3:MPS(buf#644:2):Long[1, 4507, 1]:MPS(buf#615:2):Long[1, 4507, 1] (id=K9, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#685:1):Float[1, 4507, 4] (id=K7, run=2, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::sigmoid_out_mps:f32[1,4507]:f32[1,4507] (id=G133, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[4507,2] (id=G134, run=2, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[4507,2] (id=G134, run=2, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::cat_out_mps:2:NCHW:f32:2 (id=G135, run=1, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::sub_out_mps::f32[4507]:f32[4507]:f32[4507] (id=G136, run=2, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::sub_out_mps::f32[4507]:f32[4507]:f32[4507] (id=G136, run=2, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::greaterThanOrEqualTo:f32[4507]:f32[Scalar]:b8[4507] (id=G137, run=2, gpu=72.076 ms, cpu=0.028 ms) | |
| aten::greaterThanOrEqualTo:f32[4507]:f32[Scalar]:b8[4507] (id=G137, run=2, gpu=0.003 ms, cpu=0.013 ms) | |
| aten::bitwise_and_tensor:MPS(buf#622:2):Bool[4507]:MPS(buf#627:2):Bool[4507] (id=K10, run=1, gpu=0.003 ms, cpu=0.013 ms) | |
| aten::count_nonzero_mps:0::b8[4507]:0:7::i64[Scalar]:Bool (id=G138, run=1, gpu=4.918 ms, cpu=0.043 ms) | |
| BlitCopySync: MPS(buf#642:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=4.918 ms, cpu=0.043 ms) | |
| aten::nonzero_out_native_mps:b8[4507] (id=G139, run=1, gpu=0.439 ms, cpu=0.084 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#623:1):Float[4507] (id=K7, run=4, gpu=0.439 ms, cpu=0.084 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#623:1):Float[4507] (id=K7, run=4, gpu=0.439 ms, cpu=0.084 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#619:1):Long[4507] (id=K8, run=2, gpu=0.439 ms, cpu=0.084 ms) | |
| aten::greaterThanOrEqualTo:f32[4507]:f32[Scalar]:b8[4507] (id=G137, run=3, gpu=0.439 ms, cpu=0.084 ms) | |
| aten::count_nonzero_mps:0::b8[4507]:0:7::i64[Scalar]:Bool (id=G138, run=2, gpu=0.629 ms, cpu=0.034 ms) | |
| BlitCopySync: MPS(buf#642:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.629 ms, cpu=0.034 ms) | |
| aten::nonzero_out_native_mps:b8[4507] (id=G139, run=2, gpu=1.279 ms, cpu=0.041 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#654:1):Float[4507] (id=K7, run=6, gpu=1.279 ms, cpu=0.041 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#654:1):Float[4507] (id=K7, run=6, gpu=1.279 ms, cpu=0.041 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#616:1):Long[4507] (id=K8, run=3, gpu=1.279 ms, cpu=0.041 ms) | |
| aten::max_mps:f32[4507,4] (id=G140, run=1, gpu=1.279 ms, cpu=0.041 ms) | |
| aten::copy_cast_mps:i64[[-1]]:f32[[-1]]:0 (id=G141, run=1, gpu=1.279 ms, cpu=0.041 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#628:1):Float[] (len=4 bytes, gpu=1.279 ms, cpu=0.041 ms) | |
| aten::add_out_mps::f32[Scalar]:f32[Scalar]:f32[Scalar] (id=G142, run=1, gpu=18.014 ms, cpu=0.163 ms) | |
| aten::mul:f32[4507]:f32[Scalar]:f32[4507] (id=G143, run=1, gpu=18.014 ms, cpu=0.163 ms) | |
| aten::add_out_mps::f32[4507,4]:f32[4507,1]:f32[4507,4] (id=G144, run=1, gpu=18.014 ms, cpu=0.163 ms) | |
| BlitCopy: MPS(buf#663:1):Float[4507] --> MPS(buf#616:1):Float[4507] (len=17.61 KB, gpu=18.014 ms, cpu=0.163 ms) | |
| aten::sort:4507:Float32:dim0:descending1 (id=G145, run=1, gpu=18.014 ms, cpu=0.163 ms) | |
| aten::index_select_out_mps:f32[4507,4]:i64[4507]:0 (id=G146, run=1, gpu=18.014 ms, cpu=0.163 ms) | |
| aten::nms_float:MPS(buf#615:2):Float[4507, 4]:MPS(buf#663:2):Float[4507] (id=K11, run=1, gpu=18.014 ms, cpu=0.163 ms) | |
| BlitCopySync: MPS(buf#584:2):Long[319997] --> CPU:Long[319997] (len=2.44 MB, gpu=18.014 ms, cpu=0.163 ms) | |
| BlitCopySync: CPU:Long[2004] --> MPS(buf#668:1):Long[2004] (len=15.66 KB, gpu=2.291 ms, cpu=0.040 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#670:1):Long[2004] (id=K8, run=4, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#663:1):Float[1000] (id=K7, run=8, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#663:1):Float[1000] (id=K7, run=8, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=4, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=4, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::cat_out_mps:1:NCHW:f32:2 (id=G147, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::gather_kernel_1:MPS(buf#615:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::gather_kernel_1:MPS(buf#615:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=2, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::gather_kernel_1:MPS(buf#615:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::gather_kernel_1:MPS(buf#615:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=2, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::mul:f32[1000]:f32[1000]:f32[1000] (id=G149, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=4, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::sqrt_out_mps:f32[1000]:f32[1000] (id=G150, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::div_out_mps::f32[1000]:i64[Scalar]:f32[1000] (id=G151, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::log2_out_mps:f32[1000]:f32[1000] (id=G152, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::add_out_mps::f32[1000]:i64[Scalar]:f32[1000] (id=G153, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::add_out_mps::f32[1000]:f32[Scalar]:f32[1000] (id=G154, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::floor_out_mps:f32[1000]:f32[1000] (id=G155, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::clamp_out_mps_min:2.000000_max:5.000000_scalar::f32[1000] (id=G156, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::copy_cast_mps:f32[[-1]]:i64[[-1]]:0 (id=G157, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::sub_out_mps::i64[1000]:i64[Scalar]:i64[1000] (id=G158, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=1, gpu=51.431 ms, cpu=42.563 ms) | |
| aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=1, gpu=0.312 ms, cpu=0.044 ms) | |
| BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.312 ms, cpu=0.044 ms) | |
| Error: command buffer exited with error status. | |
| The Metal Performance Shaders operations encoded on it may not have completed. | |
| Error: | |
| (null) | |
| Internal Error (0000000e:Internal Error) | |
| <AGXG15XFamilyCommandBuffer: 0x3b3ce3dc0> | |
| label = <none> | |
| device = <AGXG15CDevice: 0x133ed4000> | |
| name = Apple M3 Max | |
| commandQueue = <AGXG15XFamilyCommandQueue: 0x105063600> | |
| label = <none> | |
| device = <AGXG15CDevice: 0x133ed4000> | |
| name = Apple M3 Max | |
| retainedReferences = 1 | |
| aten::nonzero_out_native_mps:b8[1000] (id=G161, run=1, gpu=26795.452 ms, cpu=0.052 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#623:1):Float[959, 5] (id=K7, run=9, gpu=26795.452 ms, cpu=0.052 ms) | |
| aten::roi_align_float:MPS(buf#610:1):Float[1, 256, 200, 200]:MPS(buf#660:2):Float[959, 5] (id=K12, run=1, gpu=26795.452 ms, cpu=0.052 ms) | |
| aten::index_put_32bit_idx32:MPS(buf#688:2):Float[959, 256, 7, 7] (id=K13, run=1, gpu=26795.452 ms, cpu=0.052 ms) | |
| aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=2, gpu=26795.452 ms, cpu=0.052 ms) | |
| aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=2, gpu=5.924 ms, cpu=0.217 ms) | |
| BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=5.924 ms, cpu=0.217 ms) | |
| aten::nonzero_out_native_mps:b8[1000] (id=G161, run=2, gpu=34729.081 ms, cpu=0.040 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#623:1):Float[959, 5] (id=K7, run=10, gpu=34729.081 ms, cpu=0.040 ms) | |
| aten::roi_align_float:MPS(buf#600:1):Float[1, 256, 100, 100]:MPS(buf#619:2):Float[959, 5] (id=K12, run=2, gpu=34729.081 ms, cpu=0.040 ms) | |
| aten::index_put_32bit_idx32:MPS(buf#690:2):Float[959, 256, 7, 7] (id=K13, run=2, gpu=34729.081 ms, cpu=0.040 ms) | |
| aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=3, gpu=34729.081 ms, cpu=0.040 ms) | |
| aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=3, gpu=0.038 ms, cpu=0.264 ms) | |
| BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.038 ms, cpu=0.264 ms) | |
| aten::nonzero_out_native_mps:b8[1000] (id=G161, run=3, gpu=1.195 ms, cpu=0.265 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#623:1):Float[3, 5] (id=K7, run=11, gpu=1.195 ms, cpu=0.265 ms) | |
| aten::roi_align_float:MPS(buf#590:1):Float[1, 256, 50, 50]:MPS(buf#652:2):Float[3, 5] (id=K12, run=3, gpu=1.195 ms, cpu=0.265 ms) | |
| aten::index_put_32bit_idx32:MPS(buf#655:2):Float[3, 256, 7, 7] (id=K13, run=3, gpu=1.195 ms, cpu=0.265 ms) | |
| aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=4, gpu=1.195 ms, cpu=0.265 ms) | |
| aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=4, gpu=0.031 ms, cpu=0.032 ms) | |
| BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.031 ms, cpu=0.032 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.316 ms, cpu=0.796 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.316 ms, cpu=0.796 ms) | |
| aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=5.456 ms, cpu=0.025 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.456 ms, cpu=0.025 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.456 ms, cpu=0.025 ms) | |
| aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=5.321 ms, cpu=0.678 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.321 ms, cpu=0.678 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.321 ms, cpu=0.678 ms) | |
| aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=5.320 ms, cpu=0.020 ms) | |
| aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.320 ms, cpu=0.020 ms) | |
| aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.320 ms, cpu=0.020 ms) | |
| aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::mps_linear:f32[1000,12544]:f32[1024,12544]:f32[1024] (id=G165, run=1, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::relu_:f32[1000,1024] (id=G166, run=1, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::mps_linear:f32[1000,1024]:f32[91,1024]:f32[91] (id=G167, run=1, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::mps_linear:f32[1000,1024]:f32[364,1024]:f32[364] (id=G168, run=1, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=5, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=4, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=4, gpu=2.929 ms, cpu=0.830 ms) | |
| aten::mul:f32[1000]:f32[Scalar]:f32[1000] (id=G169, run=2, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::add_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G170, run=2, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::mul:f32[1000]:f32[Scalar]:f32[1000] (id=G169, run=2, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::add_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G170, run=2, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.038 ms, cpu=0.037 ms) | |
| aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.021 ms, cpu=0.036 ms) | |
| aten::clamp_out_mps_max:4.135167_scalar::f32[1000,91] (id=G172, run=2, gpu=0.021 ms, cpu=0.036 ms) | |
| aten::clamp_out_mps_max:4.135167_scalar::f32[1000,91] (id=G172, run=2, gpu=0.021 ms, cpu=0.036 ms) | |
| aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.021 ms, cpu=0.036 ms) | |
| aten::add_out_mps::f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G174, run=2, gpu=0.021 ms, cpu=0.036 ms) | |
| aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.021 ms, cpu=0.036 ms) | |
| aten::add_out_mps::f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G174, run=2, gpu=0.021 ms, cpu=0.036 ms) | |
| aten::exp:MPS(buf#666:2):Float[1000, 91] (id=K5, run=4, gpu=0.021 ms, cpu=0.036 ms) | |
| aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.030 ms, cpu=0.036 ms) | |
| aten::exp:MPS(buf#666:2):Float[1000, 91] (id=K5, run=4, gpu=0.030 ms, cpu=0.036 ms) | |
| aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.030 ms, cpu=0.036 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#644:1):Float[] (len=4 bytes, gpu=0.030 ms, cpu=0.036 ms) | |
| aten::mul:f32[Scalar]:f32[1000,91]:f32[1000,91] (id=G175, run=1, gpu=0.013 ms, cpu=0.097 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#642:1):Float[] (len=4 bytes, gpu=0.013 ms, cpu=0.097 ms) | |
| aten::mul:f32[Scalar]:f32[1000,91]:f32[1000,91] (id=G175, run=2, gpu=0.044 ms, cpu=0.026 ms) | |
| aten::sub_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G176, run=2, gpu=0.044 ms, cpu=0.026 ms) | |
| aten::sub_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G176, run=2, gpu=0.044 ms, cpu=0.026 ms) | |
| aten::add_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G177, run=2, gpu=0.044 ms, cpu=0.026 ms) | |
| aten::add_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G177, run=2, gpu=0.044 ms, cpu=0.026 ms) | |
| aten::cat_out_mps:2:NCHW:f32:4 (id=G118, run=2, gpu=0.044 ms, cpu=0.026 ms) | |
| aten::softmax_mps_out:f32[[-1]]:Contiguous:1 (id=G178, run=1, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::gather_kernel_3:MPS(buf#667:2):Float[1000, 90, 4]:MPS(buf#685:1):Float[1000, 90, 4] (id=K14, run=3, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[1000,91,2] (id=G179, run=2, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::gather_kernel_3:MPS(buf#667:2):Float[1000, 90, 4]:MPS(buf#685:1):Float[1000, 90, 4] (id=K14, run=3, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[1000,91,2] (id=G179, run=2, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::cat_out_mps:3:NCHW:f32:2 (id=G180, run=1, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::arange_mps_out:i64[91]:91 (id=G181, run=1, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::gather_kernel_3:MPS(buf#667:2):Float[1000, 90, 4]:MPS(buf#685:1):Float[1000, 90, 4] (id=K14, run=3, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::gather_kernel_2:MPS(buf#616:2):Float[1000, 90]:MPS(buf#663:1):Float[1000, 90] (id=K4, run=15, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::gather_kernel_2:MPS(buf#655:2):Long[1000, 90]:MPS(buf#699:2):Long[1000, 90] (id=K6, run=3, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::greaterThan:f32[90000]:f32[Scalar]:b8[90000] (id=G182, run=1, gpu=0.102 ms, cpu=0.305 ms) | |
| aten::count_nonzero_mps:0::b8[90000]:0:7::i64[Scalar]:Bool (id=G183, run=1, gpu=0.033 ms, cpu=0.038 ms) | |
| BlitCopySync: MPS(buf#628:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.033 ms, cpu=0.038 ms) | |
| aten::nonzero_out_native_mps:b8[90000] (id=G184, run=1, gpu=0.043 ms, cpu=0.047 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#663:1):Float[262] (id=K7, run=13, gpu=0.043 ms, cpu=0.047 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#663:1):Float[262] (id=K7, run=13, gpu=0.043 ms, cpu=0.047 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#699:1):Long[262] (id=K8, run=5, gpu=0.043 ms, cpu=0.047 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[262]:MPS(buf#675:1):Float[262] (id=K3, run=24, gpu=0.043 ms, cpu=0.047 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[262]:MPS(buf#675:1):Float[262] (id=K3, run=24, gpu=0.043 ms, cpu=0.047 ms) | |
| aten::sub_out_mps::f32[262]:f32[262]:f32[262] (id=G185, run=2, gpu=0.016 ms, cpu=0.015 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[262]:MPS(buf#675:1):Float[262] (id=K3, run=24, gpu=0.016 ms, cpu=0.015 ms) | |
| aten::gather_kernel_1:MPS(buf#655:3):Float[262]:MPS(buf#675:1):Float[262] (id=K3, run=24, gpu=0.016 ms, cpu=0.015 ms) | |
| aten::sub_out_mps::f32[262]:f32[262]:f32[262] (id=G185, run=2, gpu=0.016 ms, cpu=0.015 ms) | |
| aten::greaterThanOrEqualTo:f32[262]:f32[Scalar]:b8[262] (id=G186, run=2, gpu=0.016 ms, cpu=0.015 ms) | |
| aten::greaterThanOrEqualTo:f32[262]:f32[Scalar]:b8[262] (id=G186, run=2, gpu=0.016 ms, cpu=0.015 ms) | |
| aten::bitwise_and_tensor:MPS(buf#666:2):Bool[262]:MPS(buf#675:2):Bool[262] (id=K10, run=2, gpu=0.016 ms, cpu=0.015 ms) | |
| aten::count_nonzero_mps:0::b8[262]:0:7::i64[Scalar]:Bool (id=G187, run=1, gpu=0.012 ms, cpu=0.025 ms) | |
| BlitCopySync: MPS(buf#628:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.012 ms, cpu=0.025 ms) | |
| aten::nonzero_out_native_mps:b8[262] (id=G188, run=1, gpu=0.062 ms, cpu=0.040 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#625:1):Float[262] (id=K7, run=15, gpu=0.062 ms, cpu=0.040 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#625:1):Float[262] (id=K7, run=15, gpu=0.062 ms, cpu=0.040 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#674:1):Long[262] (id=K8, run=6, gpu=0.062 ms, cpu=0.040 ms) | |
| aten::max_mps:f32[262,4] (id=G189, run=1, gpu=0.062 ms, cpu=0.040 ms) | |
| aten::copy_cast_mps:i64[[-1]]:f32[[-1]]:0 (id=G141, run=2, gpu=0.062 ms, cpu=0.040 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#636:1):Float[] (len=4 bytes, gpu=0.062 ms, cpu=0.040 ms) | |
| aten::add_out_mps::f32[Scalar]:f32[Scalar]:f32[Scalar] (id=G142, run=2, gpu=0.518 ms, cpu=0.049 ms) | |
| aten::mul:f32[262]:f32[Scalar]:f32[262] (id=G190, run=1, gpu=0.518 ms, cpu=0.049 ms) | |
| aten::add_out_mps::f32[262,4]:f32[262,1]:f32[262,4] (id=G191, run=1, gpu=0.518 ms, cpu=0.049 ms) | |
| BlitCopy: MPS(buf#669:1):Float[262] --> MPS(buf#679:1):Float[262] (len=1.02 KB, gpu=0.518 ms, cpu=0.049 ms) | |
| aten::sort:262:Float32:dim0:descending1 (id=G192, run=1, gpu=0.518 ms, cpu=0.049 ms) | |
| aten::index_select_out_mps:f32[262,4]:i64[262]:0 (id=G193, run=1, gpu=0.518 ms, cpu=0.049 ms) | |
| aten::nms_float:MPS(buf#674:2):Float[262, 4]:MPS(buf#669:2):Float[262] (id=K11, run=2, gpu=0.518 ms, cpu=0.049 ms) | |
| BlitCopySync: MPS(buf#676:2):Long[1310] --> CPU:Long[1310] (len=10.23 KB, gpu=0.518 ms, cpu=0.049 ms) | |
| BlitCopySync: CPU:Long[158] --> MPS(buf#671:1):Long[158] (len=1.23 KB, gpu=0.006 ms, cpu=0.024 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#675:1):Long[100] (id=K8, run=8, gpu=0.032 ms, cpu=0.039 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#669:1):Float[100] (id=K7, run=17, gpu=0.032 ms, cpu=0.039 ms) | |
| aten::index_select_32bit_idx32:MPS(buf#669:1):Float[100] (id=K7, run=17, gpu=0.032 ms, cpu=0.039 ms) | |
| aten::index_select_64bit_idx32:MPS(buf#675:1):Long[100] (id=K8, run=8, gpu=0.032 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#628:1):Float[] (len=4 bytes, gpu=0.032 ms, cpu=0.039 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#636:1):Float[] (len=4 bytes, gpu=0.006 ms, cpu=0.027 ms) | |
| aten::div_out_mps::f32[Scalar]:f32[Scalar]:f32[Scalar] (id=G194, run=1, gpu=0.011 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#628:2):Float[] (len=4 bytes, gpu=0.011 ms, cpu=0.035 ms) | |
| BlitCopySync: CPU:Float[] --> MPS(buf#636:1):Float[] (len=4 bytes, gpu=0.003 ms, cpu=0.023 ms) | |
| --------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------------------------------------------------------------------------- | |
|                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls  Input Shapes | |
| --------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------------------------------------------------------------------------- | |
|                  model_inference         0.01%       8.528ms       100.00%       62.768s       62.768s             1  [] | |
|                      aten::where         0.00%       5.208us        98.20%       61.637s       15.409s             4  [[1000]] | |
|              aten::nonzero_numpy         0.00%      26.543us        98.20%       61.637s       15.409s             4  [[1000]] | |
|                    aten::nonzero        98.18%       61.625s        98.20%       61.636s       15.409s             4  [[1000]] | |
|         aten::upsample_nearest2d         0.38%     238.733ms         0.38%     238.733ms     238.733ms             1  [[1, 256, 25, 25], [], []] | |
|                      aten::where         0.00%       7.082us         0.18%     113.257ms      56.628ms             2  [[4507]] | |
|              aten::nonzero_numpy         0.00%      23.751us         0.18%     113.249ms      56.625ms             2  [[4507]] | |
|                    aten::nonzero         0.16%     101.306ms         0.18%     113.183ms      56.591ms             2  [[4507]] | |
|                         aten::to         0.00%       8.083us         0.11%      69.337ms      13.867ms             5  [[3, 4], [], [], [], [], []] | |
|                   aten::_to_copy         0.00%      34.708us         0.11%      69.329ms      13.866ms             5  [[3, 4], [], [], [], [], [], []] | |
| --------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------------------------------------------------------------------------- | |
| Self CPU time total: 62.768s |