@hvaara
Created August 21, 2024 01:38
High watermark memory allocation limit: 163.20 GB
Low watermark memory allocation limit: 134.40 GB
Initializing private heap allocator on unified device memory of size 96.00 GB
BlitCopySync: CPU:Float[3, 224, 224] --> MPS(buf#1:1):Float[3, 224, 224] (len=588.00 KB, gpu=9.644 ms, cpu=4.767 ms)
BlitCopySync: CPU:Float[64, 3, 7, 7] --> MPS(buf#2:1):Float[64, 3, 7, 7] (len=36.75 KB, gpu=1.555 ms, cpu=0.043 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#3:1):Float[64] (len=256 bytes, gpu=1.491 ms, cpu=0.031 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#4:1):Float[64] (len=256 bytes, gpu=0.586 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#5:1):Float[64] (len=256 bytes, gpu=0.564 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#6:1):Float[64] (len=256 bytes, gpu=0.518 ms, cpu=0.024 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#7:1):Long[] (len=8 bytes, gpu=0.409 ms, cpu=0.232 ms)
BlitCopySync: CPU:Float[64, 64, 1, 1] --> MPS(buf#8:1):Float[64, 64, 1, 1] (len=16.00 KB, gpu=0.708 ms, cpu=0.030 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#9:1):Float[64] (len=256 bytes, gpu=0.707 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#10:1):Float[64] (len=256 bytes, gpu=2.733 ms, cpu=0.015 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#11:1):Float[64] (len=256 bytes, gpu=0.589 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#12:1):Float[64] (len=256 bytes, gpu=0.783 ms, cpu=0.029 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#13:1):Long[] (len=8 bytes, gpu=0.694 ms, cpu=0.026 ms)
BlitCopySync: CPU:Float[64, 64, 3, 3] --> MPS(buf#14:1):Float[64, 64, 3, 3] (len=144.00 KB, gpu=0.642 ms, cpu=0.034 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#15:1):Float[64] (len=256 bytes, gpu=0.588 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#16:1):Float[64] (len=256 bytes, gpu=7.791 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#17:1):Float[64] (len=256 bytes, gpu=7.531 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#18:1):Float[64] (len=256 bytes, gpu=9.208 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#19:1):Long[] (len=8 bytes, gpu=0.557 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#20:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=0.542 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#21:1):Float[256] (len=1024 bytes, gpu=0.579 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#22:1):Float[256] (len=1024 bytes, gpu=0.683 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#23:1):Float[256] (len=1024 bytes, gpu=0.683 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#24:1):Float[256] (len=1024 bytes, gpu=0.701 ms, cpu=0.027 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#25:1):Long[] (len=8 bytes, gpu=7.410 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#26:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=9.523 ms, cpu=0.062 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#27:1):Float[256] (len=1024 bytes, gpu=0.515 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#28:1):Float[256] (len=1024 bytes, gpu=0.569 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#29:1):Float[256] (len=1024 bytes, gpu=0.575 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#30:1):Float[256] (len=1024 bytes, gpu=0.710 ms, cpu=0.025 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#31:1):Long[] (len=8 bytes, gpu=0.740 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[64, 256, 1, 1] --> MPS(buf#32:1):Float[64, 256, 1, 1] (len=64.00 KB, gpu=0.661 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#33:1):Float[64] (len=256 bytes, gpu=7.733 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#34:1):Float[64] (len=256 bytes, gpu=9.479 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#35:1):Float[64] (len=256 bytes, gpu=0.483 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#36:1):Float[64] (len=256 bytes, gpu=0.762 ms, cpu=0.023 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#37:1):Long[] (len=8 bytes, gpu=0.759 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[64, 64, 3, 3] --> MPS(buf#38:1):Float[64, 64, 3, 3] (len=144.00 KB, gpu=0.510 ms, cpu=0.035 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#39:1):Float[64] (len=256 bytes, gpu=0.758 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#40:1):Float[64] (len=256 bytes, gpu=0.747 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#41:1):Float[64] (len=256 bytes, gpu=7.758 ms, cpu=0.031 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#42:1):Float[64] (len=256 bytes, gpu=9.569 ms, cpu=0.030 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#43:1):Long[] (len=8 bytes, gpu=0.325 ms, cpu=0.036 ms)
BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#44:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=0.699 ms, cpu=0.038 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#45:1):Float[256] (len=1024 bytes, gpu=0.691 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#46:1):Float[256] (len=1024 bytes, gpu=0.720 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#47:1):Float[256] (len=1024 bytes, gpu=0.675 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#48:1):Float[256] (len=1024 bytes, gpu=0.586 ms, cpu=0.017 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#49:1):Long[] (len=8 bytes, gpu=1.811 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[64, 256, 1, 1] --> MPS(buf#50:1):Float[64, 256, 1, 1] (len=64.00 KB, gpu=0.495 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#51:1):Float[64] (len=256 bytes, gpu=0.657 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#52:1):Float[64] (len=256 bytes, gpu=0.733 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#53:1):Float[64] (len=256 bytes, gpu=0.703 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#54:1):Float[64] (len=256 bytes, gpu=0.731 ms, cpu=0.017 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#55:1):Long[] (len=8 bytes, gpu=0.597 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[64, 64, 3, 3] --> MPS(buf#56:1):Float[64, 64, 3, 3] (len=144.00 KB, gpu=9.403 ms, cpu=0.084 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#57:1):Float[64] (len=256 bytes, gpu=0.335 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#58:1):Float[64] (len=256 bytes, gpu=0.525 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#59:1):Float[64] (len=256 bytes, gpu=0.689 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[64] --> MPS(buf#60:1):Float[64] (len=256 bytes, gpu=0.701 ms, cpu=0.015 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#61:1):Long[] (len=8 bytes, gpu=0.745 ms, cpu=0.015 ms)
BlitCopySync: CPU:Float[256, 64, 1, 1] --> MPS(buf#62:1):Float[256, 64, 1, 1] (len=64.00 KB, gpu=0.694 ms, cpu=0.030 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#63:1):Float[256] (len=1024 bytes, gpu=1.721 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#64:1):Float[256] (len=1024 bytes, gpu=0.381 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#65:1):Float[256] (len=1024 bytes, gpu=0.774 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#66:1):Float[256] (len=1024 bytes, gpu=0.657 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#67:1):Long[] (len=8 bytes, gpu=0.704 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[128, 256, 1, 1] --> MPS(buf#68:1):Float[128, 256, 1, 1] (len=128.00 KB, gpu=0.696 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#69:1):Float[128] (len=512 bytes, gpu=0.732 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#70:1):Float[128] (len=512 bytes, gpu=2.612 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#71:1):Float[128] (len=512 bytes, gpu=0.543 ms, cpu=0.014 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#72:1):Float[128] (len=512 bytes, gpu=0.737 ms, cpu=0.019 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#73:1):Long[] (len=8 bytes, gpu=0.722 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#74:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=0.664 ms, cpu=0.046 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#75:1):Float[128] (len=512 bytes, gpu=0.609 ms, cpu=0.037 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#76:1):Float[128] (len=512 bytes, gpu=7.534 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#77:1):Float[128] (len=512 bytes, gpu=7.568 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#78:1):Float[128] (len=512 bytes, gpu=2.644 ms, cpu=0.021 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#79:1):Long[] (len=8 bytes, gpu=0.559 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#80:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.738 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#81:1):Float[512] (len=2.00 KB, gpu=0.720 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#82:1):Float[512] (len=2.00 KB, gpu=0.706 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#83:1):Float[512] (len=2.00 KB, gpu=0.748 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#84:1):Float[512] (len=2.00 KB, gpu=7.563 ms, cpu=0.019 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#85:1):Long[] (len=8 bytes, gpu=1.499 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[512, 256, 1, 1] --> MPS(buf#86:1):Float[512, 256, 1, 1] (len=512.00 KB, gpu=0.536 ms, cpu=0.050 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#87:1):Float[512] (len=2.00 KB, gpu=0.456 ms, cpu=0.101 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#88:1):Float[512] (len=2.00 KB, gpu=0.618 ms, cpu=0.046 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#89:1):Float[512] (len=2.00 KB, gpu=0.678 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#90:1):Float[512] (len=2.00 KB, gpu=0.654 ms, cpu=0.020 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#91:1):Long[] (len=8 bytes, gpu=0.725 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[128, 512, 1, 1] --> MPS(buf#92:1):Float[128, 512, 1, 1] (len=256.00 KB, gpu=9.616 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#93:1):Float[128] (len=512 bytes, gpu=0.320 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#94:1):Float[128] (len=512 bytes, gpu=0.592 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#95:1):Float[128] (len=512 bytes, gpu=0.765 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#96:1):Float[128] (len=512 bytes, gpu=0.700 ms, cpu=0.025 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#97:1):Long[] (len=8 bytes, gpu=0.755 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#98:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=0.681 ms, cpu=0.045 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#99:1):Float[128] (len=512 bytes, gpu=6.747 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#100:1):Float[128] (len=512 bytes, gpu=0.062 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#101:1):Float[128] (len=512 bytes, gpu=9.700 ms, cpu=0.041 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#102:1):Float[128] (len=512 bytes, gpu=0.142 ms, cpu=0.030 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#103:1):Long[] (len=8 bytes, gpu=0.483 ms, cpu=0.032 ms)
BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#104:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.464 ms, cpu=0.033 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#105:1):Float[512] (len=2.00 KB, gpu=0.622 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#106:1):Float[512] (len=2.00 KB, gpu=0.586 ms, cpu=0.016 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#107:1):Float[512] (len=2.00 KB, gpu=0.560 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#108:1):Float[512] (len=2.00 KB, gpu=7.473 ms, cpu=0.023 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#109:1):Long[] (len=8 bytes, gpu=4.507 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[128, 512, 1, 1] --> MPS(buf#110:1):Float[128, 512, 1, 1] (len=256.00 KB, gpu=0.550 ms, cpu=0.029 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#111:1):Float[128] (len=512 bytes, gpu=0.766 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#112:1):Float[128] (len=512 bytes, gpu=0.667 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#113:1):Float[128] (len=512 bytes, gpu=1.545 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#114:1):Float[128] (len=512 bytes, gpu=0.583 ms, cpu=0.026 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#115:1):Long[] (len=8 bytes, gpu=0.711 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#116:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=0.658 ms, cpu=0.049 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#117:1):Float[128] (len=512 bytes, gpu=0.684 ms, cpu=0.035 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#118:1):Float[128] (len=512 bytes, gpu=0.748 ms, cpu=0.026 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#119:1):Float[128] (len=512 bytes, gpu=0.742 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#120:1):Float[128] (len=512 bytes, gpu=7.722 ms, cpu=0.027 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#121:1):Long[] (len=8 bytes, gpu=9.662 ms, cpu=0.055 ms)
BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#122:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.424 ms, cpu=0.043 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#123:1):Float[512] (len=2.00 KB, gpu=0.586 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#124:1):Float[512] (len=2.00 KB, gpu=0.567 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#125:1):Float[512] (len=2.00 KB, gpu=0.743 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#126:1):Float[512] (len=2.00 KB, gpu=0.742 ms, cpu=0.019 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#127:1):Long[] (len=8 bytes, gpu=0.765 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[128, 512, 1, 1] --> MPS(buf#128:1):Float[128, 512, 1, 1] (len=256.00 KB, gpu=7.393 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#129:1):Float[128] (len=512 bytes, gpu=5.499 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#130:1):Float[128] (len=512 bytes, gpu=0.450 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#131:1):Float[128] (len=512 bytes, gpu=0.715 ms, cpu=0.033 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#132:1):Float[128] (len=512 bytes, gpu=7.564 ms, cpu=0.023 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#133:1):Long[] (len=8 bytes, gpu=7.502 ms, cpu=0.032 ms)
BlitCopySync: CPU:Float[128, 128, 3, 3] --> MPS(buf#134:1):Float[128, 128, 3, 3] (len=576.00 KB, gpu=9.713 ms, cpu=0.045 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#135:1):Float[128] (len=512 bytes, gpu=0.501 ms, cpu=0.035 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#136:1):Float[128] (len=512 bytes, gpu=0.689 ms, cpu=0.026 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#137:1):Float[128] (len=512 bytes, gpu=0.633 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[128] --> MPS(buf#138:1):Float[128] (len=512 bytes, gpu=0.738 ms, cpu=0.020 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#139:1):Long[] (len=8 bytes, gpu=0.768 ms, cpu=0.033 ms)
BlitCopySync: CPU:Float[512, 128, 1, 1] --> MPS(buf#140:1):Float[512, 128, 1, 1] (len=256.00 KB, gpu=0.710 ms, cpu=0.039 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#141:1):Float[512] (len=2.00 KB, gpu=6.751 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#142:1):Float[512] (len=2.00 KB, gpu=0.502 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#143:1):Float[512] (len=2.00 KB, gpu=2.772 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#144:1):Float[512] (len=2.00 KB, gpu=0.556 ms, cpu=0.044 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#145:1):Long[] (len=8 bytes, gpu=0.456 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256, 512, 1, 1] --> MPS(buf#146:1):Float[256, 512, 1, 1] (len=512.00 KB, gpu=0.697 ms, cpu=0.038 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#147:1):Float[256] (len=1024 bytes, gpu=0.760 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#148:1):Float[256] (len=1024 bytes, gpu=0.635 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#149:1):Float[256] (len=1024 bytes, gpu=7.723 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#150:1):Float[256] (len=1024 bytes, gpu=7.444 ms, cpu=0.020 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#151:1):Long[] (len=8 bytes, gpu=9.495 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#152:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.544 ms, cpu=0.864 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#153:1):Float[256] (len=1024 bytes, gpu=0.701 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#154:1):Float[256] (len=1024 bytes, gpu=0.324 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#155:1):Float[256] (len=1024 bytes, gpu=0.684 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#156:1):Float[256] (len=1024 bytes, gpu=0.713 ms, cpu=0.017 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#157:1):Long[] (len=8 bytes, gpu=7.661 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#158:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=2.548 ms, cpu=0.044 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#159:1):Float[1024] (len=4.00 KB, gpu=0.540 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#160:1):Float[1024] (len=4.00 KB, gpu=0.678 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#161:1):Float[1024] (len=4.00 KB, gpu=0.573 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#162:1):Float[1024] (len=4.00 KB, gpu=0.661 ms, cpu=0.028 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#163:1):Long[] (len=8 bytes, gpu=0.691 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[1024, 512, 1, 1] --> MPS(buf#164:1):Float[1024, 512, 1, 1] (len=2.00 MB, gpu=6.689 ms, cpu=0.130 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#165:1):Float[1024] (len=4.00 KB, gpu=0.535 ms, cpu=0.030 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#166:1):Float[1024] (len=4.00 KB, gpu=5.766 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#167:1):Float[1024] (len=4.00 KB, gpu=0.481 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#168:1):Float[1024] (len=4.00 KB, gpu=0.764 ms, cpu=0.017 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#169:1):Long[] (len=8 bytes, gpu=9.743 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#170:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.207 ms, cpu=0.295 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#171:1):Float[256] (len=1024 bytes, gpu=0.537 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#172:1):Float[256] (len=1024 bytes, gpu=0.776 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#173:1):Float[256] (len=1024 bytes, gpu=0.717 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#174:1):Float[256] (len=1024 bytes, gpu=0.395 ms, cpu=0.027 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#175:1):Long[] (len=8 bytes, gpu=0.535 ms, cpu=0.066 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#176:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=7.597 ms, cpu=0.097 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#177:1):Float[256] (len=1024 bytes, gpu=3.321 ms, cpu=0.072 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#178:1):Float[256] (len=1024 bytes, gpu=0.619 ms, cpu=0.029 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#179:1):Float[256] (len=1024 bytes, gpu=0.271 ms, cpu=0.029 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#180:1):Float[256] (len=1024 bytes, gpu=0.704 ms, cpu=0.025 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#181:1):Long[] (len=8 bytes, gpu=0.642 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#182:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=7.340 ms, cpu=0.054 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#183:1):Float[1024] (len=4.00 KB, gpu=5.340 ms, cpu=0.061 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#184:1):Float[1024] (len=4.00 KB, gpu=0.406 ms, cpu=0.041 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#185:1):Float[1024] (len=4.00 KB, gpu=0.727 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#186:1):Float[1024] (len=4.00 KB, gpu=9.531 ms, cpu=0.021 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#187:1):Long[] (len=8 bytes, gpu=0.735 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#188:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.593 ms, cpu=0.068 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#189:1):Float[256] (len=1024 bytes, gpu=0.364 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#190:1):Float[256] (len=1024 bytes, gpu=0.701 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#191:1):Float[256] (len=1024 bytes, gpu=0.734 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#192:1):Float[256] (len=1024 bytes, gpu=0.750 ms, cpu=0.016 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#193:1):Long[] (len=8 bytes, gpu=7.568 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#194:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=9.503 ms, cpu=0.085 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#195:1):Float[256] (len=1024 bytes, gpu=0.300 ms, cpu=0.037 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#196:1):Float[256] (len=1024 bytes, gpu=0.749 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#197:1):Float[256] (len=1024 bytes, gpu=0.721 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#198:1):Float[256] (len=1024 bytes, gpu=0.705 ms, cpu=0.020 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#199:1):Long[] (len=8 bytes, gpu=0.731 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#200:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=0.710 ms, cpu=0.054 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#201:1):Float[1024] (len=4.00 KB, gpu=6.519 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#202:1):Float[1024] (len=4.00 KB, gpu=0.375 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#203:1):Float[1024] (len=4.00 KB, gpu=9.774 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#204:1):Float[1024] (len=4.00 KB, gpu=0.420 ms, cpu=0.052 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#205:1):Long[] (len=8 bytes, gpu=0.363 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#206:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.563 ms, cpu=0.043 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#207:1):Float[256] (len=1024 bytes, gpu=0.660 ms, cpu=0.030 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#208:1):Float[256] (len=1024 bytes, gpu=0.738 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#209:1):Float[256] (len=1024 bytes, gpu=0.684 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#210:1):Float[256] (len=1024 bytes, gpu=7.733 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#211:1):Long[] (len=8 bytes, gpu=9.378 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#212:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.389 ms, cpu=0.092 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#213:1):Float[256] (len=1024 bytes, gpu=0.430 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#214:1):Float[256] (len=1024 bytes, gpu=0.364 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#215:1):Float[256] (len=1024 bytes, gpu=0.795 ms, cpu=0.029 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#216:1):Float[256] (len=1024 bytes, gpu=0.727 ms, cpu=0.020 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#217:1):Long[] (len=8 bytes, gpu=0.546 ms, cpu=0.016 ms)
BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#218:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=7.690 ms, cpu=0.046 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#219:1):Float[1024] (len=4.00 KB, gpu=5.386 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#220:1):Float[1024] (len=4.00 KB, gpu=0.582 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#221:1):Float[1024] (len=4.00 KB, gpu=0.674 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#222:1):Float[1024] (len=4.00 KB, gpu=7.686 ms, cpu=0.024 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#223:1):Long[] (len=8 bytes, gpu=7.559 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#224:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=9.520 ms, cpu=0.047 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#225:1):Float[256] (len=1024 bytes, gpu=0.362 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#226:1):Float[256] (len=1024 bytes, gpu=0.779 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#227:1):Float[256] (len=1024 bytes, gpu=0.746 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#228:1):Float[256] (len=1024 bytes, gpu=0.723 ms, cpu=0.017 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#229:1):Long[] (len=8 bytes, gpu=0.681 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#230:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.419 ms, cpu=0.082 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#231:1):Float[256] (len=1024 bytes, gpu=6.672 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#232:1):Float[256] (len=1024 bytes, gpu=0.484 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#233:1):Float[256] (len=1024 bytes, gpu=9.711 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#234:1):Float[256] (len=1024 bytes, gpu=0.636 ms, cpu=0.062 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#235:1):Long[] (len=8 bytes, gpu=0.536 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#236:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=0.659 ms, cpu=0.058 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#237:1):Float[1024] (len=4.00 KB, gpu=0.685 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#238:1):Float[1024] (len=4.00 KB, gpu=0.611 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#239:1):Float[1024] (len=4.00 KB, gpu=0.653 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#240:1):Float[1024] (len=4.00 KB, gpu=7.686 ms, cpu=0.021 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#241:1):Long[] (len=8 bytes, gpu=2.623 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#242:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=0.174 ms, cpu=0.284 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#243:1):Float[256] (len=1024 bytes, gpu=0.414 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#244:1):Float[256] (len=1024 bytes, gpu=0.574 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#245:1):Float[256] (len=1024 bytes, gpu=0.569 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#246:1):Float[256] (len=1024 bytes, gpu=0.556 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#247:1):Long[] (len=8 bytes, gpu=6.564 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#248:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.445 ms, cpu=0.102 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#249:1):Float[256] (len=1024 bytes, gpu=7.385 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#250:1):Float[256] (len=1024 bytes, gpu=2.252 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#251:1):Float[256] (len=1024 bytes, gpu=0.703 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#252:1):Float[256] (len=1024 bytes, gpu=0.736 ms, cpu=0.020 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#253:1):Long[] (len=8 bytes, gpu=0.702 ms, cpu=0.016 ms)
BlitCopySync: CPU:Float[1024, 256, 1, 1] --> MPS(buf#254:1):Float[1024, 256, 1, 1] (len=1024.00 KB, gpu=0.726 ms, cpu=0.049 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#255:1):Float[1024] (len=4.00 KB, gpu=0.440 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#256:1):Float[1024] (len=4.00 KB, gpu=7.702 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#257:1):Float[1024] (len=4.00 KB, gpu=7.617 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#258:1):Float[1024] (len=4.00 KB, gpu=9.519 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#259:1):Long[] (len=8 bytes, gpu=0.256 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[512, 1024, 1, 1] --> MPS(buf#260:1):Float[512, 1024, 1, 1] (len=2.00 MB, gpu=0.566 ms, cpu=0.083 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#261:1):Float[512] (len=2.00 KB, gpu=0.360 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#262:1):Float[512] (len=2.00 KB, gpu=0.760 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#263:1):Float[512] (len=2.00 KB, gpu=0.744 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#264:1):Float[512] (len=2.00 KB, gpu=0.787 ms, cpu=0.015 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#265:1):Long[] (len=8 bytes, gpu=1.759 ms, cpu=0.014 ms)
BlitCopySync: CPU:Float[512, 512, 3, 3] --> MPS(buf#266:1):Float[512, 512, 3, 3] (len=9.00 MB, gpu=0.131 ms, cpu=0.245 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#267:1):Float[512] (len=2.00 KB, gpu=0.685 ms, cpu=0.042 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#268:1):Float[512] (len=2.00 KB, gpu=0.706 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#269:1):Float[512] (len=2.00 KB, gpu=0.760 ms, cpu=0.037 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#270:1):Float[512] (len=2.00 KB, gpu=0.687 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#271:1):Long[] (len=8 bytes, gpu=0.722 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[2048, 512, 1, 1] --> MPS(buf#272:1):Float[2048, 512, 1, 1] (len=4.00 MB, gpu=2.417 ms, cpu=0.131 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#273:1):Float[2048] (len=8.00 KB, gpu=0.383 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#274:1):Float[2048] (len=8.00 KB, gpu=0.715 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#275:1):Float[2048] (len=8.00 KB, gpu=0.698 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#276:1):Float[2048] (len=8.00 KB, gpu=0.684 ms, cpu=0.024 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#277:1):Long[] (len=8 bytes, gpu=0.746 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[2048, 1024, 1, 1] --> MPS(buf#278:1):Float[2048, 1024, 1, 1] (len=8.00 MB, gpu=8.735 ms, cpu=1.013 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#279:1):Float[2048] (len=8.00 KB, gpu=0.440 ms, cpu=0.036 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#280:1):Float[2048] (len=8.00 KB, gpu=0.414 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#281:1):Float[2048] (len=8.00 KB, gpu=0.685 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#282:1):Float[2048] (len=8.00 KB, gpu=0.744 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#283:1):Long[] (len=8 bytes, gpu=0.709 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[512, 2048, 1, 1] --> MPS(buf#284:1):Float[512, 2048, 1, 1] (len=4.00 MB, gpu=0.679 ms, cpu=0.114 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#285:1):Float[512] (len=2.00 KB, gpu=7.509 ms, cpu=0.026 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#286:1):Float[512] (len=2.00 KB, gpu=7.514 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#287:1):Float[512] (len=2.00 KB, gpu=7.542 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#288:1):Float[512] (len=2.00 KB, gpu=9.232 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#289:1):Long[] (len=8 bytes, gpu=0.522 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[512, 512, 3, 3] --> MPS(buf#290:1):Float[512, 512, 3, 3] (len=9.00 MB, gpu=0.372 ms, cpu=0.233 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#291:1):Float[512] (len=2.00 KB, gpu=0.693 ms, cpu=0.039 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#292:1):Float[512] (len=2.00 KB, gpu=0.479 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#293:1):Float[512] (len=2.00 KB, gpu=0.793 ms, cpu=0.016 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#294:1):Float[512] (len=2.00 KB, gpu=0.700 ms, cpu=0.017 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#295:1):Long[] (len=8 bytes, gpu=7.773 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[2048, 512, 1, 1] --> MPS(buf#296:1):Float[2048, 512, 1, 1] (len=4.00 MB, gpu=2.450 ms, cpu=0.125 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#297:1):Float[2048] (len=8.00 KB, gpu=0.186 ms, cpu=0.029 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#298:1):Float[2048] (len=8.00 KB, gpu=0.576 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#299:1):Float[2048] (len=8.00 KB, gpu=0.792 ms, cpu=0.016 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#300:1):Float[2048] (len=8.00 KB, gpu=0.740 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#301:1):Long[] (len=8 bytes, gpu=0.696 ms, cpu=0.016 ms)
BlitCopySync: CPU:Float[512, 2048, 1, 1] --> MPS(buf#302:1):Float[512, 2048, 1, 1] (len=4.00 MB, gpu=6.677 ms, cpu=0.112 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#303:1):Float[512] (len=2.00 KB, gpu=0.359 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#304:1):Float[512] (len=2.00 KB, gpu=1.798 ms, cpu=0.015 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#305:1):Float[512] (len=2.00 KB, gpu=0.400 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#306:1):Float[512] (len=2.00 KB, gpu=0.694 ms, cpu=0.016 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#307:1):Long[] (len=8 bytes, gpu=0.735 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[512, 512, 3, 3] --> MPS(buf#308:1):Float[512, 512, 3, 3] (len=9.00 MB, gpu=0.604 ms, cpu=1.043 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#309:1):Float[512] (len=2.00 KB, gpu=0.678 ms, cpu=0.059 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#310:1):Float[512] (len=2.00 KB, gpu=9.797 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#311:1):Float[512] (len=2.00 KB, gpu=0.537 ms, cpu=0.017 ms)
BlitCopySync: CPU:Float[512] --> MPS(buf#312:1):Float[512] (len=2.00 KB, gpu=0.791 ms, cpu=0.017 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#313:1):Long[] (len=8 bytes, gpu=0.695 ms, cpu=0.016 ms)
BlitCopySync: CPU:Float[2048, 512, 1, 1] --> MPS(buf#314:1):Float[2048, 512, 1, 1] (len=4.00 MB, gpu=0.497 ms, cpu=0.102 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#315:1):Float[2048] (len=8.00 KB, gpu=0.424 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#316:1):Float[2048] (len=8.00 KB, gpu=0.806 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#317:1):Float[2048] (len=8.00 KB, gpu=7.788 ms, cpu=0.016 ms)
BlitCopySync: CPU:Float[2048] --> MPS(buf#318:1):Float[2048] (len=8.00 KB, gpu=9.570 ms, cpu=0.022 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#319:1):Long[] (len=8 bytes, gpu=0.403 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[256, 256, 1, 1] --> MPS(buf#320:1):Float[256, 256, 1, 1] (len=256.00 KB, gpu=0.676 ms, cpu=0.038 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#321:1):Float[256] (len=1024 bytes, gpu=0.463 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#322:1):Float[256] (len=1024 bytes, gpu=0.571 ms, cpu=0.026 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#323:1):Float[256] (len=1024 bytes, gpu=0.568 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#324:1):Float[256] (len=1024 bytes, gpu=0.697 ms, cpu=0.021 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#325:1):Long[] (len=8 bytes, gpu=6.462 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[256, 512, 1, 1] --> MPS(buf#326:1):Float[256, 512, 1, 1] (len=512.00 KB, gpu=0.495 ms, cpu=0.038 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#327:1):Float[256] (len=1024 bytes, gpu=4.092 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#328:1):Float[256] (len=1024 bytes, gpu=0.593 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#329:1):Float[256] (len=1024 bytes, gpu=0.636 ms, cpu=0.033 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#330:1):Float[256] (len=1024 bytes, gpu=0.651 ms, cpu=0.024 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#331:1):Long[] (len=8 bytes, gpu=7.767 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256, 1024, 1, 1] --> MPS(buf#332:1):Float[256, 1024, 1, 1] (len=1024.00 KB, gpu=7.486 ms, cpu=0.065 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#333:1):Float[256] (len=1024 bytes, gpu=2.360 ms, cpu=0.044 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#334:1):Float[256] (len=1024 bytes, gpu=0.492 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#335:1):Float[256] (len=1024 bytes, gpu=0.777 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#336:1):Float[256] (len=1024 bytes, gpu=0.742 ms, cpu=0.021 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#337:1):Long[] (len=8 bytes, gpu=0.749 ms, cpu=0.021 ms)
BlitCopySync: CPU:Float[256, 2048, 1, 1] --> MPS(buf#338:1):Float[256, 2048, 1, 1] (len=2.00 MB, gpu=0.429 ms, cpu=0.098 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#339:1):Float[256] (len=1024 bytes, gpu=7.706 ms, cpu=0.034 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#340:1):Float[256] (len=1024 bytes, gpu=7.543 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#341:1):Float[256] (len=1024 bytes, gpu=9.501 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#342:1):Float[256] (len=1024 bytes, gpu=0.381 ms, cpu=0.021 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#343:1):Long[] (len=8 bytes, gpu=0.682 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#344:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.638 ms, cpu=0.110 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#345:1):Float[256] (len=1024 bytes, gpu=0.588 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#346:1):Float[256] (len=1024 bytes, gpu=0.774 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#347:1):Float[256] (len=1024 bytes, gpu=0.741 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#348:1):Float[256] (len=1024 bytes, gpu=1.766 ms, cpu=0.029 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#349:1):Long[] (len=8 bytes, gpu=0.545 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#350:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.685 ms, cpu=0.111 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#351:1):Float[256] (len=1024 bytes, gpu=0.621 ms, cpu=0.036 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#352:1):Float[256] (len=1024 bytes, gpu=0.666 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#353:1):Float[256] (len=1024 bytes, gpu=0.839 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#354:1):Float[256] (len=1024 bytes, gpu=0.469 ms, cpu=0.022 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#355:1):Long[] (len=8 bytes, gpu=2.760 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#356:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.387 ms, cpu=0.096 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#357:1):Float[256] (len=1024 bytes, gpu=0.433 ms, cpu=0.083 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#358:1):Float[256] (len=1024 bytes, gpu=0.712 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#359:1):Float[256] (len=1024 bytes, gpu=0.662 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#360:1):Float[256] (len=1024 bytes, gpu=0.774 ms, cpu=0.018 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#361:1):Long[] (len=8 bytes, gpu=1.598 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#362:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.399 ms, cpu=0.131 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#363:1):Float[256] (len=1024 bytes, gpu=0.397 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#364:1):Float[256] (len=1024 bytes, gpu=0.757 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#365:1):Float[256] (len=1024 bytes, gpu=0.730 ms, cpu=0.036 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#366:1):Float[256] (len=1024 bytes, gpu=0.702 ms, cpu=0.027 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#367:1):Long[] (len=8 bytes, gpu=0.715 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#368:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=1.639 ms, cpu=0.123 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#369:1):Float[256] (len=1024 bytes, gpu=0.225 ms, cpu=0.023 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#370:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.664 ms, cpu=0.110 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#371:1):Float[256] (len=1024 bytes, gpu=0.629 ms, cpu=0.037 ms)
BlitCopySync: CPU:Float[3, 256, 1, 1] --> MPS(buf#372:1):Float[3, 256, 1, 1] (len=3.00 KB, gpu=0.738 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[3] --> MPS(buf#373:1):Float[3] (len=12 bytes, gpu=0.644 ms, cpu=0.018 ms)
BlitCopySync: CPU:Float[12, 256, 1, 1] --> MPS(buf#374:1):Float[12, 256, 1, 1] (len=12.00 KB, gpu=0.657 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[12] --> MPS(buf#375:1):Float[12] (len=48 bytes, gpu=7.596 ms, cpu=0.027 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#376:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=7.316 ms, cpu=0.125 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#377:1):Float[256] (len=1024 bytes, gpu=5.053 ms, cpu=0.035 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#378:1):Float[256] (len=1024 bytes, gpu=0.536 ms, cpu=0.030 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#379:1):Float[256] (len=1024 bytes, gpu=0.690 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#380:1):Float[256] (len=1024 bytes, gpu=9.658 ms, cpu=0.021 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#381:1):Long[] (len=8 bytes, gpu=0.481 ms, cpu=0.024 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#382:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=0.463 ms, cpu=0.121 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#383:1):Float[256] (len=1024 bytes, gpu=0.466 ms, cpu=0.022 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#384:1):Float[256] (len=1024 bytes, gpu=0.750 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#385:1):Float[256] (len=1024 bytes, gpu=0.744 ms, cpu=0.019 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#386:1):Float[256] (len=1024 bytes, gpu=0.776 ms, cpu=0.024 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#387:1):Long[] (len=8 bytes, gpu=7.653 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#388:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=8.365 ms, cpu=0.967 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#389:1):Float[256] (len=1024 bytes, gpu=0.820 ms, cpu=0.045 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#390:1):Float[256] (len=1024 bytes, gpu=0.457 ms, cpu=0.031 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#391:1):Float[256] (len=1024 bytes, gpu=0.750 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#392:1):Float[256] (len=1024 bytes, gpu=0.725 ms, cpu=0.025 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#393:1):Long[] (len=8 bytes, gpu=0.639 ms, cpu=0.026 ms)
BlitCopySync: CPU:Float[256, 256, 3, 3] --> MPS(buf#394:1):Float[256, 256, 3, 3] (len=2.25 MB, gpu=6.672 ms, cpu=0.120 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#395:1):Float[256] (len=1024 bytes, gpu=0.069 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#396:1):Float[256] (len=1024 bytes, gpu=5.768 ms, cpu=0.025 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#397:1):Float[256] (len=1024 bytes, gpu=0.433 ms, cpu=0.143 ms)
BlitCopySync: CPU:Float[256] --> MPS(buf#398:1):Float[256] (len=1024 bytes, gpu=0.655 ms, cpu=0.019 ms)
BlitCopySync: CPU:Long[] --> MPS(buf#399:1):Long[] (len=8 bytes, gpu=7.619 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[1024, 12544] --> MPS(buf#400:1):Float[1024, 12544] (len=49.00 MB, gpu=1.695 ms, cpu=41.862 ms)
BlitCopySync: CPU:Float[1024] --> MPS(buf#401:1):Float[1024] (len=4.00 KB, gpu=0.063 ms, cpu=0.175 ms)
BlitCopySync: CPU:Float[91, 1024] --> MPS(buf#402:1):Float[91, 1024] (len=364.00 KB, gpu=8.687 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[91] --> MPS(buf#403:1):Float[91] (len=364 bytes, gpu=5.159 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[364, 1024] --> MPS(buf#404:1):Float[364, 1024] (len=1.42 MB, gpu=0.518 ms, cpu=0.083 ms)
BlitCopySync: CPU:Float[364] --> MPS(buf#405:1):Float[364] (len=1.42 KB, gpu=0.572 ms, cpu=0.026 ms)
BlitCopySync: CPU:Float[3] --> MPS(buf#406:1):Float[3] (len=12 bytes, gpu=8.286 ms, cpu=0.020 ms)
BlitCopySync: CPU:Float[3] --> MPS(buf#407:1):Float[3] (len=12 bytes, gpu=0.410 ms, cpu=0.034 ms)
aten::sub_out_mps::f32[3,224,224]:f32[3,1,1]:f32[3,224,224] (id=G1, run=1, gpu=6.218 ms, cpu=0.960 ms)
aten::div_out_mps::f32[3,224,224]:f32[3,1,1]:f32[3,224,224] (id=G2, run=1, gpu=6.218 ms, cpu=0.960 ms)
aten::upsample_bilinear:f32[1,3,224,224]:[1.000000,0.000000]:[Undefined] (id=G3, run=1, gpu=12.159 ms, cpu=0.074 ms)
BlitCopy: MPS(buf#410:2):Float[3, 800, 800] --> MPS(buf#411:2):Float[3, 800, 800] (len=7.32 MB, gpu=12.159 ms, cpu=0.074 ms)
aten::mps_convolution:2:2:1:1:3:3:1:Contiguous:f32[1,3,800,800]:f32[64,3,7,7]:0:nobias (id=G4, run=1, gpu=12.159 ms, cpu=0.074 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,400,400::f32[1,64,400,400]:f32[64]:f32[64]:f32[64]:f32[64] (id=G5, run=1, gpu=24.210 ms, cpu=0.056 ms)
aten::relu_:f32[1,64,400,400] (id=G6, run=1, gpu=24.210 ms, cpu=0.056 ms)
aten::max_pool2d:f32[1,64,400,400]:Undefined:Undefined:K[3,3,]:S[2,2,]:P[1,1,]:D[1,1,]:NCHW (id=G7, run=1, gpu=24.210 ms, cpu=0.056 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[64,64,1,1]:0:nobias (id=G8, run=1, gpu=24.210 ms, cpu=0.056 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=29.987 ms, cpu=0.021 ms)
aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=29.987 ms, cpu=0.021 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,64,200,200]:f32[64,64,3,3]:0:nobias (id=G11, run=3, gpu=29.987 ms, cpu=0.021 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=29.987 ms, cpu=0.021 ms)
aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=29.987 ms, cpu=0.021 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=16.006 ms, cpu=0.057 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=16.006 ms, cpu=0.057 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=16.006 ms, cpu=0.057 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=15.083 ms, cpu=0.054 ms)
aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=3, gpu=15.083 ms, cpu=0.054 ms)
aten::relu_:f32[1,256,200,200] (id=G15, run=3, gpu=15.083 ms, cpu=0.054 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[64,256,1,1]:0:nobias (id=G16, run=2, gpu=15.083 ms, cpu=0.054 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=15.083 ms, cpu=0.054 ms)
aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=3.389 ms, cpu=0.032 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,64,200,200]:f32[64,64,3,3]:0:nobias (id=G11, run=3, gpu=3.389 ms, cpu=0.032 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=3.389 ms, cpu=0.032 ms)
aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=3.389 ms, cpu=0.032 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=3.389 ms, cpu=0.032 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=22.587 ms, cpu=0.031 ms)
aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=3, gpu=22.587 ms, cpu=0.031 ms)
aten::relu_:f32[1,256,200,200] (id=G15, run=3, gpu=14.885 ms, cpu=0.025 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[64,256,1,1]:0:nobias (id=G16, run=2, gpu=14.885 ms, cpu=0.025 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=14.885 ms, cpu=0.025 ms)
aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=14.885 ms, cpu=0.025 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,64,200,200]:f32[64,64,3,3]:0:nobias (id=G11, run=3, gpu=14.885 ms, cpu=0.025 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,64,200,200::f32[1,64,200,200]:f32[64]:f32[64]:f32[64]:f32[64] (id=G9, run=6, gpu=15.096 ms, cpu=0.032 ms)
aten::relu_:f32[1,64,200,200] (id=G10, run=6, gpu=15.096 ms, cpu=0.032 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,64,200,200]:f32[256,64,1,1]:0:nobias (id=G12, run=4, gpu=15.096 ms, cpu=0.032 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=4, gpu=15.096 ms, cpu=0.032 ms)
aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=3, gpu=17.229 ms, cpu=0.111 ms)
aten::relu_:f32[1,256,200,200] (id=G15, run=3, gpu=17.229 ms, cpu=0.111 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[128,256,1,1]:0:nobias (id=G17, run=1, gpu=17.229 ms, cpu=0.111 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,200,200::f32[1,128,200,200]:f32[128]:f32[128]:f32[128]:f32[128] (id=G18, run=1, gpu=17.229 ms, cpu=0.111 ms)
aten::relu_:f32[1,128,200,200] (id=G19, run=1, gpu=17.229 ms, cpu=0.111 ms)
aten::mps_convolution:2:2:1:1:1:1:1:Contiguous:f32[1,128,200,200]:f32[128,128,3,3]:0:nobias (id=G20, run=1, gpu=1.688 ms, cpu=0.019 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=1.688 ms, cpu=0.019 ms)
aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=1.688 ms, cpu=0.019 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=1.688 ms, cpu=0.019 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=29.196 ms, cpu=0.103 ms)
aten::mps_convolution:2:2:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[512,256,1,1]:0:nobias (id=G25, run=1, gpu=29.196 ms, cpu=0.103 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=29.196 ms, cpu=0.103 ms)
aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=29.196 ms, cpu=0.103 ms)
aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=29.196 ms, cpu=0.103 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[128,512,1,1]:0:nobias (id=G28, run=3, gpu=29.196 ms, cpu=0.103 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=29.196 ms, cpu=0.103 ms)
aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=29.196 ms, cpu=0.103 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,128,100,100]:f32[128,128,3,3]:0:nobias (id=G29, run=3, gpu=1.820 ms, cpu=0.035 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=1.820 ms, cpu=0.035 ms)
aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=1.820 ms, cpu=0.035 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=1.820 ms, cpu=0.035 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=23.996 ms, cpu=0.038 ms)
aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=23.996 ms, cpu=0.038 ms)
aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=23.996 ms, cpu=0.038 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[128,512,1,1]:0:nobias (id=G28, run=3, gpu=23.996 ms, cpu=0.038 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=23.996 ms, cpu=0.038 ms)
aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=23.996 ms, cpu=0.038 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,128,100,100]:f32[128,128,3,3]:0:nobias (id=G29, run=3, gpu=23.996 ms, cpu=0.038 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=23.996 ms, cpu=0.038 ms)
aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=9.144 ms, cpu=0.025 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=9.144 ms, cpu=0.025 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=9.144 ms, cpu=0.025 ms)
aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=9.144 ms, cpu=0.025 ms)
aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=9.144 ms, cpu=0.025 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[128,512,1,1]:0:nobias (id=G28, run=3, gpu=6.830 ms, cpu=0.028 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=6.830 ms, cpu=0.028 ms)
aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=6.830 ms, cpu=0.028 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,128,100,100]:f32[128,128,3,3]:0:nobias (id=G29, run=3, gpu=6.830 ms, cpu=0.028 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,128,100,100::f32[1,128,100,100]:f32[128]:f32[128]:f32[128]:f32[128] (id=G21, run=7, gpu=16.042 ms, cpu=0.065 ms)
aten::relu_:f32[1,128,100,100] (id=G22, run=7, gpu=16.042 ms, cpu=0.065 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,128,100,100]:f32[512,128,1,1]:0:nobias (id=G23, run=4, gpu=16.042 ms, cpu=0.065 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,100,100::f32[1,512,100,100]:f32[512]:f32[512]:f32[512]:f32[512] (id=G24, run=5, gpu=16.042 ms, cpu=0.065 ms)
aten::add_out_mps::f32[1,512,100,100]:f32[1,512,100,100]:f32[1,512,100,100] (id=G26, run=4, gpu=16.042 ms, cpu=0.065 ms)
aten::relu_:f32[1,512,100,100] (id=G27, run=4, gpu=16.042 ms, cpu=0.065 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[256,512,1,1]:0:nobias (id=G30, run=1, gpu=16.042 ms, cpu=0.065 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,100,100::f32[1,256,100,100]:f32[256]:f32[256]:f32[256]:f32[256] (id=G31, run=1, gpu=16.042 ms, cpu=0.065 ms)
aten::relu_:f32[1,256,100,100] (id=G32, run=1, gpu=6.952 ms, cpu=0.221 ms)
aten::mps_convolution:2:2:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:0:nobias (id=G33, run=1, gpu=6.952 ms, cpu=0.221 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=6.952 ms, cpu=0.221 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=6.952 ms, cpu=0.221 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=6.952 ms, cpu=0.221 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=9.026 ms, cpu=0.035 ms)
aten::mps_convolution:2:2:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[1024,512,1,1]:0:nobias (id=G38, run=1, gpu=9.026 ms, cpu=0.035 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=9.026 ms, cpu=0.035 ms)
aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=9.026 ms, cpu=0.035 ms)
aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=9.026 ms, cpu=0.035 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=9.026 ms, cpu=0.035 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=9.026 ms, cpu=0.035 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=7.971 ms, cpu=0.042 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=7.971 ms, cpu=0.042 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=7.971 ms, cpu=0.042 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=7.971 ms, cpu=0.042 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=7.971 ms, cpu=0.042 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=9.026 ms, cpu=0.037 ms)
aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=9.026 ms, cpu=0.037 ms)
aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=9.026 ms, cpu=0.037 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=9.026 ms, cpu=0.037 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=9.026 ms, cpu=0.037 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=9.026 ms, cpu=0.037 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=9.026 ms, cpu=0.037 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=8.120 ms, cpu=0.028 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=8.120 ms, cpu=0.028 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=8.120 ms, cpu=0.028 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=8.120 ms, cpu=0.028 ms)
aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=8.120 ms, cpu=0.028 ms)
aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=6.851 ms, cpu=0.028 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=6.851 ms, cpu=0.028 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=6.851 ms, cpu=0.028 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=6.851 ms, cpu=0.028 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=6.851 ms, cpu=0.028 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=8.015 ms, cpu=0.033 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=8.015 ms, cpu=0.033 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=8.015 ms, cpu=0.033 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=8.015 ms, cpu=0.033 ms)
aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=8.015 ms, cpu=0.033 ms)
aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=8.015 ms, cpu=0.033 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=8.015 ms, cpu=0.033 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=4.187 ms, cpu=0.028 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=4.187 ms, cpu=0.028 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=4.187 ms, cpu=0.028 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=4.187 ms, cpu=0.028 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=4.187 ms, cpu=0.028 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=3.803 ms, cpu=0.038 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=3.803 ms, cpu=0.038 ms)
aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=3.803 ms, cpu=0.038 ms)
aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=3.803 ms, cpu=0.038 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=3.803 ms, cpu=0.038 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=5.019 ms, cpu=0.024 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=5.019 ms, cpu=0.024 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=5, gpu=5.019 ms, cpu=0.024 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=5.019 ms, cpu=0.024 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=11, gpu=5.019 ms, cpu=0.024 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[1024,256,1,1]:0:nobias (id=G36, run=6, gpu=5.019 ms, cpu=0.024 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,1024,50,50::f32[1,1024,50,50]:f32[1024]:f32[1024]:f32[1024]:f32[1024] (id=G37, run=7, gpu=5.019 ms, cpu=0.024 ms)
aten::add_out_mps::f32[1,1024,50,50]:f32[1,1024,50,50]:f32[1,1024,50,50] (id=G39, run=6, gpu=3.451 ms, cpu=0.431 ms)
aten::relu_:f32[1,1024,50,50] (id=G40, run=6, gpu=3.451 ms, cpu=0.431 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[512,1024,1,1]:0:nobias (id=G43, run=1, gpu=3.451 ms, cpu=0.431 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,50,50::f32[1,512,50,50]:f32[512]:f32[512]:f32[512]:f32[512] (id=G44, run=1, gpu=3.451 ms, cpu=0.431 ms)
aten::relu_:f32[1,512,50,50] (id=G45, run=1, gpu=3.451 ms, cpu=0.431 ms)
aten::mps_convolution:2:2:1:1:1:1:1:Contiguous:f32[1,512,50,50]:f32[512,512,3,3]:0:nobias (id=G46, run=1, gpu=6.531 ms, cpu=0.052 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=6.531 ms, cpu=0.052 ms)
aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=6.531 ms, cpu=0.052 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,25,25]:f32[2048,512,1,1]:0:nobias (id=G49, run=3, gpu=6.531 ms, cpu=0.052 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=2.361 ms, cpu=0.048 ms)
aten::mps_convolution:2:2:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[2048,1024,1,1]:0:nobias (id=G51, run=1, gpu=2.361 ms, cpu=0.048 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=2.361 ms, cpu=0.048 ms)
aten::add_out_mps::f32[1,2048,25,25]:f32[1,2048,25,25]:f32[1,2048,25,25] (id=G52, run=3, gpu=2.361 ms, cpu=0.048 ms)
aten::relu_:f32[1,2048,25,25] (id=G53, run=3, gpu=2.361 ms, cpu=0.048 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,2048,25,25]:f32[512,2048,1,1]:0:nobias (id=G54, run=2, gpu=2.361 ms, cpu=0.048 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=2.361 ms, cpu=0.048 ms)
aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=2.361 ms, cpu=0.048 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,512,25,25]:f32[512,512,3,3]:0:nobias (id=G55, run=2, gpu=5.060 ms, cpu=0.051 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=5.060 ms, cpu=0.051 ms)
aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=5.060 ms, cpu=0.051 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,25,25]:f32[2048,512,1,1]:0:nobias (id=G49, run=3, gpu=5.060 ms, cpu=0.051 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=2.594 ms, cpu=0.036 ms)
aten::add_out_mps::f32[1,2048,25,25]:f32[1,2048,25,25]:f32[1,2048,25,25] (id=G52, run=3, gpu=2.594 ms, cpu=0.036 ms)
aten::relu_:f32[1,2048,25,25] (id=G53, run=3, gpu=2.594 ms, cpu=0.036 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,2048,25,25]:f32[512,2048,1,1]:0:nobias (id=G54, run=2, gpu=2.594 ms, cpu=0.036 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=2.594 ms, cpu=0.036 ms)
aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=2.594 ms, cpu=0.036 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,512,25,25]:f32[512,512,3,3]:0:nobias (id=G55, run=2, gpu=2.594 ms, cpu=0.036 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,512,25,25::f32[1,512,25,25]:f32[512]:f32[512]:f32[512]:f32[512] (id=G47, run=5, gpu=2.594 ms, cpu=0.036 ms)
aten::relu_:f32[1,512,25,25] (id=G48, run=5, gpu=8.103 ms, cpu=0.061 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,25,25]:f32[2048,512,1,1]:0:nobias (id=G49, run=3, gpu=8.103 ms, cpu=0.061 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,2048,25,25::f32[1,2048,25,25]:f32[2048]:f32[2048]:f32[2048]:f32[2048] (id=G50, run=4, gpu=8.103 ms, cpu=0.061 ms)
aten::add_out_mps::f32[1,2048,25,25]:f32[1,2048,25,25]:f32[1,2048,25,25] (id=G52, run=3, gpu=8.103 ms, cpu=0.061 ms)
aten::relu_:f32[1,2048,25,25] (id=G53, run=3, gpu=8.103 ms, cpu=0.061 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,2048,25,25]:f32[256,2048,1,1]:0:nobias (id=G56, run=1, gpu=5.217 ms, cpu=0.082 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,25,25::f32[1,256,25,25]:f32[256]:f32[256]:f32[256]:f32[256] (id=G57, run=2, gpu=5.217 ms, cpu=0.082 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,25,25]:f32[256,256,3,3]:0:nobias (id=G58, run=1, gpu=5.217 ms, cpu=0.082 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,25,25::f32[1,256,25,25]:f32[256]:f32[256]:f32[256]:f32[256] (id=G57, run=2, gpu=1.692 ms, cpu=0.019 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,1024,50,50]:f32[256,1024,1,1]:0:nobias (id=G41, run=6, gpu=1.692 ms, cpu=0.019 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=12, gpu=1.692 ms, cpu=0.019 ms)
aten::upsample_nearest:f32[1,256,25,25]:[1.000000,0.000000]:[Undefined] (id=G59, run=1, gpu=5.205 ms, cpu=0.060 ms)
aten::add_out_mps::f32[1,256,50,50]:f32[1,256,50,50]:f32[1,256,50,50] (id=G60, run=1, gpu=2.894 ms, cpu=0.022 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:0:nobias (id=G42, run=6, gpu=2.894 ms, cpu=0.022 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,50,50::f32[1,256,50,50]:f32[256]:f32[256]:f32[256]:f32[256] (id=G34, run=13, gpu=2.894 ms, cpu=0.022 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,512,100,100]:f32[256,512,1,1]:0:nobias (id=G30, run=2, gpu=2.894 ms, cpu=0.022 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,100,100::f32[1,256,100,100]:f32[256]:f32[256]:f32[256]:f32[256] (id=G31, run=2, gpu=0.072 ms, cpu=0.028 ms)
aten::upsample_nearest:f32[1,256,50,50]:[1.000000,0.000000]:[Undefined] (id=G61, run=1, gpu=22.749 ms, cpu=0.025 ms)
aten::add_out_mps::f32[1,256,100,100]:f32[1,256,100,100]:f32[1,256,100,100] (id=G62, run=1, gpu=22.749 ms, cpu=0.025 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:0:nobias (id=G63, run=1, gpu=22.749 ms, cpu=0.025 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,100,100::f32[1,256,100,100]:f32[256]:f32[256]:f32[256]:f32[256] (id=G31, run=3, gpu=22.749 ms, cpu=0.025 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[256,256,1,1]:0:nobias (id=G64, run=1, gpu=22.749 ms, cpu=0.025 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=5, gpu=0.029 ms, cpu=0.019 ms)
aten::upsample_nearest:f32[1,256,100,100]:[1.000000,0.000000]:[Undefined] (id=G65, run=1, gpu=33.052 ms, cpu=0.027 ms)
aten::add_out_mps::f32[1,256,200,200]:f32[1,256,200,200]:f32[1,256,200,200] (id=G14, run=4, gpu=33.052 ms, cpu=0.027 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,200,200]:f32[256,256,3,3]:0:nobias (id=G66, run=1, gpu=36.281 ms, cpu=0.253 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1,256,200,200::f32[1,256,200,200]:f32[256]:f32[256]:f32[256]:f32[256] (id=G13, run=6, gpu=36.281 ms, cpu=0.253 ms)
aten::max_pool2d:f32[1,256,25,25]:Undefined:Undefined:K[1,1,]:S[2,2,]:P[0,0,]:D[1,1,]:NCHW (id=G67, run=1, gpu=36.281 ms, cpu=0.253 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,200,200]:f32[256,256,3,3]:1:256 (id=G68, run=2, gpu=36.281 ms, cpu=0.253 ms)
aten::relu_:f32[1,256,200,200] (id=G15, run=5, gpu=41.522 ms, cpu=0.021 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,200,200]:f32[256,256,3,3]:1:256 (id=G68, run=2, gpu=41.522 ms, cpu=0.021 ms)
aten::relu_:f32[1,256,200,200] (id=G15, run=5, gpu=41.522 ms, cpu=0.021 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[3,256,1,1]:1:3 (id=G69, run=1, gpu=41.522 ms, cpu=0.021 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,200,200]:f32[12,256,1,1]:1:12 (id=G70, run=1, gpu=41.522 ms, cpu=0.021 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:1:256 (id=G71, run=2, gpu=41.522 ms, cpu=0.021 ms)
aten::relu_:f32[1,256,100,100] (id=G32, run=3, gpu=41.522 ms, cpu=0.021 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,100,100]:f32[256,256,3,3]:1:256 (id=G71, run=2, gpu=8.524 ms, cpu=0.031 ms)
aten::relu_:f32[1,256,100,100] (id=G32, run=3, gpu=8.524 ms, cpu=0.031 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,100,100]:f32[3,256,1,1]:1:3 (id=G72, run=1, gpu=8.524 ms, cpu=0.031 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,100,100]:f32[12,256,1,1]:1:12 (id=G73, run=1, gpu=8.524 ms, cpu=0.031 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:1:256 (id=G74, run=2, gpu=8.524 ms, cpu=0.031 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=13, gpu=8.524 ms, cpu=0.031 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,50,50]:f32[256,256,3,3]:1:256 (id=G74, run=2, gpu=8.524 ms, cpu=0.031 ms)
aten::relu_:f32[1,256,50,50] (id=G35, run=13, gpu=5.209 ms, cpu=0.051 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[3,256,1,1]:1:3 (id=G75, run=1, gpu=5.209 ms, cpu=0.051 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,50,50]:f32[12,256,1,1]:1:12 (id=G76, run=1, gpu=5.209 ms, cpu=0.051 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,25,25]:f32[256,256,3,3]:1:256 (id=G77, run=2, gpu=5.209 ms, cpu=0.051 ms)
aten::relu_:f32[1,256,25,25] (id=G78, run=2, gpu=5.209 ms, cpu=0.051 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,25,25]:f32[256,256,3,3]:1:256 (id=G77, run=2, gpu=5.209 ms, cpu=0.051 ms)
aten::relu_:f32[1,256,25,25] (id=G78, run=2, gpu=5.209 ms, cpu=0.051 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,25,25]:f32[3,256,1,1]:1:3 (id=G79, run=1, gpu=2.647 ms, cpu=0.022 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,25,25]:f32[12,256,1,1]:1:12 (id=G80, run=1, gpu=2.647 ms, cpu=0.022 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,13,13]:f32[256,256,3,3]:1:256 (id=G81, run=2, gpu=2.647 ms, cpu=0.022 ms)
aten::relu_:f32[1,256,13,13] (id=G82, run=2, gpu=2.647 ms, cpu=0.022 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1,256,13,13]:f32[256,256,3,3]:1:256 (id=G81, run=2, gpu=2.647 ms, cpu=0.022 ms)
aten::relu_:f32[1,256,13,13] (id=G82, run=2, gpu=2.647 ms, cpu=0.022 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,13,13]:f32[3,256,1,1]:1:3 (id=G83, run=1, gpu=2.647 ms, cpu=0.022 ms)
aten::mps_convolution:1:1:1:1:0:0:1:Contiguous:f32[1,256,13,13]:f32[12,256,1,1]:1:12 (id=G84, run=1, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:4.000000 (id=G85, run=2, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:4.000000 (id=G85, run=2, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:8.000000 (id=G86, run=2, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:8.000000 (id=G86, run=2, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:16.000000 (id=G87, run=2, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:16.000000 (id=G87, run=2, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:32.000000 (id=G88, run=2, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:32.000000 (id=G88, run=2, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:61.000000 (id=G89, run=2, gpu=63.031 ms, cpu=0.520 ms)
aten::fill_scalar_mps_impl:i64[Scalar]:61.000000 (id=G89, run=2, gpu=63.031 ms, cpu=0.520 ms)
BlitCopySync: CPU:Float[3, 4] --> MPS(buf#407:1):Float[3, 4] (len=48 bytes, gpu=63.031 ms, cpu=0.520 ms)
BlitCopySync: CPU:Float[3, 4] --> MPS(buf#406:1):Float[3, 4] (len=48 bytes, gpu=0.439 ms, cpu=0.038 ms)
BlitCopySync: CPU:Float[3, 4] --> MPS(buf#648:1):Float[3, 4] (len=48 bytes, gpu=1.716 ms, cpu=0.028 ms)
BlitCopySync: CPU:Float[3, 4] --> MPS(buf#649:1):Float[3, 4] (len=48 bytes, gpu=0.464 ms, cpu=0.030 ms)
BlitCopySync: CPU:Float[3, 4] --> MPS(buf#650:1):Float[3, 4] (len=48 bytes, gpu=0.686 ms, cpu=0.025 ms)
aten::arange_mps_out:i32[200]:200 (id=G90, run=2, gpu=1.004 ms, cpu=0.075 ms)
aten::mul:i32[200]:i64[Scalar]:i32[200] (id=G91, run=2, gpu=1.004 ms, cpu=0.075 ms)
aten::arange_mps_out:i32[200]:200 (id=G90, run=2, gpu=1.004 ms, cpu=0.075 ms)
aten::mul:i32[200]:i64[Scalar]:i32[200] (id=G91, run=2, gpu=1.004 ms, cpu=0.075 ms)
aten::gather_kernel_2:MPS(buf#653:2):Int[200, 200]:MPS(buf#655:1):Int[200, 200] (id=K1, run=2, gpu=1.004 ms, cpu=0.075 ms)
aten::gather_kernel_2:MPS(buf#653:2):Int[200, 200]:MPS(buf#655:1):Int[200, 200] (id=K1, run=2, gpu=1.004 ms, cpu=0.075 ms)
aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=1, gpu=1.004 ms, cpu=0.075 ms)
aten::add_out_mps::i32[40000,1,4]:f32[1,3,4]:f32[40000,3,4] (id=G93, run=1, gpu=1.004 ms, cpu=0.075 ms)
aten::arange_mps_out:i32[100]:100 (id=G94, run=2, gpu=1.668 ms, cpu=0.032 ms)
aten::mul:i32[100]:i64[Scalar]:i32[100] (id=G95, run=2, gpu=1.668 ms, cpu=0.032 ms)
aten::arange_mps_out:i32[100]:100 (id=G94, run=2, gpu=1.668 ms, cpu=0.032 ms)
aten::mul:i32[100]:i64[Scalar]:i32[100] (id=G95, run=2, gpu=1.668 ms, cpu=0.032 ms)
aten::gather_kernel_2:MPS(buf#652:3):Int[50, 50]:MPS(buf#654:2):Int[50, 50] (id=K1, run=6, gpu=1.668 ms, cpu=0.032 ms)
aten::gather_kernel_2:MPS(buf#652:3):Int[50, 50]:MPS(buf#654:2):Int[50, 50] (id=K1, run=6, gpu=1.668 ms, cpu=0.032 ms)
aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=3, gpu=1.668 ms, cpu=0.032 ms)
aten::add_out_mps::i32[10000,1,4]:f32[1,3,4]:f32[10000,3,4] (id=G96, run=1, gpu=1.668 ms, cpu=0.032 ms)
aten::arange_mps_out:i32[50]:50 (id=G97, run=2, gpu=1.668 ms, cpu=0.032 ms)
aten::mul:i32[50]:i64[Scalar]:i32[50] (id=G98, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::arange_mps_out:i32[50]:50 (id=G97, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::mul:i32[50]:i64[Scalar]:i32[50] (id=G98, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms)
aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms)
aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=5, gpu=10.395 ms, cpu=0.052 ms)
aten::add_out_mps::i32[2500,1,4]:f32[1,3,4]:f32[2500,3,4] (id=G99, run=1, gpu=10.395 ms, cpu=0.052 ms)
aten::arange_mps_out:i32[25]:25 (id=G100, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::mul:i32[25]:i64[Scalar]:i32[25] (id=G101, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::arange_mps_out:i32[25]:25 (id=G100, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::mul:i32[25]:i64[Scalar]:i32[25] (id=G101, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms)
aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms)
aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=5, gpu=10.395 ms, cpu=0.052 ms)
aten::add_out_mps::i32[625,1,4]:f32[1,3,4]:f32[625,3,4] (id=G102, run=1, gpu=10.395 ms, cpu=0.052 ms)
aten::arange_mps_out:i32[13]:13 (id=G103, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::mul:i32[13]:i64[Scalar]:i32[13] (id=G104, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::arange_mps_out:i32[13]:13 (id=G103, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::mul:i32[13]:i64[Scalar]:i32[13] (id=G104, run=2, gpu=10.395 ms, cpu=0.052 ms)
aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms)
aten::gather_kernel_2:MPS(buf#651:2):Int[13, 13]:MPS(buf#655:2):Int[13, 13] (id=K1, run=10, gpu=10.395 ms, cpu=0.052 ms)
aten::cat_out_mps:1:NCHW:i32:4 (id=G92, run=5, gpu=10.395 ms, cpu=0.052 ms)
aten::add_out_mps::i32[169,1,4]:f32[1,3,4]:f32[169,3,4] (id=G105, run=1, gpu=10.395 ms, cpu=0.052 ms)
aten::cat_out_mps:0:NCHW:f32:5 (id=G106, run=1, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_5:MPS(buf#627:1):Float[1, 13, 13, 3, 4]:MPS(buf#665:1):Float[1, 13, 13, 3, 4] (id=K2, run=10, gpu=14.793 ms, cpu=0.037 ms)
aten::cat_out_mps:1:NCHW:f32:5 (id=G107, run=2, gpu=14.793 ms, cpu=0.037 ms)
aten::cat_out_mps:1:NCHW:f32:5 (id=G107, run=2, gpu=14.793 ms, cpu=0.037 ms)
aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=1, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms)
aten::sub_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G109, run=2, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms)
aten::sub_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G109, run=2, gpu=14.793 ms, cpu=0.037 ms)
aten::mul:f32[159882]:f32[Scalar]:f32[159882] (id=G110, run=2, gpu=14.793 ms, cpu=0.037 ms)
aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=14.793 ms, cpu=0.037 ms)
aten::add_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G111, run=2, gpu=14.793 ms, cpu=0.037 ms)
aten::mul:f32[159882]:f32[Scalar]:f32[159882] (id=G110, run=2, gpu=1.011 ms, cpu=0.242 ms)
aten::gather_kernel_1:MPS(buf#584:2):Float[159882]:MPS(buf#672:1):Float[159882] (id=K3, run=6, gpu=1.011 ms, cpu=0.242 ms)
aten::add_out_mps::f32[159882]:f32[159882]:f32[159882] (id=G111, run=2, gpu=1.011 ms, cpu=0.242 ms)
aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=1.011 ms, cpu=0.242 ms)
aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=1.011 ms, cpu=0.242 ms)
aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=1.011 ms, cpu=0.242 ms)
aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=1.011 ms, cpu=0.242 ms)
aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=1.011 ms, cpu=0.242 ms)
aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=1.011 ms, cpu=0.242 ms)
aten::gather_kernel_2:MPS(buf#667:3):Float[159882, 1]:MPS(buf#675:1):Float[159882, 1] (id=K4, run=4, gpu=1.011 ms, cpu=0.242 ms)
aten::div_out_mps::f32[159882,1]:f32[Scalar]:f32[159882,1] (id=G112, run=4, gpu=1.011 ms, cpu=0.242 ms)
aten::clamp_out_mps_max:4.135167_scalar::f32[159882,1] (id=G113, run=2, gpu=1.011 ms, cpu=0.242 ms)
aten::clamp_out_mps_max:4.135167_scalar::f32[159882,1] (id=G113, run=2, gpu=9.022 ms, cpu=0.032 ms)
aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=9.022 ms, cpu=0.032 ms)
aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=2, gpu=9.022 ms, cpu=0.032 ms)
aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=9.022 ms, cpu=0.032 ms)
aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=2, gpu=9.022 ms, cpu=0.032 ms)
aten::exp:MPS(buf#673:2):Float[159882, 1] (id=K5, run=2, gpu=9.022 ms, cpu=0.032 ms)
aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=9.022 ms, cpu=0.032 ms)
aten::exp:MPS(buf#673:2):Float[159882, 1] (id=K5, run=2, gpu=9.022 ms, cpu=0.032 ms)
aten::mul:f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G114, run=4, gpu=9.022 ms, cpu=0.032 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#644:1):Float[] (len=4 bytes, gpu=9.022 ms, cpu=0.032 ms)
aten::mul:f32[Scalar]:f32[159882,1]:f32[159882,1] (id=G116, run=1, gpu=9.417 ms, cpu=0.030 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#644:2):Float[] (len=4 bytes, gpu=9.417 ms, cpu=0.030 ms)
aten::mul:f32[Scalar]:f32[159882,1]:f32[159882,1] (id=G116, run=2, gpu=7.972 ms, cpu=0.335 ms)
aten::sub_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G117, run=2, gpu=7.972 ms, cpu=0.335 ms)
aten::sub_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G117, run=2, gpu=7.972 ms, cpu=0.335 ms)
aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=4, gpu=7.972 ms, cpu=0.335 ms)
aten::add_out_mps::f32[159882,1]:f32[159882,1]:f32[159882,1] (id=G115, run=4, gpu=7.972 ms, cpu=0.335 ms)
aten::cat_out_mps:2:NCHW:f32:4 (id=G118, run=1, gpu=7.972 ms, cpu=0.335 ms)
aten::fill_scalar_mps_impl:i64[30000]:1.000000 (id=G119, run=1, gpu=24.877 ms, cpu=0.240 ms)
aten::fill_scalar_mps_impl:i64[7500]:2.000000 (id=G120, run=1, gpu=24.877 ms, cpu=0.240 ms)
aten::fill_scalar_mps_impl:i64[1875]:3.000000 (id=G121, run=1, gpu=24.877 ms, cpu=0.240 ms)
aten::fill_scalar_mps_impl:i64[507]:4.000000 (id=G122, run=1, gpu=24.877 ms, cpu=0.240 ms)
aten::cat_out_mps:0:NCHW:i64:5 (id=G123, run=1, gpu=24.877 ms, cpu=0.240 ms)
aten::topk:1,120000:Float32:k1000:dim1:largest1 (id=G124, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::topk:1,30000:Float32:k1000:dim1:largest1 (id=G126, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::topk:1,7500:Float32:k1000:dim1:largest1 (id=G127, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::topk:1,1875:Float32:k1000:dim1:largest1 (id=G128, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::add_out_mps::i64[1,1000]:i64[Scalar]:i64[1,1000] (id=G125, run=4, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::topk:1,507:Float32:k507:dim1:largest1 (id=G129, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::add_out_mps::i64[1,507]:i64[Scalar]:i64[1,507] (id=G130, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::cat_out_mps:1:NCHW:i64:5 (id=G131, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::arange_mps_out:i64[1]:1 (id=G132, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_2:MPS(buf#644:2):Long[1, 4507]:MPS(buf#615:2):Long[1, 4507] (id=K6, run=2, gpu=72.076 ms, cpu=0.028 ms)
aten::index_select_32bit_idx32:MPS(buf#685:1):Float[1, 4507, 4] (id=K7, run=2, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_2:MPS(buf#644:2):Long[1, 4507]:MPS(buf#615:2):Long[1, 4507] (id=K6, run=2, gpu=72.076 ms, cpu=0.028 ms)
aten::index_select_64bit_idx32:MPS(buf#657:1):Long[1, 4507] (id=K8, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_3:MPS(buf#644:2):Long[1, 4507, 1]:MPS(buf#615:2):Long[1, 4507, 1] (id=K9, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::index_select_32bit_idx32:MPS(buf#685:1):Float[1, 4507, 4] (id=K7, run=2, gpu=72.076 ms, cpu=0.028 ms)
aten::sigmoid_out_mps:f32[1,4507]:f32[1,4507] (id=G133, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[4507,2] (id=G134, run=2, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_2:MPS(buf#655:2):Float[4507, 2]:MPS(buf#659:1):Float[4507, 2] (id=K4, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[4507,2] (id=G134, run=2, gpu=72.076 ms, cpu=0.028 ms)
aten::cat_out_mps:2:NCHW:f32:2 (id=G135, run=1, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::sub_out_mps::f32[4507]:f32[4507]:f32[4507] (id=G136, run=2, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::gather_kernel_1:MPS(buf#659:2):Float[4507]:MPS(buf#616:1):Float[4507] (id=K3, run=10, gpu=72.076 ms, cpu=0.028 ms)
aten::sub_out_mps::f32[4507]:f32[4507]:f32[4507] (id=G136, run=2, gpu=72.076 ms, cpu=0.028 ms)
aten::greaterThanOrEqualTo:f32[4507]:f32[Scalar]:b8[4507] (id=G137, run=2, gpu=72.076 ms, cpu=0.028 ms)
aten::greaterThanOrEqualTo:f32[4507]:f32[Scalar]:b8[4507] (id=G137, run=2, gpu=0.003 ms, cpu=0.013 ms)
aten::bitwise_and_tensor:MPS(buf#622:2):Bool[4507]:MPS(buf#627:2):Bool[4507] (id=K10, run=1, gpu=0.003 ms, cpu=0.013 ms)
aten::count_nonzero_mps:0::b8[4507]:0:7::i64[Scalar]:Bool (id=G138, run=1, gpu=4.918 ms, cpu=0.043 ms)
BlitCopySync: MPS(buf#642:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=4.918 ms, cpu=0.043 ms)
aten::nonzero_out_native_mps:b8[4507] (id=G139, run=1, gpu=0.439 ms, cpu=0.084 ms)
aten::index_select_32bit_idx32:MPS(buf#623:1):Float[4507] (id=K7, run=4, gpu=0.439 ms, cpu=0.084 ms)
aten::index_select_32bit_idx32:MPS(buf#623:1):Float[4507] (id=K7, run=4, gpu=0.439 ms, cpu=0.084 ms)
aten::index_select_64bit_idx32:MPS(buf#619:1):Long[4507] (id=K8, run=2, gpu=0.439 ms, cpu=0.084 ms)
aten::greaterThanOrEqualTo:f32[4507]:f32[Scalar]:b8[4507] (id=G137, run=3, gpu=0.439 ms, cpu=0.084 ms)
aten::count_nonzero_mps:0::b8[4507]:0:7::i64[Scalar]:Bool (id=G138, run=2, gpu=0.629 ms, cpu=0.034 ms)
BlitCopySync: MPS(buf#642:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.629 ms, cpu=0.034 ms)
aten::nonzero_out_native_mps:b8[4507] (id=G139, run=2, gpu=1.279 ms, cpu=0.041 ms)
aten::index_select_32bit_idx32:MPS(buf#654:1):Float[4507] (id=K7, run=6, gpu=1.279 ms, cpu=0.041 ms)
aten::index_select_32bit_idx32:MPS(buf#654:1):Float[4507] (id=K7, run=6, gpu=1.279 ms, cpu=0.041 ms)
aten::index_select_64bit_idx32:MPS(buf#616:1):Long[4507] (id=K8, run=3, gpu=1.279 ms, cpu=0.041 ms)
aten::max_mps:f32[4507,4] (id=G140, run=1, gpu=1.279 ms, cpu=0.041 ms)
aten::copy_cast_mps:i64[[-1]]:f32[[-1]]:0 (id=G141, run=1, gpu=1.279 ms, cpu=0.041 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#628:1):Float[] (len=4 bytes, gpu=1.279 ms, cpu=0.041 ms)
aten::add_out_mps::f32[Scalar]:f32[Scalar]:f32[Scalar] (id=G142, run=1, gpu=18.014 ms, cpu=0.163 ms)
aten::mul:f32[4507]:f32[Scalar]:f32[4507] (id=G143, run=1, gpu=18.014 ms, cpu=0.163 ms)
aten::add_out_mps::f32[4507,4]:f32[4507,1]:f32[4507,4] (id=G144, run=1, gpu=18.014 ms, cpu=0.163 ms)
BlitCopy: MPS(buf#663:1):Float[4507] --> MPS(buf#616:1):Float[4507] (len=17.61 KB, gpu=18.014 ms, cpu=0.163 ms)
aten::sort:4507:Float32:dim0:descending1 (id=G145, run=1, gpu=18.014 ms, cpu=0.163 ms)
aten::index_select_out_mps:f32[4507,4]:i64[4507]:0 (id=G146, run=1, gpu=18.014 ms, cpu=0.163 ms)
aten::nms_float:MPS(buf#615:2):Float[4507, 4]:MPS(buf#663:2):Float[4507] (id=K11, run=1, gpu=18.014 ms, cpu=0.163 ms)
BlitCopySync: MPS(buf#584:2):Long[319997] --> CPU:Long[319997] (len=2.44 MB, gpu=18.014 ms, cpu=0.163 ms)
BlitCopySync: CPU:Long[2004] --> MPS(buf#668:1):Long[2004] (len=15.66 KB, gpu=2.291 ms, cpu=0.040 ms)
aten::index_select_64bit_idx32:MPS(buf#670:1):Long[2004] (id=K8, run=4, gpu=51.431 ms, cpu=42.563 ms)
aten::index_select_32bit_idx32:MPS(buf#663:1):Float[1000] (id=K7, run=8, gpu=51.431 ms, cpu=42.563 ms)
aten::index_select_32bit_idx32:MPS(buf#663:1):Float[1000] (id=K7, run=8, gpu=51.431 ms, cpu=42.563 ms)
aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=4, gpu=51.431 ms, cpu=42.563 ms)
aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=4, gpu=51.431 ms, cpu=42.563 ms)
aten::cat_out_mps:1:NCHW:f32:2 (id=G147, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::gather_kernel_1:MPS(buf#615:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=51.431 ms, cpu=42.563 ms)
aten::gather_kernel_1:MPS(buf#615:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=51.431 ms, cpu=42.563 ms)
aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=2, gpu=51.431 ms, cpu=42.563 ms)
aten::gather_kernel_1:MPS(buf#615:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=51.431 ms, cpu=42.563 ms)
aten::gather_kernel_1:MPS(buf#615:2):Float[1000]:MPS(buf#661:1):Float[1000] (id=K3, run=14, gpu=51.431 ms, cpu=42.563 ms)
aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=2, gpu=51.431 ms, cpu=42.563 ms)
aten::mul:f32[1000]:f32[1000]:f32[1000] (id=G149, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=4, gpu=51.431 ms, cpu=42.563 ms)
aten::sqrt_out_mps:f32[1000]:f32[1000] (id=G150, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::div_out_mps::f32[1000]:i64[Scalar]:f32[1000] (id=G151, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::log2_out_mps:f32[1000]:f32[1000] (id=G152, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::add_out_mps::f32[1000]:i64[Scalar]:f32[1000] (id=G153, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::add_out_mps::f32[1000]:f32[Scalar]:f32[1000] (id=G154, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::floor_out_mps:f32[1000]:f32[1000] (id=G155, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::clamp_out_mps_min:2.000000_max:5.000000_scalar::f32[1000] (id=G156, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::copy_cast_mps:f32[[-1]]:i64[[-1]]:0 (id=G157, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::sub_out_mps::i64[1000]:i64[Scalar]:i64[1000] (id=G158, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=1, gpu=51.431 ms, cpu=42.563 ms)
aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=1, gpu=0.312 ms, cpu=0.044 ms)
BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.312 ms, cpu=0.044 ms)
Error: command buffer exited with error status.
    The Metal Performance Shaders operations encoded on it may not have completed.
    Error:
    (null)
    Internal Error (0000000e:Internal Error)
    <AGXG15XFamilyCommandBuffer: 0x3b3ce3dc0>
    label = <none>
    device = <AGXG15CDevice: 0x133ed4000>
        name = Apple M3 Max
    commandQueue = <AGXG15XFamilyCommandQueue: 0x105063600>
        label = <none>
        device = <AGXG15CDevice: 0x133ed4000>
            name = Apple M3 Max
    retainedReferences = 1
aten::nonzero_out_native_mps:b8[1000] (id=G161, run=1, gpu=26795.452 ms, cpu=0.052 ms)
aten::index_select_32bit_idx32:MPS(buf#623:1):Float[959, 5] (id=K7, run=9, gpu=26795.452 ms, cpu=0.052 ms)
aten::roi_align_float:MPS(buf#610:1):Float[1, 256, 200, 200]:MPS(buf#660:2):Float[959, 5] (id=K12, run=1, gpu=26795.452 ms, cpu=0.052 ms)
aten::index_put_32bit_idx32:MPS(buf#688:2):Float[959, 256, 7, 7] (id=K13, run=1, gpu=26795.452 ms, cpu=0.052 ms)
aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=2, gpu=26795.452 ms, cpu=0.052 ms)
aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=2, gpu=5.924 ms, cpu=0.217 ms)
BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=5.924 ms, cpu=0.217 ms)
aten::nonzero_out_native_mps:b8[1000] (id=G161, run=2, gpu=34729.081 ms, cpu=0.040 ms)
aten::index_select_32bit_idx32:MPS(buf#623:1):Float[959, 5] (id=K7, run=10, gpu=34729.081 ms, cpu=0.040 ms)
aten::roi_align_float:MPS(buf#600:1):Float[1, 256, 100, 100]:MPS(buf#619:2):Float[959, 5] (id=K12, run=2, gpu=34729.081 ms, cpu=0.040 ms)
aten::index_put_32bit_idx32:MPS(buf#690:2):Float[959, 256, 7, 7] (id=K13, run=2, gpu=34729.081 ms, cpu=0.040 ms)
aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=3, gpu=34729.081 ms, cpu=0.040 ms)
aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=3, gpu=0.038 ms, cpu=0.264 ms)
BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.038 ms, cpu=0.264 ms)
aten::nonzero_out_native_mps:b8[1000] (id=G161, run=3, gpu=1.195 ms, cpu=0.265 ms)
aten::index_select_32bit_idx32:MPS(buf#623:1):Float[3, 5] (id=K7, run=11, gpu=1.195 ms, cpu=0.265 ms)
aten::roi_align_float:MPS(buf#590:1):Float[1, 256, 50, 50]:MPS(buf#652:2):Float[3, 5] (id=K12, run=3, gpu=1.195 ms, cpu=0.265 ms)
aten::index_put_32bit_idx32:MPS(buf#655:2):Float[3, 256, 7, 7] (id=K13, run=3, gpu=1.195 ms, cpu=0.265 ms)
aten::equal:i64[1000]:i64[Scalar]:b8[1000] (id=G159, run=4, gpu=1.195 ms, cpu=0.265 ms)
aten::count_nonzero_mps:0::b8[1000]:0:7::i64[Scalar]:Bool (id=G160, run=4, gpu=0.031 ms, cpu=0.032 ms)
BlitCopySync: MPS(buf#644:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.031 ms, cpu=0.032 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.316 ms, cpu=0.796 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.316 ms, cpu=0.796 ms)
aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=5.456 ms, cpu=0.025 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.456 ms, cpu=0.025 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.456 ms, cpu=0.025 ms)
aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=5.321 ms, cpu=0.678 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.321 ms, cpu=0.678 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.321 ms, cpu=0.678 ms)
aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=5.320 ms, cpu=0.020 ms)
aten::mps_convolution:1:1:1:1:1:1:1:Contiguous:f32[1000,256,7,7]:f32[256,256,3,3]:0:nobias (id=G162, run=4, gpu=5.320 ms, cpu=0.020 ms)
aten::batch_norm_mps_out:Contiguous:0.000010:0.100000:0:1:1:1:1000,256,7,7::f32[1000,256,7,7]:f32[256]:f32[256]:f32[256]:f32[256] (id=G163, run=4, gpu=5.320 ms, cpu=0.020 ms)
aten::relu_:f32[1000,256,7,7] (id=G164, run=4, gpu=2.929 ms, cpu=0.830 ms)
aten::mps_linear:f32[1000,12544]:f32[1024,12544]:f32[1024] (id=G165, run=1, gpu=2.929 ms, cpu=0.830 ms)
aten::relu_:f32[1000,1024] (id=G166, run=1, gpu=2.929 ms, cpu=0.830 ms)
aten::mps_linear:f32[1000,1024]:f32[91,1024]:f32[91] (id=G167, run=1, gpu=2.929 ms, cpu=0.830 ms)
aten::mps_linear:f32[1000,1024]:f32[364,1024]:f32[364] (id=G168, run=1, gpu=2.929 ms, cpu=0.830 ms)
aten::cat_out_mps:0:NCHW:f32:1 (id=G108, run=5, gpu=2.929 ms, cpu=0.830 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.929 ms, cpu=0.830 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.929 ms, cpu=0.830 ms)
aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=4, gpu=2.929 ms, cpu=0.830 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.929 ms, cpu=0.830 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=2.929 ms, cpu=0.830 ms)
aten::sub_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G148, run=4, gpu=2.929 ms, cpu=0.830 ms)
aten::mul:f32[1000]:f32[Scalar]:f32[1000] (id=G169, run=2, gpu=0.038 ms, cpu=0.037 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=0.038 ms, cpu=0.037 ms)
aten::add_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G170, run=2, gpu=0.038 ms, cpu=0.037 ms)
aten::mul:f32[1000]:f32[Scalar]:f32[1000] (id=G169, run=2, gpu=0.038 ms, cpu=0.037 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[1000]:MPS(buf#663:1):Float[1000] (id=K3, run=20, gpu=0.038 ms, cpu=0.037 ms)
aten::add_out_mps::f32[1000]:f32[1000]:f32[1000] (id=G170, run=2, gpu=0.038 ms, cpu=0.037 ms)
aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.038 ms, cpu=0.037 ms)
aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.038 ms, cpu=0.037 ms)
aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.038 ms, cpu=0.037 ms)
aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.038 ms, cpu=0.037 ms)
aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.038 ms, cpu=0.037 ms)
aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.038 ms, cpu=0.037 ms)
aten::gather_kernel_2:MPS(buf#657:3):Float[1000, 91]:MPS(buf#669:1):Float[1000, 91] (id=K4, run=14, gpu=0.038 ms, cpu=0.037 ms)
aten::div_out_mps::f32[1000,91]:f32[Scalar]:f32[1000,91] (id=G171, run=4, gpu=0.021 ms, cpu=0.036 ms)
aten::clamp_out_mps_max:4.135167_scalar::f32[1000,91] (id=G172, run=2, gpu=0.021 ms, cpu=0.036 ms)
aten::clamp_out_mps_max:4.135167_scalar::f32[1000,91] (id=G172, run=2, gpu=0.021 ms, cpu=0.036 ms)
aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.021 ms, cpu=0.036 ms)
aten::add_out_mps::f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G174, run=2, gpu=0.021 ms, cpu=0.036 ms)
aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.021 ms, cpu=0.036 ms)
aten::add_out_mps::f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G174, run=2, gpu=0.021 ms, cpu=0.036 ms)
aten::exp:MPS(buf#666:2):Float[1000, 91] (id=K5, run=4, gpu=0.021 ms, cpu=0.036 ms)
aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.030 ms, cpu=0.036 ms)
aten::exp:MPS(buf#666:2):Float[1000, 91] (id=K5, run=4, gpu=0.030 ms, cpu=0.036 ms)
aten::mul:f32[1000,91]:f32[1000,1]:f32[1000,91] (id=G173, run=4, gpu=0.030 ms, cpu=0.036 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#644:1):Float[] (len=4 bytes, gpu=0.030 ms, cpu=0.036 ms)
aten::mul:f32[Scalar]:f32[1000,91]:f32[1000,91] (id=G175, run=1, gpu=0.013 ms, cpu=0.097 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#642:1):Float[] (len=4 bytes, gpu=0.013 ms, cpu=0.097 ms)
aten::mul:f32[Scalar]:f32[1000,91]:f32[1000,91] (id=G175, run=2, gpu=0.044 ms, cpu=0.026 ms)
aten::sub_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G176, run=2, gpu=0.044 ms, cpu=0.026 ms)
aten::sub_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G176, run=2, gpu=0.044 ms, cpu=0.026 ms)
aten::add_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G177, run=2, gpu=0.044 ms, cpu=0.026 ms)
aten::add_out_mps::f32[1000,91]:f32[1000,91]:f32[1000,91] (id=G177, run=2, gpu=0.044 ms, cpu=0.026 ms)
aten::cat_out_mps:2:NCHW:f32:4 (id=G118, run=2, gpu=0.044 ms, cpu=0.026 ms)
aten::softmax_mps_out:f32[[-1]]:Contiguous:1 (id=G178, run=1, gpu=0.102 ms, cpu=0.305 ms)
aten::gather_kernel_3:MPS(buf#667:2):Float[1000, 90, 4]:MPS(buf#685:1):Float[1000, 90, 4] (id=K14, run=3, gpu=0.102 ms, cpu=0.305 ms)
aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[1000,91,2] (id=G179, run=2, gpu=0.102 ms, cpu=0.305 ms)
aten::gather_kernel_3:MPS(buf#667:2):Float[1000, 90, 4]:MPS(buf#685:1):Float[1000, 90, 4] (id=K14, run=3, gpu=0.102 ms, cpu=0.305 ms)
aten::clamp_out_mps_min:0.000000_max:800.000000_scalar::f32[1000,91,2] (id=G179, run=2, gpu=0.102 ms, cpu=0.305 ms)
aten::cat_out_mps:3:NCHW:f32:2 (id=G180, run=1, gpu=0.102 ms, cpu=0.305 ms)
aten::arange_mps_out:i64[91]:91 (id=G181, run=1, gpu=0.102 ms, cpu=0.305 ms)
aten::gather_kernel_3:MPS(buf#667:2):Float[1000, 90, 4]:MPS(buf#685:1):Float[1000, 90, 4] (id=K14, run=3, gpu=0.102 ms, cpu=0.305 ms)
aten::gather_kernel_2:MPS(buf#616:2):Float[1000, 90]:MPS(buf#663:1):Float[1000, 90] (id=K4, run=15, gpu=0.102 ms, cpu=0.305 ms)
aten::gather_kernel_2:MPS(buf#655:2):Long[1000, 90]:MPS(buf#699:2):Long[1000, 90] (id=K6, run=3, gpu=0.102 ms, cpu=0.305 ms)
aten::greaterThan:f32[90000]:f32[Scalar]:b8[90000] (id=G182, run=1, gpu=0.102 ms, cpu=0.305 ms)
aten::count_nonzero_mps:0::b8[90000]:0:7::i64[Scalar]:Bool (id=G183, run=1, gpu=0.033 ms, cpu=0.038 ms)
BlitCopySync: MPS(buf#628:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.033 ms, cpu=0.038 ms)
aten::nonzero_out_native_mps:b8[90000] (id=G184, run=1, gpu=0.043 ms, cpu=0.047 ms)
aten::index_select_32bit_idx32:MPS(buf#663:1):Float[262] (id=K7, run=13, gpu=0.043 ms, cpu=0.047 ms)
aten::index_select_32bit_idx32:MPS(buf#663:1):Float[262] (id=K7, run=13, gpu=0.043 ms, cpu=0.047 ms)
aten::index_select_64bit_idx32:MPS(buf#699:1):Long[262] (id=K8, run=5, gpu=0.043 ms, cpu=0.047 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[262]:MPS(buf#675:1):Float[262] (id=K3, run=24, gpu=0.043 ms, cpu=0.047 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[262]:MPS(buf#675:1):Float[262] (id=K3, run=24, gpu=0.043 ms, cpu=0.047 ms)
aten::sub_out_mps::f32[262]:f32[262]:f32[262] (id=G185, run=2, gpu=0.016 ms, cpu=0.015 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[262]:MPS(buf#675:1):Float[262] (id=K3, run=24, gpu=0.016 ms, cpu=0.015 ms)
aten::gather_kernel_1:MPS(buf#655:3):Float[262]:MPS(buf#675:1):Float[262] (id=K3, run=24, gpu=0.016 ms, cpu=0.015 ms)
aten::sub_out_mps::f32[262]:f32[262]:f32[262] (id=G185, run=2, gpu=0.016 ms, cpu=0.015 ms)
aten::greaterThanOrEqualTo:f32[262]:f32[Scalar]:b8[262] (id=G186, run=2, gpu=0.016 ms, cpu=0.015 ms)
aten::greaterThanOrEqualTo:f32[262]:f32[Scalar]:b8[262] (id=G186, run=2, gpu=0.016 ms, cpu=0.015 ms)
aten::bitwise_and_tensor:MPS(buf#666:2):Bool[262]:MPS(buf#675:2):Bool[262] (id=K10, run=2, gpu=0.016 ms, cpu=0.015 ms)
aten::count_nonzero_mps:0::b8[262]:0:7::i64[Scalar]:Bool (id=G187, run=1, gpu=0.012 ms, cpu=0.025 ms)
BlitCopySync: MPS(buf#628:2):Long[] --> CPU:Long[] (len=8 bytes, gpu=0.012 ms, cpu=0.025 ms)
aten::nonzero_out_native_mps:b8[262] (id=G188, run=1, gpu=0.062 ms, cpu=0.040 ms)
aten::index_select_32bit_idx32:MPS(buf#625:1):Float[262] (id=K7, run=15, gpu=0.062 ms, cpu=0.040 ms)
aten::index_select_32bit_idx32:MPS(buf#625:1):Float[262] (id=K7, run=15, gpu=0.062 ms, cpu=0.040 ms)
aten::index_select_64bit_idx32:MPS(buf#674:1):Long[262] (id=K8, run=6, gpu=0.062 ms, cpu=0.040 ms)
aten::max_mps:f32[262,4] (id=G189, run=1, gpu=0.062 ms, cpu=0.040 ms)
aten::copy_cast_mps:i64[[-1]]:f32[[-1]]:0 (id=G141, run=2, gpu=0.062 ms, cpu=0.040 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#636:1):Float[] (len=4 bytes, gpu=0.062 ms, cpu=0.040 ms)
aten::add_out_mps::f32[Scalar]:f32[Scalar]:f32[Scalar] (id=G142, run=2, gpu=0.518 ms, cpu=0.049 ms)
aten::mul:f32[262]:f32[Scalar]:f32[262] (id=G190, run=1, gpu=0.518 ms, cpu=0.049 ms)
aten::add_out_mps::f32[262,4]:f32[262,1]:f32[262,4] (id=G191, run=1, gpu=0.518 ms, cpu=0.049 ms)
BlitCopy: MPS(buf#669:1):Float[262] --> MPS(buf#679:1):Float[262] (len=1.02 KB, gpu=0.518 ms, cpu=0.049 ms)
aten::sort:262:Float32:dim0:descending1 (id=G192, run=1, gpu=0.518 ms, cpu=0.049 ms)
aten::index_select_out_mps:f32[262,4]:i64[262]:0 (id=G193, run=1, gpu=0.518 ms, cpu=0.049 ms)
aten::nms_float:MPS(buf#674:2):Float[262, 4]:MPS(buf#669:2):Float[262] (id=K11, run=2, gpu=0.518 ms, cpu=0.049 ms)
BlitCopySync: MPS(buf#676:2):Long[1310] --> CPU:Long[1310] (len=10.23 KB, gpu=0.518 ms, cpu=0.049 ms)
BlitCopySync: CPU:Long[158] --> MPS(buf#671:1):Long[158] (len=1.23 KB, gpu=0.006 ms, cpu=0.024 ms)
aten::index_select_64bit_idx32:MPS(buf#675:1):Long[100] (id=K8, run=8, gpu=0.032 ms, cpu=0.039 ms)
aten::index_select_32bit_idx32:MPS(buf#669:1):Float[100] (id=K7, run=17, gpu=0.032 ms, cpu=0.039 ms)
aten::index_select_32bit_idx32:MPS(buf#669:1):Float[100] (id=K7, run=17, gpu=0.032 ms, cpu=0.039 ms)
aten::index_select_64bit_idx32:MPS(buf#675:1):Long[100] (id=K8, run=8, gpu=0.032 ms, cpu=0.039 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#628:1):Float[] (len=4 bytes, gpu=0.032 ms, cpu=0.039 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#636:1):Float[] (len=4 bytes, gpu=0.006 ms, cpu=0.027 ms)
aten::div_out_mps::f32[Scalar]:f32[Scalar]:f32[Scalar] (id=G194, run=1, gpu=0.011 ms, cpu=0.035 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#628:2):Float[] (len=4 bytes, gpu=0.011 ms, cpu=0.035 ms)
BlitCopySync: CPU:Float[] --> MPS(buf#636:1):Float[] (len=4 bytes, gpu=0.003 ms, cpu=0.023 ms)
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  --------------------------------------------------------------------------------
                            Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls                                                                      Input Shapes
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  --------------------------------------------------------------------------------
                 model_inference         0.01%       8.528ms       100.00%       62.768s       62.768s             1                                                                                []
                     aten::where         0.00%       5.208us        98.20%       61.637s       15.409s             4                                                                          [[1000]]
             aten::nonzero_numpy         0.00%      26.543us        98.20%       61.637s       15.409s             4                                                                          [[1000]]
                   aten::nonzero        98.18%       61.625s        98.20%       61.636s       15.409s             4                                                                          [[1000]]
        aten::upsample_nearest2d         0.38%     238.733ms         0.38%     238.733ms     238.733ms             1                                                        [[1, 256, 25, 25], [], []]
                     aten::where         0.00%       7.082us         0.18%     113.257ms      56.628ms             2                                                                          [[4507]]
             aten::nonzero_numpy         0.00%      23.751us         0.18%     113.249ms      56.625ms             2                                                                          [[4507]]
                   aten::nonzero         0.16%     101.306ms         0.18%     113.183ms      56.591ms             2                                                                          [[4507]]
                        aten::to         0.00%       8.083us         0.11%      69.337ms      13.867ms             5                                                      [[3, 4], [], [], [], [], []]
                  aten::_to_copy         0.00%      34.708us         0.11%      69.329ms      13.866ms             5                                                  [[3, 4], [], [], [], [], [], []]
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  --------------------------------------------------------------------------------
Self CPU time total: 62.768s