@briansp2020
Last active September 16, 2023 13:39
ai-benchmark comparison between 7900XTX (ROCm) and 3080ti (WSL2) (9/4/2023)
:~/tmp$ python benchmark.py
>> AI-Benchmark - 0.1.3.cm
>> Let the AI Games begin
* TF Version: 2.13.0
* Platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
* CPU: AMD Ryzen 9 5950X 16-Core Processor
* CPU RAM: 8 GB
* GPU/0: NVIDIA GeForce RTX 3080 Ti
* GPU RAM: 9.1 GB
* CUDA Version: 11.5
* CUDA Build: V11.5.119
The benchmark is running...
The tests might take up to 20 minutes
Please don't interrupt the script
1/19. MobileNet-V2
Inference Time: 1956 ms
Inference Time: 24 ms
Inference Time: 23 ms
Inference Time: 22 ms
Inference Time: 23 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 23 ms
Inference Time: 22 ms
Inference Time: 21 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 33 ms
Inference Time: 33 ms
Inference Time: 35 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 34 ms
Inference Time: 37 ms
Inference Time: 36 ms
1.1 - inference | batch=50, size=224x224: 28.1 ± 6.0 ms
Training Time: 2979 ms
Training Time: 78 ms
Training Time: 70 ms
Training Time: 67 ms
Training Time: 73 ms
Training Time: 72 ms
Training Time: 73 ms
Training Time: 73 ms
Training Time: 72 ms
Training Time: 108 ms
Training Time: 75 ms
Training Time: 75 ms
Training Time: 74 ms
Training Time: 74 ms
Training Time: 74 ms
Training Time: 82 ms
Training Time: 79 ms
Training Time: 81 ms
Training Time: 75 ms
Training Time: 89 ms
Training Time: 106 ms
Training Time: 117 ms
1.2 - training | batch=50, size=224x224: 80.3 ± 13.2 ms
2/19. Inception-V3
Inference Time: 1021 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 38 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 37 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 37 ms
Inference Time: 46 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
2.1 - inference | batch=20, size=346x346: 39.1 ± 4.8 ms
Training Time: 5588 ms
Training Time: 95 ms
Training Time: 101 ms
Training Time: 101 ms
Training Time: 99 ms
Training Time: 99 ms
Training Time: 104 ms
Training Time: 94 ms
Training Time: 94 ms
Training Time: 97 ms
Training Time: 100 ms
Training Time: 98 ms
Training Time: 97 ms
Training Time: 98 ms
Training Time: 99 ms
Training Time: 100 ms
Training Time: 99 ms
Training Time: 98 ms
Training Time: 96 ms
Training Time: 99 ms
Training Time: 96 ms
Training Time: 93 ms
2.2 - training | batch=20, size=346x346: 98.0 ± 2.6 ms
3/19. Inception-V4
Inference Time: 1241 ms
Inference Time: 30 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 30 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 30 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 31 ms
Inference Time: 29 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 29 ms
3.1 - inference | batch=10, size=346x346: 29.4 ± 0.6 ms
Training Time: 5039 ms
Training Time: 105 ms
Training Time: 101 ms
Training Time: 108 ms
Training Time: 100 ms
Training Time: 121 ms
Training Time: 108 ms
Training Time: 99 ms
Training Time: 99 ms
Training Time: 118 ms
Training Time: 105 ms
Training Time: 107 ms
Training Time: 100 ms
Training Time: 97 ms
Training Time: 99 ms
Training Time: 99 ms
Training Time: 102 ms
Training Time: 99 ms
Training Time: 102 ms
Training Time: 100 ms
Training Time: 99 ms
Training Time: 99 ms
3.2 - training | batch=10, size=346x346: 103 ± 6 ms
4/19. Inception-ResNet-V2
Inference Time: 1099 ms
Inference Time: 40 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 36 ms
Inference Time: 37 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 40 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 37 ms
Inference Time: 38 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 37 ms
4.1 - inference | batch=10, size=346x346: 37.7 ± 0.9 ms
Training Time: 5609 ms
Training Time: 112 ms
Training Time: 127 ms
Training Time: 173 ms
Training Time: 112 ms
Training Time: 172 ms
Training Time: 175 ms
Training Time: 177 ms
Training Time: 156 ms
Training Time: 111 ms
Training Time: 113 ms
Training Time: 155 ms
Training Time: 176 ms
Training Time: 171 ms
Training Time: 111 ms
Training Time: 112 ms
Training Time: 163 ms
Training Time: 173 ms
Training Time: 112 ms
Training Time: 114 ms
Training Time: 174 ms
Training Time: 112 ms
4.2 - training | batch=8, size=346x346: 143 ± 29 ms
5/19. ResNet-V2-50
Inference Time: 400 ms
Inference Time: 21 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 19 ms
Inference Time: 20 ms
Inference Time: 20 ms
Inference Time: 19 ms
Inference Time: 20 ms
Inference Time: 19 ms
Inference Time: 21 ms
Inference Time: 19 ms
Inference Time: 20 ms
Inference Time: 21 ms
Inference Time: 20 ms
5.1 - inference | batch=10, size=346x346: 20.0 ± 0.6 ms
Training Time: 3081 ms
Training Time: 59 ms
Training Time: 59 ms
Training Time: 67 ms
Training Time: 58 ms
Training Time: 59 ms
Training Time: 59 ms
Training Time: 59 ms
Training Time: 60 ms
Training Time: 58 ms
Training Time: 59 ms
Training Time: 58 ms
Training Time: 59 ms
Training Time: 60 ms
Training Time: 59 ms
Training Time: 59 ms
Training Time: 58 ms
Training Time: 59 ms
Training Time: 59 ms
Training Time: 59 ms
Training Time: 59 ms
Training Time: 60 ms
5.2 - training | batch=10, size=346x346: 59.3 ± 1.8 ms
6/19. ResNet-V2-152
Inference Time: 563 ms
Inference Time: 26 ms
Inference Time: 26 ms
Inference Time: 26 ms
Inference Time: 27 ms
Inference Time: 25 ms
Inference Time: 27 ms
Inference Time: 26 ms
Inference Time: 26 ms
Inference Time: 25 ms
Inference Time: 26 ms
Inference Time: 26 ms
Inference Time: 26 ms
Inference Time: 25 ms
Inference Time: 27 ms
Inference Time: 26 ms
Inference Time: 26 ms
Inference Time: 26 ms
Inference Time: 25 ms
Inference Time: 26 ms
Inference Time: 26 ms
Inference Time: 25 ms
6.1 - inference | batch=10, size=256x256: 25.9 ± 0.6 ms
Training Time: 3713 ms
Training Time: 88 ms
Training Time: 89 ms
Training Time: 116 ms
Training Time: 89 ms
Training Time: 114 ms
Training Time: 88 ms
Training Time: 89 ms
Training Time: 119 ms
Training Time: 93 ms
Training Time: 93 ms
Training Time: 116 ms
Training Time: 88 ms
Training Time: 90 ms
Training Time: 92 ms
Training Time: 89 ms
Training Time: 112 ms
Training Time: 89 ms
Training Time: 92 ms
Training Time: 88 ms
Training Time: 88 ms
Training Time: 88 ms
6.2 - training | batch=10, size=256x256: 95.7 ± 11.2 ms
7/19. VGG-16
Inference Time: 376 ms
Inference Time: 41 ms
Inference Time: 40 ms
Inference Time: 40 ms
Inference Time: 41 ms
Inference Time: 41 ms
Inference Time: 41 ms
Inference Time: 40 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 41 ms
Inference Time: 41 ms
Inference Time: 40 ms
Inference Time: 40 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 41 ms
Inference Time: 41 ms
Inference Time: 40 ms
Inference Time: 41 ms
Inference Time: 41 ms
Inference Time: 42 ms
7.1 - inference | batch=20, size=224x224: 41.0 ± 0.7 ms
Training Time: 1495 ms
Training Time: 55 ms
Training Time: 54 ms
Training Time: 55 ms
Training Time: 55 ms
Training Time: 58 ms
Training Time: 58 ms
Training Time: 56 ms
Training Time: 55 ms
Training Time: 55 ms
Training Time: 55 ms
Training Time: 56 ms
Training Time: 54 ms
Training Time: 52 ms
Training Time: 53 ms
Training Time: 53 ms
Training Time: 55 ms
Training Time: 54 ms
Training Time: 52 ms
Training Time: 52 ms
Training Time: 53 ms
Training Time: 53 ms
7.2 - training | batch=2, size=224x224: 54.4 ± 1.7 ms
8/19. SRCNN 9-5-5
Inference Time: 1446 ms
Inference Time: 42 ms
Inference Time: 39 ms
Inference Time: 42 ms
Inference Time: 37 ms
Inference Time: 44 ms
Inference Time: 34 ms
Inference Time: 38 ms
Inference Time: 39 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 58 ms
Inference Time: 57 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 54 ms
Inference Time: 54 ms
Inference Time: 56 ms
Inference Time: 55 ms
Inference Time: 62 ms
Inference Time: 58 ms
8.1 - inference | batch=10, size=512x512: 47.2 ± 9.7 ms
Inference Time: 1760 ms
Inference Time: 34 ms
Inference Time: 34 ms
Inference Time: 34 ms
Inference Time: 33 ms
Inference Time: 34 ms
Inference Time: 34 ms
Inference Time: 35 ms
Inference Time: 34 ms
Inference Time: 33 ms
Inference Time: 35 ms
Inference Time: 34 ms
Inference Time: 33 ms
Inference Time: 34 ms
Inference Time: 34 ms
Inference Time: 34 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 35 ms
Inference Time: 34 ms
Inference Time: 34 ms
Inference Time: 33 ms
8.2 - inference | batch=1, size=1536x1536: 34.1 ± 0.7 ms
Training Time: 6951 ms
Training Time: 115 ms
Training Time: 109 ms
Training Time: 110 ms
Training Time: 108 ms
Training Time: 119 ms
Training Time: 111 ms
Training Time: 114 ms
Training Time: 110 ms
Training Time: 112 ms
Training Time: 113 ms
Training Time: 111 ms
Training Time: 109 ms
Training Time: 111 ms
Training Time: 107 ms
Training Time: 110 ms
Training Time: 108 ms
Training Time: 111 ms
Training Time: 109 ms
Training Time: 111 ms
Training Time: 109 ms
Training Time: 110 ms
8.3 - training | batch=10, size=512x512: 111 ± 3 ms
9/19. VGG-19 Super-Res
Inference Time: 242 ms
Inference Time: 55 ms
Inference Time: 56 ms
Inference Time: 56 ms
Inference Time: 55 ms
Inference Time: 56 ms
Inference Time: 56 ms
Inference Time: 54 ms
Inference Time: 55 ms
Inference Time: 56 ms
Inference Time: 54 ms
Inference Time: 53 ms
Inference Time: 52 ms
Inference Time: 52 ms
Inference Time: 52 ms
Inference Time: 56 ms
Inference Time: 53 ms
Inference Time: 52 ms
Inference Time: 52 ms
Inference Time: 57 ms
Inference Time: 53 ms
Inference Time: 52 ms
9.1 - inference | batch=10, size=256x256: 54.1 ± 1.7 ms
Inference Time: 329 ms
Inference Time: 88 ms
Inference Time: 88 ms
Inference Time: 89 ms
Inference Time: 91 ms
Inference Time: 90 ms
Inference Time: 90 ms
Inference Time: 89 ms
Inference Time: 88 ms
Inference Time: 88 ms
Inference Time: 89 ms
Inference Time: 88 ms
Inference Time: 88 ms
Inference Time: 88 ms
Inference Time: 90 ms
Inference Time: 97 ms
Inference Time: 93 ms
Inference Time: 91 ms
Inference Time: 103 ms
Inference Time: 94 ms
Inference Time: 92 ms
Inference Time: 91 ms
9.2 - inference | batch=1, size=1024x1024: 90.7 ± 3.6 ms
Training Time: 1143 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 114 ms
Training Time: 109 ms
Training Time: 110 ms
Training Time: 113 ms
Training Time: 109 ms
Training Time: 110 ms
Training Time: 111 ms
Training Time: 112 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 112 ms
Training Time: 113 ms
Training Time: 119 ms
Training Time: 110 ms
Training Time: 111 ms
Training Time: 112 ms
Training Time: 110 ms
Training Time: 109 ms
9.3 - training | batch=10, size=224x224: 111 ± 2 ms
10/19. ResNet-SRGAN
Inference Time: 784 ms
Inference Time: 78 ms
Inference Time: 76 ms
Inference Time: 83 ms
Inference Time: 80 ms
Inference Time: 74 ms
Inference Time: 71 ms
Inference Time: 74 ms
Inference Time: 66 ms
Inference Time: 80 ms
Inference Time: 98 ms
Inference Time: 107 ms
Inference Time: 96 ms
Inference Time: 76 ms
Inference Time: 69 ms
Inference Time: 77 ms
Inference Time: 76 ms
Inference Time: 78 ms
Inference Time: 72 ms
Inference Time: 76 ms
Inference Time: 69 ms
Inference Time: 75 ms
10.1 - inference | batch=10, size=512x512: 78.6 ± 9.9 ms
Inference Time: 515 ms
Inference Time: 57 ms
Inference Time: 57 ms
Inference Time: 58 ms
Inference Time: 56 ms
Inference Time: 55 ms
Inference Time: 55 ms
Inference Time: 55 ms
Inference Time: 58 ms
Inference Time: 53 ms
Inference Time: 55 ms
Inference Time: 57 ms
Inference Time: 55 ms
Inference Time: 56 ms
Inference Time: 57 ms
Inference Time: 54 ms
Inference Time: 58 ms
Inference Time: 53 ms
Inference Time: 54 ms
Inference Time: 56 ms
Inference Time: 57 ms
Inference Time: 58 ms
10.2 - inference | batch=1, size=1536x1536: 55.9 ± 1.6 ms
Training Time: 2012 ms
Training Time: 85 ms
Training Time: 89 ms
Training Time: 87 ms
Training Time: 94 ms
Training Time: 88 ms
Training Time: 85 ms
Training Time: 88 ms
Training Time: 89 ms
Training Time: 84 ms
Training Time: 88 ms
Training Time: 91 ms
Training Time: 91 ms
Training Time: 92 ms
Training Time: 89 ms
Training Time: 91 ms
Training Time: 89 ms
Training Time: 91 ms
Training Time: 91 ms
Training Time: 90 ms
Training Time: 121 ms
Training Time: 88 ms
10.3 - training | batch=5, size=512x512: 90.5 ± 7.2 ms
11/19. ResNet-DPED
Inference Time: 615 ms
Inference Time: 67 ms
Inference Time: 70 ms
Inference Time: 68 ms
Inference Time: 68 ms
Inference Time: 68 ms
Inference Time: 68 ms
Inference Time: 69 ms
Inference Time: 68 ms
Inference Time: 68 ms
Inference Time: 70 ms
Inference Time: 73 ms
Inference Time: 71 ms
Inference Time: 68 ms
Inference Time: 69 ms
Inference Time: 68 ms
Inference Time: 68 ms
Inference Time: 68 ms
Inference Time: 67 ms
Inference Time: 68 ms
Inference Time: 68 ms
Inference Time: 71 ms
11.1 - inference | batch=10, size=256x256: 68.7 ± 1.5 ms
Inference Time: 1038 ms
Inference Time: 107 ms
Inference Time: 107 ms
Inference Time: 107 ms
Inference Time: 107 ms
Inference Time: 108 ms
Inference Time: 107 ms
Inference Time: 107 ms
Inference Time: 107 ms
Inference Time: 107 ms
Inference Time: 107 ms
Inference Time: 107 ms
Inference Time: 110 ms
Inference Time: 112 ms
Inference Time: 110 ms
Inference Time: 106 ms
Inference Time: 108 ms
Inference Time: 108 ms
Inference Time: 106 ms
Inference Time: 107 ms
Inference Time: 109 ms
Inference Time: 108 ms
11.2 - inference | batch=1, size=1024x1024: 108 ± 1 ms
Training Time: 1478 ms
Training Time: 98 ms
Training Time: 99 ms
Training Time: 99 ms
Training Time: 98 ms
Training Time: 98 ms
Training Time: 99 ms
Training Time: 100 ms
Training Time: 100 ms
Training Time: 98 ms
Training Time: 98 ms
Training Time: 99 ms
Training Time: 99 ms
Training Time: 98 ms
Training Time: 100 ms
Training Time: 98 ms
Training Time: 98 ms
Training Time: 98 ms
Training Time: 99 ms
Training Time: 98 ms
Training Time: 98 ms
Training Time: 98 ms
11.3 - training | batch=15, size=128x128: 98.6 ± 0.7 ms
12/19. U-Net
Inference Time: 2703 ms
Inference Time: 110 ms
Inference Time: 116 ms
Inference Time: 111 ms
Inference Time: 113 ms
Inference Time: 113 ms
Inference Time: 113 ms
Inference Time: 110 ms
Inference Time: 111 ms
Inference Time: 115 ms
Inference Time: 111 ms
Inference Time: 110 ms
Inference Time: 111 ms
Inference Time: 111 ms
Inference Time: 111 ms
Inference Time: 111 ms
Inference Time: 111 ms
Inference Time: 112 ms
Inference Time: 111 ms
Inference Time: 111 ms
Inference Time: 111 ms
Inference Time: 111 ms
12.1 - inference | batch=4, size=512x512: 112 ± 2 ms
Inference Time: 2678 ms
Inference Time: 117 ms
Inference Time: 113 ms
Inference Time: 115 ms
Inference Time: 114 ms
Inference Time: 114 ms
Inference Time: 110 ms
Inference Time: 110 ms
Inference Time: 111 ms
Inference Time: 110 ms
Inference Time: 110 ms
Inference Time: 109 ms
Inference Time: 110 ms
Inference Time: 114 ms
Inference Time: 114 ms
Inference Time: 115 ms
Inference Time: 115 ms
Inference Time: 122 ms
Inference Time: 117 ms
Inference Time: 131 ms
Inference Time: 117 ms
Inference Time: 117 ms
12.2 - inference | batch=1, size=1024x1024: 115 ± 5 ms
Training Time: 4174 ms
Training Time: 117 ms
Training Time: 116 ms
Training Time: 116 ms
Training Time: 117 ms
Training Time: 125 ms
Training Time: 117 ms
Training Time: 116 ms
Training Time: 117 ms
Training Time: 116 ms
Training Time: 119 ms
Training Time: 124 ms
Training Time: 116 ms
Training Time: 116 ms
Training Time: 116 ms
Training Time: 117 ms
Training Time: 116 ms
Training Time: 117 ms
Training Time: 116 ms
Training Time: 117 ms
Training Time: 118 ms
Training Time: 116 ms
12.3 - training | batch=4, size=256x256: 117 ± 2 ms
13/19. Nvidia-SPADE
Inference Time: 1424 ms
Inference Time: 44 ms
Inference Time: 43 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 44 ms
Inference Time: 44 ms
Inference Time: 46 ms
Inference Time: 46 ms
Inference Time: 43 ms
Inference Time: 46 ms
Inference Time: 46 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 44 ms
Inference Time: 45 ms
Inference Time: 44 ms
Inference Time: 46 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 45 ms
13.1 - inference | batch=5, size=128x128: 44.8 ± 0.9 ms
Training Time: 4303 ms
Training Time: 75 ms
Training Time: 68 ms
Training Time: 71 ms
Training Time: 75 ms
Training Time: 76 ms
Training Time: 68 ms
Training Time: 71 ms
Training Time: 69 ms
Training Time: 70 ms
Training Time: 69 ms
Training Time: 75 ms
Training Time: 69 ms
Training Time: 81 ms
Training Time: 82 ms
Training Time: 71 ms
Training Time: 69 ms
Training Time: 69 ms
Training Time: 77 ms
Training Time: 77 ms
Training Time: 70 ms
Training Time: 77 ms
13.2 - training | batch=1, size=128x128: 72.8 ± 4.2 ms
14/19. ICNet
Inference Time: 616 ms
Inference Time: 82 ms
Inference Time: 82 ms
Inference Time: 81 ms
Inference Time: 81 ms
Inference Time: 83 ms
Inference Time: 84 ms
Inference Time: 82 ms
Inference Time: 83 ms
Inference Time: 82 ms
Inference Time: 105 ms
Inference Time: 102 ms
Inference Time: 104 ms
Inference Time: 105 ms
Inference Time: 107 ms
Inference Time: 105 ms
Inference Time: 111 ms
Inference Time: 108 ms
Inference Time: 107 ms
Inference Time: 107 ms
Inference Time: 108 ms
Inference Time: 110 ms
14.1 - inference | batch=5, size=1024x1536: 96.1 ± 12.2 ms
Training Time: 2903 ms
Training Time: 311 ms
Training Time: 306 ms
Training Time: 317 ms
Training Time: 311 ms
Training Time: 344 ms
Training Time: 397 ms
Training Time: 341 ms
Training Time: 365 ms
Training Time: 365 ms
Training Time: 352 ms
Training Time: 306 ms
Training Time: 308 ms
Training Time: 306 ms
Training Time: 311 ms
Training Time: 306 ms
Training Time: 302 ms
Training Time: 320 ms
Training Time: 310 ms
Training Time: 302 ms
Training Time: 326 ms
Training Time: 351 ms
14.2 - training | batch=10, size=1024x1536: 327 ± 26 ms
15/19. PSPNet
Inference Time: 3414 ms
Inference Time: 176 ms
Inference Time: 173 ms
Inference Time: 173 ms
Inference Time: 172 ms
Inference Time: 172 ms
Inference Time: 173 ms
Inference Time: 174 ms
Inference Time: 173 ms
Inference Time: 174 ms
Inference Time: 173 ms
Inference Time: 176 ms
Inference Time: 178 ms
Inference Time: 174 ms
Inference Time: 177 ms
Inference Time: 179 ms
Inference Time: 176 ms
Inference Time: 174 ms
Inference Time: 175 ms
Inference Time: 173 ms
Inference Time: 176 ms
Inference Time: 179 ms
15.1 - inference | batch=5, size=720x720: 175 ± 2 ms
Training Time: 3758 ms
Training Time: 72 ms
Training Time: 72 ms
Training Time: 72 ms
Training Time: 77 ms
Training Time: 73 ms
Training Time: 72 ms
Training Time: 72 ms
Training Time: 73 ms
Training Time: 72 ms
Training Time: 72 ms
Training Time: 72 ms
Training Time: 73 ms
Training Time: 74 ms
Training Time: 72 ms
Training Time: 74 ms
Training Time: 76 ms
Training Time: 75 ms
Training Time: 72 ms
Training Time: 75 ms
Training Time: 74 ms
Training Time: 74 ms
15.2 - training | batch=1, size=512x512: 73.2 ± 1.5 ms
16/19. DeepLab
Inference Time: 1116 ms
Inference Time: 51 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 50 ms
Inference Time: 52 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 50 ms
Inference Time: 51 ms
Inference Time: 51 ms
Inference Time: 50 ms
Inference Time: 51 ms
Inference Time: 50 ms
Inference Time: 51 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 50 ms
16.1 - inference | batch=2, size=512x512: 50.2 ± 0.7 ms
Training Time: 3288 ms
Training Time: 58 ms
Training Time: 88 ms
Training Time: 68 ms
Training Time: 68 ms
Training Time: 88 ms
Training Time: 59 ms
Training Time: 67 ms
Training Time: 61 ms
Training Time: 60 ms
Training Time: 59 ms
Training Time: 68 ms
Training Time: 70 ms
Training Time: 60 ms
Training Time: 58 ms
Training Time: 77 ms
Training Time: 69 ms
Training Time: 60 ms
Training Time: 59 ms
Training Time: 86 ms
Training Time: 67 ms
Training Time: 66 ms
16.2 - training | batch=1, size=384x384: 67.4 ± 9.5 ms
17/19. Pixel-RNN
Inference Time: 3993 ms
Inference Time: 1675 ms
Inference Time: 1709 ms
Inference Time: 1756 ms
Inference Time: 1630 ms
Inference Time: 1782 ms
Inference Time: 1662 ms
Inference Time: 1665 ms
Inference Time: 1793 ms
Inference Time: 1620 ms
Inference Time: 1717 ms
Inference Time: 1617 ms
Inference Time: 1742 ms
17.1 - inference | batch=50, size=64x64: 1697 ± 59 ms
Training Time: 20502 ms
Training Time: 7568 ms
Training Time: 8171 ms
Training Time: 6588 ms
Training Time: 7508 ms
17.2 - training | batch=10, size=64x64: 7459 ± 566 ms
18/19. LSTM-Sentiment
Inference Time: 1029 ms
Inference Time: 816 ms
Inference Time: 794 ms
Inference Time: 801 ms
Inference Time: 770 ms
Inference Time: 773 ms
Inference Time: 784 ms
Inference Time: 837 ms
Inference Time: 835 ms
Inference Time: 822 ms
Inference Time: 825 ms
Inference Time: 769 ms
Inference Time: 772 ms
Inference Time: 779 ms
Inference Time: 791 ms
Inference Time: 758 ms
Inference Time: 738 ms
Inference Time: 759 ms
Inference Time: 757 ms
Inference Time: 822 ms
Inference Time: 780 ms
Inference Time: 768 ms
18.1 - inference | batch=100, size=1024x300: 788 ± 28 ms
Training Time: 2808 ms
Training Time: 2808 ms
Training Time: 3105 ms
Training Time: 2882 ms
Training Time: 3195 ms
Training Time: 2605 ms
Training Time: 2594 ms
Training Time: 3228 ms
Training Time: 3112 ms
Training Time: 2561 ms
Training Time: 3042 ms
18.2 - training | batch=10, size=1024x300: 2913 ± 246 ms
19/19. GNMT-Translation
Inference Time: 502 ms
Inference Time: 315 ms
Inference Time: 316 ms
Inference Time: 310 ms
Inference Time: 290 ms
Inference Time: 314 ms
Inference Time: 311 ms
Inference Time: 283 ms
Inference Time: 292 ms
Inference Time: 287 ms
Inference Time: 287 ms
Inference Time: 284 ms
Inference Time: 314 ms
Inference Time: 289 ms
Inference Time: 288 ms
Inference Time: 309 ms
Inference Time: 294 ms
Inference Time: 314 ms
Inference Time: 280 ms
Inference Time: 309 ms
Inference Time: 297 ms
Inference Time: 281 ms
19.1 - inference | batch=1, size=1x20: 298 ± 13 ms
Device Inference Score: 18566
Device Training Score: 20064
Device AI Score: 38630
For more information and results, please visit http://ai-benchmark.com/alpha
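
Note on reading the summary lines: each `N.M - inference | ...` or `N.M - training | ...` line in the log above appears to be the mean and population standard deviation of the per-iteration times, with the first (warm-up) iteration dropped. This is inferred from the numbers in this log, not confirmed from the ai-benchmark source. A minimal sketch that reproduces the 1.1 summary for the RTX 3080 Ti run (the helper name `summarize` is hypothetical):

```python
from statistics import fmean, pstdev


def summarize(times_ms):
    """Return (mean, population std) of the timings, skipping the first run.

    Assumption: the first iteration is a warm-up (graph build, kernel
    selection) and is excluded from the reported statistics.
    """
    steady = times_ms[1:]
    return fmean(steady), pstdev(steady)


# Per-iteration inference times for test 1.1 (MobileNet-V2), copied from the log above.
mobilenet_inference_ms = [
    1956, 24, 23, 22, 23, 22, 22, 23, 22, 21, 23,
    23, 32, 32, 33, 33, 35, 35, 36, 34, 37, 36,
]

mean_ms, std_ms = summarize(mobilenet_inference_ms)
print(f"{mean_ms:.1f} ± {std_ms:.1f} ms")  # the log reports 28.1 ± 6.0 ms
```

Including the 1956 ms first iteration would push the mean above 100 ms, so comparisons between the two cards should use the summary lines rather than the raw first-iteration times.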
> python3 benchmark.py
2023-09-04 15:40:05.360036: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/usr/local/lib/python3.9/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.25.2
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2023-09-04 15:40:06.815801: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:06.828821: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:06.828887: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
>> AI-Benchmark - 0.1.3.cm
INFO:ai_benchmark:>> AI-Benchmark - 0.1.3.cm
>> Let the AI Games begin
INFO:ai_benchmark:>> Let the AI Games begin
2023-09-04 15:40:06.981612: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:06.981719: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:06.981765: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:06.981918: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:06.981974: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:06.982028: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:06.982056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:40:07.179161: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179269: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179308: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179366: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179409: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:40:07.179507: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179563: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179599: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179645: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179684: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:07.179704: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
* TF Version: 2.15.0
INFO:ai_benchmark:* TF Version: 2.15.0
* Platform: Linux-5.15.0-82-generic-x86_64-with-glibc2.17
INFO:ai_benchmark:* Platform: Linux-5.15.0-82-generic-x86_64-with-glibc2.17
* CPU: AMD Ryzen 9 3900X 12-Core Processor
INFO:ai_benchmark:* CPU: AMD Ryzen 9 3900X 12-Core Processor
* CPU RAM: 31 GB
INFO:ai_benchmark:* CPU RAM: 31 GB
* GPU/0: Radeon RX 7900 XTX
INFO:ai_benchmark:* GPU/0: Radeon RX 7900 XTX
* GPU RAM: 23.5 GB
INFO:ai_benchmark:* GPU RAM: 23.5 GB
* CUDA Version: N/A
INFO:ai_benchmark:* CUDA Version: N/A
* CUDA Build: N/A
INFO:ai_benchmark:* CUDA Build: N/A
The benchmark is running...
WARNING:ai_benchmark:The benchmark is running...
The tests might take up to 20 minutes
WARNING:ai_benchmark:The tests might take up to 20 minutes
Please don't interrupt the script
WARNING:ai_benchmark:Please don't interrupt the script
1/19. MobileNet-V2
INFO:ai_benchmark:
1/19. MobileNet-V2
2023-09-04 15:40:08.470243: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:08.470356: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:08.470392: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:08.470456: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:08.470497: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:08.470520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:40:08.736260: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:386] MLIR V1 optimization pass is not enabled
2023-09-04 15:40:08.959779: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:40:09.542531: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 229 ms
DEBUG:ai_benchmark:Inference Time: 229 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
1.1 - inference | batch=50, size=224x224: 32.9 ± 0.5 ms
INFO:ai_benchmark:1.1 - inference | batch=50, size=224x224: 32.9 ± 0.5 ms
2023-09-04 15:40:15.501624: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:40:16.260700: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 961 ms
DEBUG:ai_benchmark:Training Time: 961 ms
Training Time: 372 ms
DEBUG:ai_benchmark:Training Time: 372 ms
Training Time: 370 ms
DEBUG:ai_benchmark:Training Time: 370 ms
Training Time: 367 ms
DEBUG:ai_benchmark:Training Time: 367 ms
Training Time: 369 ms
DEBUG:ai_benchmark:Training Time: 369 ms
Training Time: 364 ms
DEBUG:ai_benchmark:Training Time: 364 ms
Training Time: 365 ms
DEBUG:ai_benchmark:Training Time: 365 ms
Training Time: 366 ms
DEBUG:ai_benchmark:Training Time: 366 ms
Training Time: 362 ms
DEBUG:ai_benchmark:Training Time: 362 ms
Training Time: 362 ms
DEBUG:ai_benchmark:Training Time: 362 ms
Training Time: 365 ms
DEBUG:ai_benchmark:Training Time: 365 ms
Training Time: 362 ms
DEBUG:ai_benchmark:Training Time: 362 ms
Training Time: 366 ms
DEBUG:ai_benchmark:Training Time: 366 ms
Training Time: 363 ms
DEBUG:ai_benchmark:Training Time: 363 ms
Training Time: 366 ms
DEBUG:ai_benchmark:Training Time: 366 ms
Training Time: 361 ms
DEBUG:ai_benchmark:Training Time: 361 ms
Training Time: 363 ms
DEBUG:ai_benchmark:Training Time: 363 ms
Training Time: 361 ms
DEBUG:ai_benchmark:Training Time: 361 ms
Training Time: 363 ms
DEBUG:ai_benchmark:Training Time: 363 ms
Training Time: 361 ms
DEBUG:ai_benchmark:Training Time: 361 ms
Training Time: 363 ms
DEBUG:ai_benchmark:Training Time: 363 ms
Training Time: 363 ms
DEBUG:ai_benchmark:Training Time: 363 ms
1.2 - training | batch=50, size=224x224: 364 ± 3 ms
INFO:ai_benchmark:1.2 - training | batch=50, size=224x224: 364 ± 3 ms
2/19. Inception-V3
INFO:ai_benchmark:
2/19. Inception-V3
2023-09-04 15:40:28.098164: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:28.098279: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:28.098317: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:28.098380: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:28.098448: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:28.098477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:40:28.884284: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:40:29.284076: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 323 ms
DEBUG:ai_benchmark:Inference Time: 323 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 33 ms
DEBUG:ai_benchmark:Inference Time: 33 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
2.1 - inference | batch=20, size=346x346: 33.2 ± 1.4 ms
INFO:ai_benchmark:2.1 - inference | batch=20, size=346x346: 33.2 ± 1.4 ms
2023-09-04 15:40:33.857562: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:40:35.007305: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 1413 ms
DEBUG:ai_benchmark:Training Time: 1413 ms
Training Time: 407 ms
DEBUG:ai_benchmark:Training Time: 407 ms
Training Time: 404 ms
DEBUG:ai_benchmark:Training Time: 404 ms
Training Time: 401 ms
DEBUG:ai_benchmark:Training Time: 401 ms
Training Time: 400 ms
DEBUG:ai_benchmark:Training Time: 400 ms
Training Time: 410 ms
DEBUG:ai_benchmark:Training Time: 410 ms
Training Time: 405 ms
DEBUG:ai_benchmark:Training Time: 405 ms
Training Time: 402 ms
DEBUG:ai_benchmark:Training Time: 402 ms
Training Time: 400 ms
DEBUG:ai_benchmark:Training Time: 400 ms
Training Time: 405 ms
DEBUG:ai_benchmark:Training Time: 405 ms
Training Time: 407 ms
DEBUG:ai_benchmark:Training Time: 407 ms
Training Time: 398 ms
DEBUG:ai_benchmark:Training Time: 398 ms
Training Time: 398 ms
DEBUG:ai_benchmark:Training Time: 398 ms
Training Time: 403 ms
DEBUG:ai_benchmark:Training Time: 403 ms
Training Time: 403 ms
DEBUG:ai_benchmark:Training Time: 403 ms
Training Time: 402 ms
DEBUG:ai_benchmark:Training Time: 402 ms
Training Time: 400 ms
DEBUG:ai_benchmark:Training Time: 400 ms
Training Time: 398 ms
DEBUG:ai_benchmark:Training Time: 398 ms
Training Time: 403 ms
DEBUG:ai_benchmark:Training Time: 403 ms
Training Time: 407 ms
DEBUG:ai_benchmark:Training Time: 407 ms
Training Time: 402 ms
DEBUG:ai_benchmark:Training Time: 402 ms
Training Time: 401 ms
DEBUG:ai_benchmark:Training Time: 401 ms
2.2 - training | batch=20, size=346x346: 403 ± 3 ms
INFO:ai_benchmark:2.2 - training | batch=20, size=346x346: 403 ± 3 ms
3/19. Inception-V4
INFO:ai_benchmark:
3/19. Inception-V4
2023-09-04 15:40:45.852720: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:45.852835: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:45.852874: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:45.852936: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:45.852979: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:40:45.853003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:40:47.015716: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:40:47.577671: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 481 ms
DEBUG:ai_benchmark:Inference Time: 481 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
3.1 - inference | batch=10, size=346x346: 36.9 ± 1.9 ms
INFO:ai_benchmark:3.1 - inference | batch=10, size=346x346: 36.9 ± 1.9 ms
2023-09-04 15:40:52.700491: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:40:54.700566: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2168 ms
DEBUG:ai_benchmark:Training Time: 2168 ms
Training Time: 315 ms
DEBUG:ai_benchmark:Training Time: 315 ms
Training Time: 320 ms
DEBUG:ai_benchmark:Training Time: 320 ms
Training Time: 318 ms
DEBUG:ai_benchmark:Training Time: 318 ms
Training Time: 318 ms
DEBUG:ai_benchmark:Training Time: 318 ms
Training Time: 319 ms
DEBUG:ai_benchmark:Training Time: 319 ms
Training Time: 320 ms
DEBUG:ai_benchmark:Training Time: 320 ms
Training Time: 318 ms
DEBUG:ai_benchmark:Training Time: 318 ms
Training Time: 319 ms
DEBUG:ai_benchmark:Training Time: 319 ms
Training Time: 320 ms
DEBUG:ai_benchmark:Training Time: 320 ms
Training Time: 319 ms
DEBUG:ai_benchmark:Training Time: 319 ms
Training Time: 319 ms
DEBUG:ai_benchmark:Training Time: 319 ms
Training Time: 318 ms
DEBUG:ai_benchmark:Training Time: 318 ms
Training Time: 320 ms
DEBUG:ai_benchmark:Training Time: 320 ms
Training Time: 320 ms
DEBUG:ai_benchmark:Training Time: 320 ms
Training Time: 319 ms
DEBUG:ai_benchmark:Training Time: 319 ms
Training Time: 319 ms
DEBUG:ai_benchmark:Training Time: 319 ms
Training Time: 319 ms
DEBUG:ai_benchmark:Training Time: 319 ms
Training Time: 319 ms
DEBUG:ai_benchmark:Training Time: 319 ms
Training Time: 318 ms
DEBUG:ai_benchmark:Training Time: 318 ms
Training Time: 319 ms
DEBUG:ai_benchmark:Training Time: 319 ms
Training Time: 320 ms
DEBUG:ai_benchmark:Training Time: 320 ms
3.2 - training | batch=10, size=346x346: 319 ± 1 ms
INFO:ai_benchmark:3.2 - training | batch=10, size=346x346: 319 ± 1 ms
4/19. Inception-ResNet-V2
INFO:ai_benchmark:
4/19. Inception-ResNet-V2
2023-09-04 15:41:02.819883: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:02.820008: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:02.820046: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:02.820112: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:02.820155: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:02.820179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:41:04.929646: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:41:05.882172: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 983 ms
DEBUG:ai_benchmark:Inference Time: 983 ms
Inference Time: 47 ms
DEBUG:ai_benchmark:Inference Time: 47 ms
Inference Time: 51 ms
DEBUG:ai_benchmark:Inference Time: 51 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 46 ms
DEBUG:ai_benchmark:Inference Time: 46 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 43 ms
DEBUG:ai_benchmark:Inference Time: 43 ms
Inference Time: 46 ms
DEBUG:ai_benchmark:Inference Time: 46 ms
Inference Time: 43 ms
DEBUG:ai_benchmark:Inference Time: 43 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 46 ms
DEBUG:ai_benchmark:Inference Time: 46 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
4.1 - inference | batch=10, size=346x346: 45.0 ± 1.6 ms
INFO:ai_benchmark:4.1 - inference | batch=10, size=346x346: 45.0 ± 1.6 ms
2023-09-04 15:41:13.756770: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:41:17.684396: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 4046 ms
DEBUG:ai_benchmark:Training Time: 4046 ms
Training Time: 277 ms
DEBUG:ai_benchmark:Training Time: 277 ms
Training Time: 284 ms
DEBUG:ai_benchmark:Training Time: 284 ms
Training Time: 277 ms
DEBUG:ai_benchmark:Training Time: 277 ms
Training Time: 278 ms
DEBUG:ai_benchmark:Training Time: 278 ms
Training Time: 275 ms
DEBUG:ai_benchmark:Training Time: 275 ms
Training Time: 277 ms
DEBUG:ai_benchmark:Training Time: 277 ms
Training Time: 276 ms
DEBUG:ai_benchmark:Training Time: 276 ms
Training Time: 275 ms
DEBUG:ai_benchmark:Training Time: 275 ms
Training Time: 275 ms
DEBUG:ai_benchmark:Training Time: 275 ms
Training Time: 277 ms
DEBUG:ai_benchmark:Training Time: 277 ms
Training Time: 277 ms
DEBUG:ai_benchmark:Training Time: 277 ms
Training Time: 276 ms
DEBUG:ai_benchmark:Training Time: 276 ms
Training Time: 274 ms
DEBUG:ai_benchmark:Training Time: 274 ms
Training Time: 274 ms
DEBUG:ai_benchmark:Training Time: 274 ms
Training Time: 274 ms
DEBUG:ai_benchmark:Training Time: 274 ms
Training Time: 276 ms
DEBUG:ai_benchmark:Training Time: 276 ms
Training Time: 275 ms
DEBUG:ai_benchmark:Training Time: 275 ms
Training Time: 277 ms
DEBUG:ai_benchmark:Training Time: 277 ms
Training Time: 275 ms
DEBUG:ai_benchmark:Training Time: 275 ms
Training Time: 276 ms
DEBUG:ai_benchmark:Training Time: 276 ms
Training Time: 278 ms
DEBUG:ai_benchmark:Training Time: 278 ms
4.2 - training | batch=8, size=346x346: 276 ± 2 ms
INFO:ai_benchmark:4.2 - training | batch=8, size=346x346: 276 ± 2 ms
5/19. ResNet-V2-50
INFO:ai_benchmark:
5/19. ResNet-V2-50
2023-09-04 15:41:24.771442: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:24.771562: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:24.771602: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:24.771667: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:24.771711: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:24.771737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:41:25.351442: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:41:25.639592: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 259 ms
DEBUG:ai_benchmark:Inference Time: 259 ms
Inference Time: 30 ms
DEBUG:ai_benchmark:Inference Time: 30 ms
Inference Time: 23 ms
DEBUG:ai_benchmark:Inference Time: 23 ms
Inference Time: 25 ms
DEBUG:ai_benchmark:Inference Time: 25 ms
Inference Time: 24 ms
DEBUG:ai_benchmark:Inference Time: 24 ms
Inference Time: 25 ms
DEBUG:ai_benchmark:Inference Time: 25 ms
Inference Time: 23 ms
DEBUG:ai_benchmark:Inference Time: 23 ms
Inference Time: 24 ms
DEBUG:ai_benchmark:Inference Time: 24 ms
Inference Time: 23 ms
DEBUG:ai_benchmark:Inference Time: 23 ms
Inference Time: 25 ms
DEBUG:ai_benchmark:Inference Time: 25 ms
Inference Time: 23 ms
DEBUG:ai_benchmark:Inference Time: 23 ms
Inference Time: 24 ms
DEBUG:ai_benchmark:Inference Time: 24 ms
Inference Time: 24 ms
DEBUG:ai_benchmark:Inference Time: 24 ms
Inference Time: 25 ms
DEBUG:ai_benchmark:Inference Time: 25 ms
Inference Time: 23 ms
DEBUG:ai_benchmark:Inference Time: 23 ms
Inference Time: 24 ms
DEBUG:ai_benchmark:Inference Time: 24 ms
Inference Time: 23 ms
DEBUG:ai_benchmark:Inference Time: 23 ms
Inference Time: 25 ms
DEBUG:ai_benchmark:Inference Time: 25 ms
Inference Time: 23 ms
DEBUG:ai_benchmark:Inference Time: 23 ms
Inference Time: 24 ms
DEBUG:ai_benchmark:Inference Time: 24 ms
Inference Time: 23 ms
DEBUG:ai_benchmark:Inference Time: 23 ms
Inference Time: 24 ms
DEBUG:ai_benchmark:Inference Time: 24 ms
5.1 - inference | batch=10, size=346x346: 24.1 ± 1.5 ms
INFO:ai_benchmark:5.1 - inference | batch=10, size=346x346: 24.1 ± 1.5 ms
2023-09-04 15:41:28.815530: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:41:29.606853: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 815 ms
DEBUG:ai_benchmark:Training Time: 815 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 88 ms
DEBUG:ai_benchmark:Training Time: 88 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 88 ms
DEBUG:ai_benchmark:Training Time: 88 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 88 ms
DEBUG:ai_benchmark:Training Time: 88 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 91 ms
DEBUG:ai_benchmark:Training Time: 91 ms
Training Time: 88 ms
DEBUG:ai_benchmark:Training Time: 88 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
5.2 - training | batch=10, size=346x346: 89.2 ± 0.8 ms
INFO:ai_benchmark:5.2 - training | batch=10, size=346x346: 89.2 ± 0.8 ms
6/19. ResNet-V2-152
INFO:ai_benchmark:
6/19. ResNet-V2-152
2023-09-04 15:41:32.605342: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:32.606920: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:32.607007: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:32.607080: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:32.607125: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:32.607151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:41:34.775272: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:41:35.540567: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 674 ms
DEBUG:ai_benchmark:Inference Time: 674 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
Inference Time: 35 ms
DEBUG:ai_benchmark:Inference Time: 35 ms
Inference Time: 34 ms
DEBUG:ai_benchmark:Inference Time: 34 ms
6.1 - inference | batch=10, size=256x256: 34.5 ± 0.6 ms
INFO:ai_benchmark:6.1 - inference | batch=10, size=256x256: 34.5 ± 0.6 ms
2023-09-04 15:41:42.303121: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:41:45.024722: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2648 ms
DEBUG:ai_benchmark:Training Time: 2648 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
Training Time: 134 ms
DEBUG:ai_benchmark:Training Time: 134 ms
6.2 - training | batch=10, size=256x256: 133.6 ± 0.5 ms
INFO:ai_benchmark:6.2 - training | batch=10, size=256x256: 133.6 ± 0.5 ms
7/19. VGG-16
INFO:ai_benchmark:
7/19. VGG-16
2023-09-04 15:41:48.974847: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:48.974977: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:48.975018: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:48.975083: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:48.975129: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:48.975155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:41:49.079823: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:41:49.205783: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 118 ms
DEBUG:ai_benchmark:Inference Time: 118 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 51 ms
DEBUG:ai_benchmark:Inference Time: 51 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 51 ms
DEBUG:ai_benchmark:Inference Time: 51 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 52 ms
DEBUG:ai_benchmark:Inference Time: 52 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 54 ms
DEBUG:ai_benchmark:Inference Time: 54 ms
7.1 - inference | batch=20, size=224x224: 49.8 ± 1.4 ms
INFO:ai_benchmark:7.1 - inference | batch=20, size=224x224: 49.8 ± 1.4 ms
2023-09-04 15:41:52.221623: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:41:52.847856: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 232 ms
DEBUG:ai_benchmark:Training Time: 232 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 91 ms
DEBUG:ai_benchmark:Training Time: 91 ms
Training Time: 91 ms
DEBUG:ai_benchmark:Training Time: 91 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 86 ms
DEBUG:ai_benchmark:Training Time: 86 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 87 ms
DEBUG:ai_benchmark:Training Time: 87 ms
Training Time: 91 ms
DEBUG:ai_benchmark:Training Time: 91 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 87 ms
DEBUG:ai_benchmark:Training Time: 87 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 91 ms
DEBUG:ai_benchmark:Training Time: 91 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 87 ms
DEBUG:ai_benchmark:Training Time: 87 ms
Training Time: 92 ms
DEBUG:ai_benchmark:Training Time: 92 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 91 ms
DEBUG:ai_benchmark:Training Time: 91 ms
Training Time: 89 ms
DEBUG:ai_benchmark:Training Time: 89 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
Training Time: 90 ms
DEBUG:ai_benchmark:Training Time: 90 ms
7.2 - training | batch=2, size=224x224: 89.4 ± 1.6 ms
INFO:ai_benchmark:7.2 - training | batch=2, size=224x224: 89.4 ± 1.6 ms
8/19. SRCNN 9-5-5
INFO:ai_benchmark:
8/19. SRCNN 9-5-5
2023-09-04 15:41:55.001108: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:55.001252: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:55.001309: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:55.001395: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:55.001452: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:41:55.001486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:41:55.020559: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:41:55.205874: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 76 ms
DEBUG:ai_benchmark:Inference Time: 76 ms
Inference Time: 39 ms
DEBUG:ai_benchmark:Inference Time: 39 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 30 ms
DEBUG:ai_benchmark:Inference Time: 30 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 30 ms
DEBUG:ai_benchmark:Inference Time: 30 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 30 ms
DEBUG:ai_benchmark:Inference Time: 30 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
Inference Time: 32 ms
DEBUG:ai_benchmark:Inference Time: 32 ms
Inference Time: 31 ms
DEBUG:ai_benchmark:Inference Time: 31 ms
8.1 - inference | batch=10, size=512x512: 31.5 ± 1.8 ms
INFO:ai_benchmark:8.1 - inference | batch=10, size=512x512: 31.5 ± 1.8 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 25 ms
DEBUG:ai_benchmark:Inference Time: 25 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 25 ms
DEBUG:ai_benchmark:Inference Time: 25 ms
Inference Time: 27 ms
DEBUG:ai_benchmark:Inference Time: 27 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 27 ms
DEBUG:ai_benchmark:Inference Time: 27 ms
Inference Time: 27 ms
DEBUG:ai_benchmark:Inference Time: 27 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 27 ms
DEBUG:ai_benchmark:Inference Time: 27 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
Inference Time: 27 ms
DEBUG:ai_benchmark:Inference Time: 27 ms
Inference Time: 26 ms
DEBUG:ai_benchmark:Inference Time: 26 ms
8.2 - inference | batch=1, size=1536x1536: 26.1 ± 0.6 ms
INFO:ai_benchmark:8.2 - inference | batch=1, size=1536x1536: 26.1 ± 0.6 ms
2023-09-04 15:42:00.974126: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:42:01.530477: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 272 ms
DEBUG:ai_benchmark:Training Time: 272 ms
Training Time: 195 ms
DEBUG:ai_benchmark:Training Time: 195 ms
Training Time: 193 ms
DEBUG:ai_benchmark:Training Time: 193 ms
Training Time: 191 ms
DEBUG:ai_benchmark:Training Time: 191 ms
Training Time: 193 ms
DEBUG:ai_benchmark:Training Time: 193 ms
Training Time: 195 ms
DEBUG:ai_benchmark:Training Time: 195 ms
Training Time: 193 ms
DEBUG:ai_benchmark:Training Time: 193 ms
Training Time: 192 ms
DEBUG:ai_benchmark:Training Time: 192 ms
Training Time: 195 ms
DEBUG:ai_benchmark:Training Time: 195 ms
Training Time: 191 ms
DEBUG:ai_benchmark:Training Time: 191 ms
Training Time: 194 ms
DEBUG:ai_benchmark:Training Time: 194 ms
Training Time: 192 ms
DEBUG:ai_benchmark:Training Time: 192 ms
Training Time: 191 ms
DEBUG:ai_benchmark:Training Time: 191 ms
Training Time: 189 ms
DEBUG:ai_benchmark:Training Time: 189 ms
Training Time: 196 ms
DEBUG:ai_benchmark:Training Time: 196 ms
Training Time: 191 ms
DEBUG:ai_benchmark:Training Time: 191 ms
Training Time: 188 ms
DEBUG:ai_benchmark:Training Time: 188 ms
Training Time: 191 ms
DEBUG:ai_benchmark:Training Time: 191 ms
Training Time: 193 ms
DEBUG:ai_benchmark:Training Time: 193 ms
Training Time: 190 ms
DEBUG:ai_benchmark:Training Time: 190 ms
Training Time: 193 ms
DEBUG:ai_benchmark:Training Time: 193 ms
Training Time: 192 ms
DEBUG:ai_benchmark:Training Time: 192 ms
8.3 - training | batch=10, size=512x512: 192 ± 2 ms
INFO:ai_benchmark:8.3 - training | batch=10, size=512x512: 192 ± 2 ms
9/19. VGG-19 Super-Res
INFO:ai_benchmark:
9/19. VGG-19 Super-Res
2023-09-04 15:42:15.529053: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:15.529191: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:15.529246: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:15.529321: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:15.529369: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:15.529390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:42:15.659972: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:42:15.838272: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 106 ms
DEBUG:ai_benchmark:Inference Time: 106 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 36 ms
DEBUG:ai_benchmark:Inference Time: 36 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
9.1 - inference | batch=10, size=256x256: 36.7 ± 0.5 ms
INFO:ai_benchmark:9.1 - inference | batch=10, size=256x256: 36.7 ± 0.5 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 58 ms
DEBUG:ai_benchmark:Inference Time: 58 ms
Inference Time: 60 ms
DEBUG:ai_benchmark:Inference Time: 60 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 60 ms
DEBUG:ai_benchmark:Inference Time: 60 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 58 ms
DEBUG:ai_benchmark:Inference Time: 58 ms
Inference Time: 60 ms
DEBUG:ai_benchmark:Inference Time: 60 ms
Inference Time: 61 ms
DEBUG:ai_benchmark:Inference Time: 61 ms
Inference Time: 60 ms
DEBUG:ai_benchmark:Inference Time: 60 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 59 ms
DEBUG:ai_benchmark:Inference Time: 59 ms
Inference Time: 58 ms
DEBUG:ai_benchmark:Inference Time: 58 ms
9.2 - inference | batch=1, size=1024x1024: 59.1 ± 0.7 ms
INFO:ai_benchmark:9.2 - inference | batch=1, size=1024x1024: 59.1 ± 0.7 ms
2023-09-04 15:42:21.519407: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:42:22.034918: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 374 ms
DEBUG:ai_benchmark:Training Time: 374 ms
Training Time: 223 ms
DEBUG:ai_benchmark:Training Time: 223 ms
Training Time: 219 ms
DEBUG:ai_benchmark:Training Time: 219 ms
Training Time: 220 ms
DEBUG:ai_benchmark:Training Time: 220 ms
Training Time: 222 ms
DEBUG:ai_benchmark:Training Time: 222 ms
Training Time: 219 ms
DEBUG:ai_benchmark:Training Time: 219 ms
Training Time: 220 ms
DEBUG:ai_benchmark:Training Time: 220 ms
Training Time: 221 ms
DEBUG:ai_benchmark:Training Time: 221 ms
Training Time: 222 ms
DEBUG:ai_benchmark:Training Time: 222 ms
Training Time: 221 ms
DEBUG:ai_benchmark:Training Time: 221 ms
Training Time: 221 ms
DEBUG:ai_benchmark:Training Time: 221 ms
Training Time: 222 ms
DEBUG:ai_benchmark:Training Time: 222 ms
Training Time: 220 ms
DEBUG:ai_benchmark:Training Time: 220 ms
Training Time: 221 ms
DEBUG:ai_benchmark:Training Time: 221 ms
Training Time: 225 ms
DEBUG:ai_benchmark:Training Time: 225 ms
Training Time: 217 ms
DEBUG:ai_benchmark:Training Time: 217 ms
Training Time: 220 ms
DEBUG:ai_benchmark:Training Time: 220 ms
Training Time: 220 ms
DEBUG:ai_benchmark:Training Time: 220 ms
Training Time: 220 ms
DEBUG:ai_benchmark:Training Time: 220 ms
Training Time: 220 ms
DEBUG:ai_benchmark:Training Time: 220 ms
Training Time: 220 ms
DEBUG:ai_benchmark:Training Time: 220 ms
Training Time: 222 ms
DEBUG:ai_benchmark:Training Time: 222 ms
9.3 - training | batch=10, size=224x224: 221 ± 2 ms
INFO:ai_benchmark:9.3 - training | batch=10, size=224x224: 221 ± 2 ms
10/19. ResNet-SRGAN
INFO:ai_benchmark:
10/19. ResNet-SRGAN
2023-09-04 15:42:34.314120: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:34.314257: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:34.314312: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:34.314394: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:34.314449: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:34.314483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:42:34.749338: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:42:35.127095: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 415 ms
DEBUG:ai_benchmark:Inference Time: 415 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 46 ms
DEBUG:ai_benchmark:Inference Time: 46 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 46 ms
DEBUG:ai_benchmark:Inference Time: 46 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 44 ms
DEBUG:ai_benchmark:Inference Time: 44 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
Inference Time: 45 ms
DEBUG:ai_benchmark:Inference Time: 45 ms
10.1 - inference | batch=10, size=512x512: 44.8 ± 0.6 ms
INFO:ai_benchmark:10.1 - inference | batch=10, size=512x512: 44.8 ± 0.6 ms
Inference Time: 40 ms
DEBUG:ai_benchmark:Inference Time: 40 ms
Inference Time: 37 ms
DEBUG:ai_benchmark:Inference Time: 37 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 39 ms
DEBUG:ai_benchmark:Inference Time: 39 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 39 ms
DEBUG:ai_benchmark:Inference Time: 39 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 39 ms
DEBUG:ai_benchmark:Inference Time: 39 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 39 ms
DEBUG:ai_benchmark:Inference Time: 39 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
Inference Time: 38 ms
DEBUG:ai_benchmark:Inference Time: 38 ms
10.2 - inference | batch=1, size=1536x1536: 38.1 ± 0.5 ms
INFO:ai_benchmark:10.2 - inference | batch=1, size=1536x1536: 38.1 ± 0.5 ms
2023-09-04 15:42:42.313045: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:42:43.081593: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 681 ms
DEBUG:ai_benchmark:Training Time: 681 ms
Training Time: 133 ms
DEBUG:ai_benchmark:Training Time: 133 ms
Training Time: 132 ms
DEBUG:ai_benchmark:Training Time: 132 ms
Training Time: 131 ms
DEBUG:ai_benchmark:Training Time: 131 ms
Training Time: 131 ms
DEBUG:ai_benchmark:Training Time: 131 ms
Training Time: 131 ms
DEBUG:ai_benchmark:Training Time: 131 ms
Training Time: 130 ms
DEBUG:ai_benchmark:Training Time: 130 ms
Training Time: 131 ms
DEBUG:ai_benchmark:Training Time: 131 ms
Training Time: 129 ms
DEBUG:ai_benchmark:Training Time: 129 ms
Training Time: 130 ms
DEBUG:ai_benchmark:Training Time: 130 ms
Training Time: 129 ms
DEBUG:ai_benchmark:Training Time: 129 ms
Training Time: 131 ms
DEBUG:ai_benchmark:Training Time: 131 ms
Training Time: 129 ms
DEBUG:ai_benchmark:Training Time: 129 ms
Training Time: 130 ms
DEBUG:ai_benchmark:Training Time: 130 ms
Training Time: 129 ms
DEBUG:ai_benchmark:Training Time: 129 ms
Training Time: 131 ms
DEBUG:ai_benchmark:Training Time: 131 ms
Training Time: 130 ms
DEBUG:ai_benchmark:Training Time: 130 ms
Training Time: 132 ms
DEBUG:ai_benchmark:Training Time: 132 ms
Training Time: 129 ms
DEBUG:ai_benchmark:Training Time: 129 ms
Training Time: 130 ms
DEBUG:ai_benchmark:Training Time: 130 ms
Training Time: 129 ms
DEBUG:ai_benchmark:Training Time: 129 ms
Training Time: 139 ms
DEBUG:ai_benchmark:Training Time: 139 ms
10.3 - training | batch=5, size=512x512: 131 ± 2 ms
INFO:ai_benchmark:10.3 - training | batch=5, size=512x512: 131 ± 2 ms
11/19. ResNet-DPED
INFO:ai_benchmark:
11/19. ResNet-DPED
2023-09-04 15:42:50.745526: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:50.745647: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:50.745686: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:50.745751: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:50.745792: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:42:50.745816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:42:50.827978: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:42:51.025158: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 482 ms
DEBUG:ai_benchmark:Inference Time: 482 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 53 ms
DEBUG:ai_benchmark:Inference Time: 53 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 53 ms
DEBUG:ai_benchmark:Inference Time: 53 ms
Inference Time: 47 ms
DEBUG:ai_benchmark:Inference Time: 47 ms
Inference Time: 50 ms
DEBUG:ai_benchmark:Inference Time: 50 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 47 ms
DEBUG:ai_benchmark:Inference Time: 47 ms
11.1 - inference | batch=10, size=256x256: 49.1 ± 1.6 ms
INFO:ai_benchmark:11.1 - inference | batch=10, size=256x256: 49.1 ± 1.6 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 82 ms
DEBUG:ai_benchmark:Inference Time: 82 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 78 ms
DEBUG:ai_benchmark:Inference Time: 78 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 78 ms
DEBUG:ai_benchmark:Inference Time: 78 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 78 ms
DEBUG:ai_benchmark:Inference Time: 78 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
11.2 - inference | batch=1, size=1024x1024: 79.2 ± 0.9 ms
INFO:ai_benchmark:11.2 - inference | batch=1, size=1024x1024: 79.2 ± 0.9 ms
2023-09-04 15:42:58.036611: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:42:58.930243: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 810 ms
DEBUG:ai_benchmark:Training Time: 810 ms
Training Time: 107 ms
DEBUG:ai_benchmark:Training Time: 107 ms
Training Time: 107 ms
DEBUG:ai_benchmark:Training Time: 107 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 108 ms
DEBUG:ai_benchmark:Training Time: 108 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 108 ms
DEBUG:ai_benchmark:Training Time: 108 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 110 ms
DEBUG:ai_benchmark:Training Time: 110 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 108 ms
DEBUG:ai_benchmark:Training Time: 108 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
Training Time: 109 ms
DEBUG:ai_benchmark:Training Time: 109 ms
11.3 - training | batch=15, size=128x128: 108.7 ± 0.7 ms
INFO:ai_benchmark:11.3 - training | batch=15, size=128x128: 108.7 ± 0.7 ms
12/19. U-Net
INFO:ai_benchmark:
12/19. U-Net
2023-09-04 15:43:12.089042: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:12.089159: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:12.089209: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:12.089286: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:12.089341: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:12.089373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:43:12.281416: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:43:12.466331: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 8843 ms
DEBUG:ai_benchmark:Inference Time: 8843 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 79 ms
DEBUG:ai_benchmark:Inference Time: 79 ms
12.1 - inference | batch=4, size=512x512: 80.0 ± 0.7 ms
INFO:ai_benchmark:12.1 - inference | batch=4, size=512x512: 80.0 ± 0.7 ms
Inference Time: 9645 ms
DEBUG:ai_benchmark:Inference Time: 9645 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 82 ms
DEBUG:ai_benchmark:Inference Time: 82 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 82 ms
DEBUG:ai_benchmark:Inference Time: 82 ms
Inference Time: 82 ms
DEBUG:ai_benchmark:Inference Time: 82 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 82 ms
DEBUG:ai_benchmark:Inference Time: 82 ms
Inference Time: 82 ms
DEBUG:ai_benchmark:Inference Time: 82 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 80 ms
DEBUG:ai_benchmark:Inference Time: 80 ms
Inference Time: 82 ms
DEBUG:ai_benchmark:Inference Time: 82 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
Inference Time: 81 ms
DEBUG:ai_benchmark:Inference Time: 81 ms
12.2 - inference | batch=1, size=1024x1024: 81.2 ± 0.6 ms
INFO:ai_benchmark:12.2 - inference | batch=1, size=1024x1024: 81.2 ± 0.6 ms
2023-09-04 15:43:37.220648: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:43:38.261203: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 7317 ms
DEBUG:ai_benchmark:Training Time: 7317 ms
Training Time: 127 ms
DEBUG:ai_benchmark:Training Time: 127 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 125 ms
DEBUG:ai_benchmark:Training Time: 125 ms
Training Time: 125 ms
DEBUG:ai_benchmark:Training Time: 125 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 125 ms
DEBUG:ai_benchmark:Training Time: 125 ms
Training Time: 125 ms
DEBUG:ai_benchmark:Training Time: 125 ms
Training Time: 127 ms
DEBUG:ai_benchmark:Training Time: 127 ms
Training Time: 127 ms
DEBUG:ai_benchmark:Training Time: 127 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 125 ms
DEBUG:ai_benchmark:Training Time: 125 ms
Training Time: 125 ms
DEBUG:ai_benchmark:Training Time: 125 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
Training Time: 125 ms
DEBUG:ai_benchmark:Training Time: 125 ms
Training Time: 126 ms
DEBUG:ai_benchmark:Training Time: 126 ms
12.3 - training | batch=4, size=256x256: 125.8 ± 0.7 ms
INFO:ai_benchmark:12.3 - training | batch=4, size=256x256: 125.8 ± 0.7 ms
13/19. Nvidia-SPADE
INFO:ai_benchmark:
13/19. Nvidia-SPADE
2023-09-04 15:43:48.612494: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:48.612610: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:48.612662: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:48.612738: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:48.612798: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:43:48.612836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:43:49.097222: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:43:49.470627: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2933 ms
DEBUG:ai_benchmark:Inference Time: 2933 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
Inference Time: 48 ms
DEBUG:ai_benchmark:Inference Time: 48 ms
Inference Time: 49 ms
DEBUG:ai_benchmark:Inference Time: 49 ms
13.1 - inference | batch=5, size=128x128: 48.4 ± 0.5 ms
INFO:ai_benchmark:13.1 - inference | batch=5, size=128x128: 48.4 ± 0.5 ms
2023-09-04 15:43:56.321476: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:43:58.402851: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 4250 ms
DEBUG:ai_benchmark:Training Time: 4250 ms
Training Time: 83 ms
DEBUG:ai_benchmark:Training Time: 83 ms
Training Time: 83 ms
DEBUG:ai_benchmark:Training Time: 83 ms
Training Time: 83 ms
DEBUG:ai_benchmark:Training Time: 83 ms
Training Time: 83 ms
DEBUG:ai_benchmark:Training Time: 83 ms
Training Time: 83 ms
DEBUG:ai_benchmark:Training Time: 83 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 84 ms
DEBUG:ai_benchmark:Training Time: 84 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 83 ms
DEBUG:ai_benchmark:Training Time: 83 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 83 ms
DEBUG:ai_benchmark:Training Time: 83 ms
Training Time: 83 ms
DEBUG:ai_benchmark:Training Time: 83 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
Training Time: 82 ms
DEBUG:ai_benchmark:Training Time: 82 ms
13.2 - training | batch=1, size=128x128: 82.5 ± 0.6 ms
INFO:ai_benchmark:13.2 - training | batch=1, size=128x128: 82.5 ± 0.6 ms
14/19. ICNet
INFO:ai_benchmark:
14/19. ICNet
2023-09-04 15:44:03.065577: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:03.065709: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:03.065755: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:03.065833: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:03.065882: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:03.065918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:44:03.505339: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:44:03.897380: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 3419 ms
DEBUG:ai_benchmark:Inference Time: 3419 ms
Inference Time: 104 ms
DEBUG:ai_benchmark:Inference Time: 104 ms
Inference Time: 105 ms
DEBUG:ai_benchmark:Inference Time: 105 ms
Inference Time: 106 ms
DEBUG:ai_benchmark:Inference Time: 106 ms
Inference Time: 111 ms
DEBUG:ai_benchmark:Inference Time: 111 ms
Inference Time: 107 ms
DEBUG:ai_benchmark:Inference Time: 107 ms
Inference Time: 110 ms
DEBUG:ai_benchmark:Inference Time: 110 ms
Inference Time: 114 ms
DEBUG:ai_benchmark:Inference Time: 114 ms
Inference Time: 108 ms
DEBUG:ai_benchmark:Inference Time: 108 ms
Inference Time: 108 ms
DEBUG:ai_benchmark:Inference Time: 108 ms
Inference Time: 110 ms
DEBUG:ai_benchmark:Inference Time: 110 ms
Inference Time: 108 ms
DEBUG:ai_benchmark:Inference Time: 108 ms
Inference Time: 110 ms
DEBUG:ai_benchmark:Inference Time: 110 ms
Inference Time: 109 ms
DEBUG:ai_benchmark:Inference Time: 109 ms
Inference Time: 109 ms
DEBUG:ai_benchmark:Inference Time: 109 ms
Inference Time: 111 ms
DEBUG:ai_benchmark:Inference Time: 111 ms
Inference Time: 108 ms
DEBUG:ai_benchmark:Inference Time: 108 ms
Inference Time: 111 ms
DEBUG:ai_benchmark:Inference Time: 111 ms
Inference Time: 109 ms
DEBUG:ai_benchmark:Inference Time: 109 ms
Inference Time: 111 ms
DEBUG:ai_benchmark:Inference Time: 111 ms
Inference Time: 106 ms
DEBUG:ai_benchmark:Inference Time: 106 ms
Inference Time: 110 ms
DEBUG:ai_benchmark:Inference Time: 110 ms
14.1 - inference | batch=5, size=1024x1536: 109 ± 2 ms
INFO:ai_benchmark:14.1 - inference | batch=5, size=1024x1536: 109 ± 2 ms
2023-09-04 15:44:12.763925: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:44:13.750116: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 5732 ms
DEBUG:ai_benchmark:Training Time: 5732 ms
Training Time: 851 ms
DEBUG:ai_benchmark:Training Time: 851 ms
Training Time: 857 ms
DEBUG:ai_benchmark:Training Time: 857 ms
Training Time: 858 ms
DEBUG:ai_benchmark:Training Time: 858 ms
Training Time: 853 ms
DEBUG:ai_benchmark:Training Time: 853 ms
Training Time: 856 ms
DEBUG:ai_benchmark:Training Time: 856 ms
Training Time: 862 ms
DEBUG:ai_benchmark:Training Time: 862 ms
Training Time: 854 ms
DEBUG:ai_benchmark:Training Time: 854 ms
Training Time: 852 ms
DEBUG:ai_benchmark:Training Time: 852 ms
Training Time: 855 ms
DEBUG:ai_benchmark:Training Time: 855 ms
Training Time: 853 ms
DEBUG:ai_benchmark:Training Time: 853 ms
Training Time: 855 ms
DEBUG:ai_benchmark:Training Time: 855 ms
Training Time: 859 ms
DEBUG:ai_benchmark:Training Time: 859 ms
Training Time: 852 ms
DEBUG:ai_benchmark:Training Time: 852 ms
Training Time: 857 ms
DEBUG:ai_benchmark:Training Time: 857 ms
Training Time: 852 ms
DEBUG:ai_benchmark:Training Time: 852 ms
Training Time: 861 ms
DEBUG:ai_benchmark:Training Time: 861 ms
14.2 - training | batch=10, size=1024x1536: 855 ± 3 ms
INFO:ai_benchmark:14.2 - training | batch=10, size=1024x1536: 855 ± 3 ms
15/19. PSPNet
INFO:ai_benchmark:
15/19. PSPNet
2023-09-04 15:44:42.700521: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:42.700636: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:42.700673: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:42.700736: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:42.700776: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:44:42.700795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:44:43.308172: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:44:43.714808: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 11204 ms
DEBUG:ai_benchmark:Inference Time: 11204 ms
Inference Time: 243 ms
DEBUG:ai_benchmark:Inference Time: 243 ms
Inference Time: 246 ms
DEBUG:ai_benchmark:Inference Time: 246 ms
Inference Time: 244 ms
DEBUG:ai_benchmark:Inference Time: 244 ms
Inference Time: 246 ms
DEBUG:ai_benchmark:Inference Time: 246 ms
Inference Time: 246 ms
DEBUG:ai_benchmark:Inference Time: 246 ms
Inference Time: 250 ms
DEBUG:ai_benchmark:Inference Time: 250 ms
Inference Time: 251 ms
DEBUG:ai_benchmark:Inference Time: 251 ms
Inference Time: 242 ms
DEBUG:ai_benchmark:Inference Time: 242 ms
Inference Time: 244 ms
DEBUG:ai_benchmark:Inference Time: 244 ms
Inference Time: 242 ms
DEBUG:ai_benchmark:Inference Time: 242 ms
Inference Time: 247 ms
DEBUG:ai_benchmark:Inference Time: 247 ms
Inference Time: 240 ms
DEBUG:ai_benchmark:Inference Time: 240 ms
Inference Time: 241 ms
DEBUG:ai_benchmark:Inference Time: 241 ms
Inference Time: 241 ms
DEBUG:ai_benchmark:Inference Time: 241 ms
Inference Time: 241 ms
DEBUG:ai_benchmark:Inference Time: 241 ms
Inference Time: 241 ms
DEBUG:ai_benchmark:Inference Time: 241 ms
Inference Time: 240 ms
DEBUG:ai_benchmark:Inference Time: 240 ms
Inference Time: 245 ms
DEBUG:ai_benchmark:Inference Time: 245 ms
Inference Time: 243 ms
DEBUG:ai_benchmark:Inference Time: 243 ms
Inference Time: 245 ms
DEBUG:ai_benchmark:Inference Time: 245 ms
Inference Time: 246 ms
DEBUG:ai_benchmark:Inference Time: 246 ms
15.1 - inference | batch=5, size=720x720: 244 ± 3 ms
INFO:ai_benchmark:15.1 - inference | batch=5, size=720x720: 244 ± 3 ms
2023-09-04 15:45:03.100497: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:45:04.068987: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 7152 ms
DEBUG:ai_benchmark:Training Time: 7152 ms
Training Time: 157 ms
DEBUG:ai_benchmark:Training Time: 157 ms
Training Time: 154 ms
DEBUG:ai_benchmark:Training Time: 154 ms
Training Time: 153 ms
DEBUG:ai_benchmark:Training Time: 153 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 153 ms
DEBUG:ai_benchmark:Training Time: 153 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 151 ms
DEBUG:ai_benchmark:Training Time: 151 ms
Training Time: 151 ms
DEBUG:ai_benchmark:Training Time: 151 ms
Training Time: 153 ms
DEBUG:ai_benchmark:Training Time: 153 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 151 ms
DEBUG:ai_benchmark:Training Time: 151 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 151 ms
DEBUG:ai_benchmark:Training Time: 151 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
Training Time: 150 ms
DEBUG:ai_benchmark:Training Time: 150 ms
Training Time: 152 ms
DEBUG:ai_benchmark:Training Time: 152 ms
15.2 - training | batch=1, size=512x512: 152 ± 1 ms
INFO:ai_benchmark:15.2 - training | batch=1, size=512x512: 152 ± 1 ms
16/19. DeepLab
INFO:ai_benchmark:
16/19. DeepLab
2023-09-04 15:45:14.005935: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:14.006056: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:14.006095: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:14.006159: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:14.006202: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:14.006226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:45:15.220997: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:45:15.971717: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2029 ms
DEBUG:ai_benchmark:Inference Time: 2029 ms
Inference Time: 63 ms
DEBUG:ai_benchmark:Inference Time: 63 ms
Inference Time: 63 ms
DEBUG:ai_benchmark:Inference Time: 63 ms
Inference Time: 63 ms
DEBUG:ai_benchmark:Inference Time: 63 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
Inference Time: 63 ms
DEBUG:ai_benchmark:Inference Time: 63 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
Inference Time: 61 ms
DEBUG:ai_benchmark:Inference Time: 61 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
Inference Time: 61 ms
DEBUG:ai_benchmark:Inference Time: 61 ms
Inference Time: 61 ms
DEBUG:ai_benchmark:Inference Time: 61 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
Inference Time: 61 ms
DEBUG:ai_benchmark:Inference Time: 61 ms
Inference Time: 61 ms
DEBUG:ai_benchmark:Inference Time: 61 ms
Inference Time: 61 ms
DEBUG:ai_benchmark:Inference Time: 61 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
Inference Time: 61 ms
DEBUG:ai_benchmark:Inference Time: 61 ms
Inference Time: 62 ms
DEBUG:ai_benchmark:Inference Time: 62 ms
16.1 - inference | batch=2, size=512x512: 61.9 ± 0.7 ms
INFO:ai_benchmark:16.1 - inference | batch=2, size=512x512: 61.9 ± 0.7 ms
2023-09-04 15:45:22.157052: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:45:24.000836: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 3498 ms
DEBUG:ai_benchmark:Training Time: 3498 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 93 ms
DEBUG:ai_benchmark:Training Time: 93 ms
Training Time: 93 ms
DEBUG:ai_benchmark:Training Time: 93 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 93 ms
DEBUG:ai_benchmark:Training Time: 93 ms
Training Time: 93 ms
DEBUG:ai_benchmark:Training Time: 93 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 93 ms
DEBUG:ai_benchmark:Training Time: 93 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 93 ms
DEBUG:ai_benchmark:Training Time: 93 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
Training Time: 94 ms
DEBUG:ai_benchmark:Training Time: 94 ms
16.2 - training | batch=1, size=384x384: 93.7 ± 0.5 ms
INFO:ai_benchmark:16.2 - training | batch=1, size=384x384: 93.7 ± 0.5 ms
17/19. Pixel-RNN
INFO:ai_benchmark:
17/19. Pixel-RNN
2023-09-04 15:45:28.181679: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.181795: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.181827: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.181885: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.181922: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.181941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:45:28.414678: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.414806: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.414848: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.414922: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.414987: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:45:28.415011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:45:45.467070: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:45:45.484360: W tensorflow/c/c_api.cc:305] Operation '{name:'conv2d_out_logits/biases/Adam_1/Assign' id:47369 op device:{requested: '', assigned: ''} def:{{{node conv2d_out_logits/biases/Adam_1/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](conv2d_out_logits/biases/Adam_1, conv2d_out_logits/biases/Adam_1/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-09-04 15:45:47.055800: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:45:50.984216: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 29323 ms
DEBUG:ai_benchmark:Inference Time: 29323 ms
Inference Time: 448 ms
DEBUG:ai_benchmark:Inference Time: 448 ms
Inference Time: 455 ms
DEBUG:ai_benchmark:Inference Time: 455 ms
Inference Time: 465 ms
DEBUG:ai_benchmark:Inference Time: 465 ms
Inference Time: 463 ms
DEBUG:ai_benchmark:Inference Time: 463 ms
17.1 - inference | batch=50, size=64x64: 458 ± 7 ms
INFO:ai_benchmark:17.1 - inference | batch=50, size=64x64: 458 ± 7 ms
2023-09-04 15:46:38.798519: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 22011 ms
DEBUG:ai_benchmark:Training Time: 22011 ms
Training Time: 3473 ms
DEBUG:ai_benchmark:Training Time: 3473 ms
Training Time: 3541 ms
DEBUG:ai_benchmark:Training Time: 3541 ms
Training Time: 3514 ms
DEBUG:ai_benchmark:Training Time: 3514 ms
Training Time: 3435 ms
DEBUG:ai_benchmark:Training Time: 3435 ms
17.2 - training | batch=10, size=64x64: 3491 ± 40 ms
INFO:ai_benchmark:17.2 - training | batch=10, size=64x64: 3491 ± 40 ms
18/19. LSTM-Sentiment
INFO:ai_benchmark:
18/19. LSTM-Sentiment
2023-09-04 15:46:58.705347: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:46:58.705467: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:46:58.705507: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:46:58.705567: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:46:58.705607: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:46:58.705636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:46:58.975533: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:46:58.986090: W tensorflow/c/c_api.cc:305] Operation '{name:'Variable_1/Adam_1/Assign' id:351 op device:{requested: '', assigned: ''} def:{{{node Variable_1/Adam_1/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](Variable_1/Adam_1, Variable_1/Adam_1/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-09-04 15:46:59.028116: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:46:59.363607: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 875 ms
DEBUG:ai_benchmark:Inference Time: 875 ms
Inference Time: 591 ms
DEBUG:ai_benchmark:Inference Time: 591 ms
Inference Time: 579 ms
DEBUG:ai_benchmark:Inference Time: 579 ms
Inference Time: 610 ms
DEBUG:ai_benchmark:Inference Time: 610 ms
Inference Time: 601 ms
DEBUG:ai_benchmark:Inference Time: 601 ms
Inference Time: 587 ms
DEBUG:ai_benchmark:Inference Time: 587 ms
Inference Time: 593 ms
DEBUG:ai_benchmark:Inference Time: 593 ms
Inference Time: 583 ms
DEBUG:ai_benchmark:Inference Time: 583 ms
Inference Time: 564 ms
DEBUG:ai_benchmark:Inference Time: 564 ms
Inference Time: 590 ms
DEBUG:ai_benchmark:Inference Time: 590 ms
Inference Time: 591 ms
DEBUG:ai_benchmark:Inference Time: 591 ms
Inference Time: 580 ms
DEBUG:ai_benchmark:Inference Time: 580 ms
Inference Time: 604 ms
DEBUG:ai_benchmark:Inference Time: 604 ms
Inference Time: 586 ms
DEBUG:ai_benchmark:Inference Time: 586 ms
Inference Time: 597 ms
DEBUG:ai_benchmark:Inference Time: 597 ms
Inference Time: 573 ms
DEBUG:ai_benchmark:Inference Time: 573 ms
Inference Time: 584 ms
DEBUG:ai_benchmark:Inference Time: 584 ms
Inference Time: 583 ms
DEBUG:ai_benchmark:Inference Time: 583 ms
Inference Time: 586 ms
DEBUG:ai_benchmark:Inference Time: 586 ms
Inference Time: 605 ms
DEBUG:ai_benchmark:Inference Time: 605 ms
Inference Time: 583 ms
DEBUG:ai_benchmark:Inference Time: 583 ms
Inference Time: 600 ms
DEBUG:ai_benchmark:Inference Time: 600 ms
18.1 - inference | batch=100, size=1024x300: 589 ± 11 ms
INFO:ai_benchmark:18.1 - inference | batch=100, size=1024x300: 589 ± 11 ms
2023-09-04 15:47:16.964090: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 1984 ms
DEBUG:ai_benchmark:Training Time: 1984 ms
Training Time: 1732 ms
DEBUG:ai_benchmark:Training Time: 1732 ms
Training Time: 1732 ms
DEBUG:ai_benchmark:Training Time: 1732 ms
Training Time: 1663 ms
DEBUG:ai_benchmark:Training Time: 1663 ms
Training Time: 1711 ms
DEBUG:ai_benchmark:Training Time: 1711 ms
Training Time: 1747 ms
DEBUG:ai_benchmark:Training Time: 1747 ms
Training Time: 1798 ms
DEBUG:ai_benchmark:Training Time: 1798 ms
Training Time: 1704 ms
DEBUG:ai_benchmark:Training Time: 1704 ms
Training Time: 1774 ms
DEBUG:ai_benchmark:Training Time: 1774 ms
Training Time: 1747 ms
DEBUG:ai_benchmark:Training Time: 1747 ms
Training Time: 1779 ms
DEBUG:ai_benchmark:Training Time: 1779 ms
Training Time: 1833 ms
DEBUG:ai_benchmark:Training Time: 1833 ms
Training Time: 1738 ms
DEBUG:ai_benchmark:Training Time: 1738 ms
Training Time: 2088 ms
DEBUG:ai_benchmark:Training Time: 2088 ms
Training Time: 1736 ms
DEBUG:ai_benchmark:Training Time: 1736 ms
Training Time: 1783 ms
DEBUG:ai_benchmark:Training Time: 1783 ms
Training Time: 1706 ms
DEBUG:ai_benchmark:Training Time: 1706 ms
18.2 - training | batch=10, size=1024x300: 1767 ± 92 ms
INFO:ai_benchmark:18.2 - training | batch=10, size=1024x300: 1767 ± 92 ms
19/19. GNMT-Translation
INFO:ai_benchmark:
19/19. GNMT-Translation
2023-09-04 15:47:47.390113: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:47:47.390226: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:47:47.390265: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:47:47.390327: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:47:47.390373: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-04 15:47:47.390397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:01:00.0
2023-09-04 15:47:47.662655: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:47:47.677094: W tensorflow/c/c_api.cc:305] Operation '{name:'index_to_string/table_init' id:13 op device:{requested: '', assigned: ''} def:{{{node index_to_string/table_init}} = InitializeTableFromTextFileV2[_has_manual_control_dependencies=true, delimiter="\t", key_index=-1, offset=0, value_index=-2, vocab_size=-1](index_to_string, index_to_string/table_init/asset_filepath)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-09-04 15:47:47.709007: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-04 15:47:47.918263: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 1030 ms
DEBUG:ai_benchmark:Inference Time: 1030 ms
Inference Time: 186 ms
DEBUG:ai_benchmark:Inference Time: 186 ms
Inference Time: 187 ms
DEBUG:ai_benchmark:Inference Time: 187 ms
Inference Time: 187 ms
DEBUG:ai_benchmark:Inference Time: 187 ms
Inference Time: 185 ms
DEBUG:ai_benchmark:Inference Time: 185 ms
Inference Time: 184 ms
DEBUG:ai_benchmark:Inference Time: 184 ms
Inference Time: 184 ms
DEBUG:ai_benchmark:Inference Time: 184 ms
Inference Time: 184 ms
DEBUG:ai_benchmark:Inference Time: 184 ms
Inference Time: 183 ms
DEBUG:ai_benchmark:Inference Time: 183 ms
Inference Time: 184 ms
DEBUG:ai_benchmark:Inference Time: 184 ms
Inference Time: 185 ms
DEBUG:ai_benchmark:Inference Time: 185 ms
Inference Time: 187 ms
DEBUG:ai_benchmark:Inference Time: 187 ms
Inference Time: 186 ms
DEBUG:ai_benchmark:Inference Time: 186 ms
Inference Time: 185 ms
DEBUG:ai_benchmark:Inference Time: 185 ms
Inference Time: 186 ms
DEBUG:ai_benchmark:Inference Time: 186 ms
Inference Time: 186 ms
DEBUG:ai_benchmark:Inference Time: 186 ms
Inference Time: 185 ms
DEBUG:ai_benchmark:Inference Time: 185 ms
Inference Time: 188 ms
DEBUG:ai_benchmark:Inference Time: 188 ms
Inference Time: 186 ms
DEBUG:ai_benchmark:Inference Time: 186 ms
Inference Time: 184 ms
DEBUG:ai_benchmark:Inference Time: 184 ms
Inference Time: 185 ms
DEBUG:ai_benchmark:Inference Time: 185 ms
Inference Time: 186 ms
DEBUG:ai_benchmark:Inference Time: 186 ms
19.1 - inference | batch=1, size=1x20: 185 ± 1 ms
INFO:ai_benchmark:19.1 - inference | batch=1, size=1x20: 185 ± 1 ms
Device Inference Score: 21950
INFO:ai_benchmark:Device Inference Score: 21950
Device Training Score: 12434
INFO:ai_benchmark:Device Training Score: 12434
Device AI Score: 34384
INFO:ai_benchmark:Device AI Score: 34384
For more information and results, please visit http://ai-benchmark.com/alpha
INFO:ai_benchmark:For more information and results, please visit http://ai-benchmark.com/alpha
(tf) root@rocm:~/tmp# export ROCM_PATH=/opt/rocm
(tf) root@rocm:~/tmp# python bench.py
2023-09-16 01:08:01.778483: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-16 01:08:02.688377: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:08:02.699485: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:08:02.699534: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:08:02.713262: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:08:02.713334: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:08:02.713362: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:08:02.713692: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:08:02.713727: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:08:02.713764: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:08:02.713779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:08:02.913471: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.915442: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.916020: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.919301: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.919896: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.960290: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.962323: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.964550: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.965338: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.966102: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.966905: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.967434: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.968667: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.971729: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.972210: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.978303: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.978835: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.979620: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.983667: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.984152: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.984875: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.985356: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.989608: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.990085: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.990807: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.991288: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.994083: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.994899: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.995378: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.997529: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:02.998368: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:04.913736: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 2400000000 exceeds 10% of free system memory.
2023-09-16 01:08:05.553178: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:05.995288: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:05.995513: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 2400000000 exceeds 10% of free system memory.
2023-09-16 01:08:06.578404: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.579381: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.580661: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.585309: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.586129: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.586999: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.588190: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.590628: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.591626: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.592076: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:06.592347: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 2400000000 exceeds 10% of free system memory.
Epoch 1/2
2023-09-16 01:08:07.179126: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.179954: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.180669: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.181406: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.182121: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.182834: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.183548: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.184262: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.184970: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.185706: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.186421: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.187131: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.187840: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.188569: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.189275: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.190003: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:07.438768: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:08.420273: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f7aaa242dd0 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2023-09-16 01:08:08.420307: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Radeon RX 7900 XTX, AMDGPU ISA version: gfx1100
2023-09-16 01:08:08.423665: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-09-16 01:08:08.485395: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
20/20 [==============================] - ETA: 0s - loss: 1.1335
2023-09-16 01:08:25.935172: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:25.935742: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:08:25.936198: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
20/20 [==============================] - 19s 828ms/step - loss: 1.1335
2023-09-16 01:08:25.938763: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Epoch 2/2
20/20 [==============================] - 17s 834ms/step - loss: 1.1179
(tf) root@rocm:~/tmp# export ROCM_PATH=/opt/rocm^C
(tf) root@rocm:~/tmp# nano ~/.bashrc
(tf) root@rocm:~/tmp# python benchmark.py
2023-09-16 01:09:24.012526: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-16 01:09:24.920123: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:24.931203: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:24.931251: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
>> AI-Benchmark - 0.1.3.cm
>> Let the AI Games begin
2023-09-16 01:09:26.107868: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.107982: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.108079: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.108247: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.108306: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.108366: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.108392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:09:26.330448: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330536: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330564: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330611: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330643: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:09:26.330705: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330744: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330772: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330806: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330836: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:26.330848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
* TF Version: 2.15.0
* Platform: Linux-5.15.0-83-generic-x86_64-with-glibc2.35
* CPU: AMD Ryzen 9 3900X 12-Core Processor
* CPU RAM: 47 GB
* GPU/0: Radeon RX 7900 XTX
* GPU RAM: 23.5 GB
* CUDA Version: N/A
* CUDA Build: N/A
The benchmark is running...
The tests might take up to 20 minutes
Please don't interrupt the script
1/19. MobileNet-V2
2023-09-16 01:09:27.599471: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:27.599596: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:27.599652: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:27.599742: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:27.599801: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:27.599829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:09:27.793908: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:386] MLIR V1 optimization pass is not enabled
2023-09-16 01:09:28.010852: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:09:28.553784: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2345 ms
Inference Time: 31 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 23 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
Inference Time: 22 ms
1.1 - inference | batch=50, size=224x224: 22.5 ± 1.9 ms
2023-09-16 01:09:35.785411: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:09:36.456047: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2038 ms
Training Time: 363 ms
Training Time: 364 ms
Training Time: 351 ms
Training Time: 360 ms
Training Time: 356 ms
Training Time: 356 ms
Training Time: 352 ms
Training Time: 356 ms
Training Time: 350 ms
Training Time: 354 ms
Training Time: 347 ms
Training Time: 352 ms
Training Time: 352 ms
Training Time: 352 ms
Training Time: 353 ms
Training Time: 355 ms
Training Time: 351 ms
Training Time: 355 ms
Training Time: 354 ms
Training Time: 357 ms
Training Time: 351 ms
1.2 - training | batch=50, size=224x224: 354 ± 4 ms
2/19. Inception-V3
2023-09-16 01:09:48.934252: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:48.934335: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:48.934363: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:48.934415: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:48.934448: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:09:48.934464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:09:49.413365: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:09:49.762318: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2285 ms
Inference Time: 39 ms
Inference Time: 33 ms
Inference Time: 40 ms
Inference Time: 35 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 42 ms
Inference Time: 33 ms
Inference Time: 33 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 30 ms
Inference Time: 32 ms
Inference Time: 30 ms
Inference Time: 33 ms
Inference Time: 29 ms
Inference Time: 33 ms
Inference Time: 32 ms
Inference Time: 30 ms
2.1 - inference | batch=20, size=346x346: 33.1 ± 3.2 ms
2023-09-16 01:09:55.682665: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:09:56.634337: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 5225 ms
Training Time: 385 ms
Training Time: 380 ms
Training Time: 376 ms
Training Time: 379 ms
Training Time: 374 ms
Training Time: 377 ms
Training Time: 378 ms
Training Time: 379 ms
Training Time: 379 ms
Training Time: 381 ms
Training Time: 378 ms
Training Time: 381 ms
Training Time: 375 ms
Training Time: 375 ms
Training Time: 378 ms
Training Time: 375 ms
Training Time: 376 ms
Training Time: 376 ms
Training Time: 377 ms
Training Time: 377 ms
Training Time: 392 ms
2.2 - training | batch=20, size=346x346: 378 ± 4 ms
3/19. Inception-V4
2023-09-16 01:10:10.904575: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:10.904662: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:10.904691: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:10.904745: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:10.904778: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:10.904793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:10:11.799543: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:10:12.285430: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 1059 ms
Inference Time: 35 ms
Inference Time: 35 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 35 ms
Inference Time: 36 ms
Inference Time: 35 ms
Inference Time: 36 ms
3.1 - inference | batch=10, size=346x346: 35.5 ± 0.5 ms
2023-09-16 01:10:16.971001: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:10:18.599017: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 3954 ms
Training Time: 306 ms
Training Time: 312 ms
Training Time: 305 ms
Training Time: 306 ms
Training Time: 310 ms
Training Time: 306 ms
Training Time: 306 ms
Training Time: 306 ms
Training Time: 306 ms
Training Time: 309 ms
Training Time: 306 ms
Training Time: 305 ms
Training Time: 307 ms
Training Time: 303 ms
Training Time: 306 ms
Training Time: 308 ms
Training Time: 305 ms
Training Time: 305 ms
Training Time: 303 ms
Training Time: 304 ms
Training Time: 302 ms
3.2 - training | batch=10, size=346x346: 306 ± 2 ms
4/19. Inception-ResNet-V2
2023-09-16 01:10:28.544092: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:28.544284: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:28.544368: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:28.544426: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:28.544461: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:28.544476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:10:30.124825: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:10:30.909275: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 1364 ms
Inference Time: 43 ms
Inference Time: 41 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 41 ms
Inference Time: 41 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 42 ms
Inference Time: 41 ms
Inference Time: 41 ms
Inference Time: 41 ms
Inference Time: 42 ms
Inference Time: 41 ms
Inference Time: 41 ms
4.1 - inference | batch=10, size=346x346: 41.7 ± 0.6 ms
2023-09-16 01:10:37.645296: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:10:40.838021: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 5193 ms
Training Time: 260 ms
Training Time: 258 ms
Training Time: 259 ms
Training Time: 257 ms
Training Time: 258 ms
Training Time: 258 ms
Training Time: 259 ms
Training Time: 258 ms
Training Time: 256 ms
Training Time: 259 ms
Training Time: 261 ms
Training Time: 257 ms
Training Time: 258 ms
Training Time: 256 ms
Training Time: 260 ms
Training Time: 259 ms
Training Time: 258 ms
Training Time: 257 ms
Training Time: 258 ms
Training Time: 257 ms
Training Time: 256 ms
4.2 - training | batch=8, size=346x346: 258 ± 1 ms
5/19. ResNet-V2-50
2023-09-16 01:10:49.743923: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:49.744131: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:49.744240: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:49.744299: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:49.744332: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:49.744347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:10:50.091722: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:10:50.321624: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 825 ms
Inference Time: 24 ms
Inference Time: 22 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 24 ms
Inference Time: 23 ms
Inference Time: 36 ms
Inference Time: 22 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 22 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 22 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
Inference Time: 23 ms
5.1 - inference | batch=10, size=346x346: 23.5 ± 2.8 ms
2023-09-16 01:10:53.627191: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:10:54.287310: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2376 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 86 ms
Training Time: 86 ms
Training Time: 86 ms
Training Time: 86 ms
Training Time: 86 ms
Training Time: 86 ms
Training Time: 85 ms
Training Time: 86 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 86 ms
Training Time: 84 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 85 ms
5.2 - training | batch=10, size=346x346: 85.3 ± 0.6 ms
6/19. ResNet-V2-152
2023-09-16 01:10:58.943750: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:58.943890: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:58.943952: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:58.944046: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:58.944116: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:10:58.944146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:11:00.440559: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:11:01.041025: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 980 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 33 ms
Inference Time: 33 ms
Inference Time: 31 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 33 ms
Inference Time: 31 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 31 ms
Inference Time: 32 ms
Inference Time: 31 ms
Inference Time: 32 ms
Inference Time: 32 ms
Inference Time: 33 ms
Inference Time: 31 ms
Inference Time: 32 ms
Inference Time: 31 ms
6.1 - inference | batch=10, size=256x256: 31.9 ± 0.7 ms
2023-09-16 01:11:06.748259: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:11:09.001562: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 3261 ms
Training Time: 110 ms
Training Time: 108 ms
Training Time: 107 ms
Training Time: 109 ms
Training Time: 108 ms
Training Time: 109 ms
Training Time: 107 ms
Training Time: 110 ms
Training Time: 109 ms
Training Time: 108 ms
Training Time: 108 ms
Training Time: 108 ms
Training Time: 121 ms
Training Time: 111 ms
Training Time: 108 ms
Training Time: 108 ms
Training Time: 109 ms
Training Time: 109 ms
Training Time: 111 ms
Training Time: 111 ms
Training Time: 110 ms
6.2 - training | batch=10, size=256x256: 109 ± 3 ms
7/19. VGG-16
2023-09-16 01:11:13.609140: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:13.609657: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:13.609693: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:13.609746: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:13.609780: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:13.609795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:11:13.671676: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:11:13.780649: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 3103 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 51 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 50 ms
Inference Time: 47 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 51 ms
7.1 - inference | batch=20, size=224x224: 49.4 ± 1.0 ms
2023-09-16 01:11:19.507927: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:11:20.043183: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2170 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 96 ms
Training Time: 82 ms
Training Time: 83 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 87 ms
Training Time: 88 ms
Training Time: 84 ms
Training Time: 85 ms
Training Time: 85 ms
Training Time: 87 ms
Training Time: 86 ms
Training Time: 85 ms
Training Time: 86 ms
Training Time: 87 ms
Training Time: 89 ms
Training Time: 85 ms
Training Time: 85 ms
7.2 - training | batch=2, size=224x224: 86.0 ± 2.7 ms
8/19. SRCNN 9-5-5
2023-09-16 01:11:24.180499: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:24.180626: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:24.180680: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:24.180764: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:24.180826: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:24.180853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:11:24.207632: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:11:24.401961: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 1680 ms
Inference Time: 30 ms
Inference Time: 30 ms
Inference Time: 28 ms
Inference Time: 28 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 28 ms
Inference Time: 30 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 29 ms
Inference Time: 28 ms
Inference Time: 28 ms
Inference Time: 28 ms
Inference Time: 29 ms
Inference Time: 28 ms
8.1 - inference | batch=10, size=512x512: 28.8 ± 0.7 ms
Inference Time: 5214 ms
Inference Time: 24 ms
Inference Time: 24 ms
Inference Time: 24 ms
Inference Time: 25 ms
Inference Time: 24 ms
Inference Time: 24 ms
Inference Time: 24 ms
Inference Time: 25 ms
Inference Time: 23 ms
Inference Time: 24 ms
Inference Time: 24 ms
Inference Time: 27 ms
Inference Time: 24 ms
Inference Time: 24 ms
Inference Time: 24 ms
Inference Time: 25 ms
Inference Time: 25 ms
Inference Time: 24 ms
Inference Time: 24 ms
Inference Time: 24 ms
Inference Time: 24 ms
8.2 - inference | batch=1, size=1536x1536: 24.3 ± 0.8 ms
2023-09-16 01:11:36.861344: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:11:37.401425: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 7170 ms
Training Time: 191 ms
Training Time: 189 ms
Training Time: 185 ms
Training Time: 187 ms
Training Time: 187 ms
Training Time: 186 ms
Training Time: 182 ms
Training Time: 186 ms
Training Time: 188 ms
Training Time: 184 ms
Training Time: 187 ms
Training Time: 186 ms
Training Time: 188 ms
Training Time: 187 ms
Training Time: 185 ms
Training Time: 186 ms
Training Time: 188 ms
Training Time: 187 ms
Training Time: 188 ms
Training Time: 188 ms
Training Time: 187 ms
8.3 - training | batch=10, size=512x512: 187 ± 2 ms
9/19. VGG-19 Super-Res
2023-09-16 01:11:58.573800: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:58.573879: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:58.573909: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:58.573957: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:58.573990: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:11:58.574004: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:11:58.660685: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:11:58.828911: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 342 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 37 ms
Inference Time: 36 ms
Inference Time: 37 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
Inference Time: 36 ms
9.1 - inference | batch=10, size=256x256: 36.1 ± 0.3 ms
Inference Time: 1210 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 57 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 57 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 57 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
Inference Time: 58 ms
9.2 - inference | batch=1, size=1024x1024: 57.9 ± 0.3 ms
2023-09-16 01:12:06.219465: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:12:06.718707: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 1261 ms
Training Time: 209 ms
Training Time: 200 ms
Training Time: 202 ms
Training Time: 201 ms
Training Time: 202 ms
Training Time: 202 ms
Training Time: 201 ms
Training Time: 201 ms
Training Time: 201 ms
Training Time: 200 ms
Training Time: 201 ms
Training Time: 201 ms
Training Time: 201 ms
Training Time: 201 ms
Training Time: 201 ms
Training Time: 201 ms
Training Time: 201 ms
Training Time: 202 ms
Training Time: 202 ms
Training Time: 201 ms
Training Time: 201 ms
9.3 - training | batch=10, size=224x224: 202 ± 2 ms
10/19. ResNet-SRGAN
2023-09-16 01:12:19.894846: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:19.894926: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:19.894955: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:19.895008: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:19.895041: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:19.895056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:12:20.170133: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:12:20.512697: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 1377 ms
Inference Time: 55 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 44 ms
Inference Time: 45 ms
Inference Time: 44 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 44 ms
Inference Time: 44 ms
Inference Time: 44 ms
Inference Time: 45 ms
Inference Time: 44 ms
Inference Time: 44 ms
Inference Time: 44 ms
Inference Time: 45 ms
Inference Time: 44 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 45 ms
Inference Time: 45 ms
10.1 - inference | batch=10, size=512x512: 45.0 ± 2.3 ms
Inference Time: 1276 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 39 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 39 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 37 ms
Inference Time: 37 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 37 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 38 ms
Inference Time: 37 ms
10.2 - inference | batch=1, size=1536x1536: 37.9 ± 0.5 ms
2023-09-16 01:12:29.654666: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:12:30.354905: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2828 ms
Training Time: 130 ms
Training Time: 122 ms
Training Time: 120 ms
Training Time: 118 ms
Training Time: 121 ms
Training Time: 118 ms
Training Time: 120 ms
Training Time: 118 ms
Training Time: 122 ms
Training Time: 117 ms
Training Time: 117 ms
Training Time: 118 ms
Training Time: 119 ms
Training Time: 119 ms
Training Time: 118 ms
Training Time: 119 ms
Training Time: 120 ms
Training Time: 119 ms
Training Time: 119 ms
Training Time: 119 ms
Training Time: 120 ms
10.3 - training | batch=5, size=512x512: 120 ± 3 ms
11/19. ResNet-DPED
2023-09-16 01:12:40.172665: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:40.172746: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:40.172774: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:40.172826: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:40.172857: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:12:40.172872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:12:40.223224: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:12:40.411515: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 793 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 49 ms
Inference Time: 47 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 49 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 47 ms
Inference Time: 47 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
11.1 - inference | batch=10, size=256x256: 47.7 ± 0.6 ms
Inference Time: 4246 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 79 ms
Inference Time: 79 ms
11.2 - inference | batch=1, size=1024x1024: 79.1 ± 0.5 ms
2023-09-16 01:12:52.253098: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:12:53.148495: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2086 ms
Training Time: 112 ms
Training Time: 111 ms
Training Time: 109 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 111 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 109 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 110 ms
Training Time: 111 ms
Training Time: 109 ms
Training Time: 111 ms
Training Time: 110 ms
11.3 - training | batch=15, size=128x128: 110.1 ± 0.7 ms
12/19. U-Net
2023-09-16 01:13:08.590230: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:08.590310: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:08.590338: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:08.590393: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:08.590427: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:08.590442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:13:08.716029: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:13:08.878830: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 6404 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 78 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 80 ms
Inference Time: 79 ms
Inference Time: 81 ms
Inference Time: 79 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 80 ms
Inference Time: 81 ms
12.1 - inference | batch=4, size=512x512: 79.8 ± 0.7 ms
Inference Time: 7562 ms
Inference Time: 81 ms
Inference Time: 82 ms
Inference Time: 82 ms
Inference Time: 81 ms
Inference Time: 81 ms
Inference Time: 83 ms
Inference Time: 81 ms
Inference Time: 81 ms
Inference Time: 83 ms
Inference Time: 81 ms
Inference Time: 82 ms
Inference Time: 81 ms
Inference Time: 82 ms
Inference Time: 81 ms
Inference Time: 82 ms
Inference Time: 82 ms
Inference Time: 82 ms
Inference Time: 81 ms
Inference Time: 82 ms
Inference Time: 82 ms
Inference Time: 82 ms
12.2 - inference | batch=1, size=1024x1024: 81.7 ± 0.6 ms
2023-09-16 01:13:28.853492: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:13:29.664292: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 6032 ms
Training Time: 122 ms
Training Time: 120 ms
Training Time: 131 ms
Training Time: 121 ms
Training Time: 121 ms
Training Time: 122 ms
Training Time: 120 ms
Training Time: 120 ms
Training Time: 121 ms
Training Time: 121 ms
Training Time: 122 ms
Training Time: 122 ms
Training Time: 121 ms
Training Time: 121 ms
Training Time: 121 ms
Training Time: 120 ms
Training Time: 121 ms
Training Time: 121 ms
Training Time: 120 ms
Training Time: 121 ms
Training Time: 121 ms
12.3 - training | batch=4, size=256x256: 121 ± 2 ms
13/19. Nvidia-SPADE
2023-09-16 01:13:38.996603: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:38.996736: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:38.996792: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:38.996883: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:38.996951: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:38.996978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:13:39.647058: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:13:39.964882: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2088 ms
Inference Time: 51 ms
Inference Time: 52 ms
Inference Time: 53 ms
Inference Time: 52 ms
Inference Time: 51 ms
Inference Time: 52 ms
Inference Time: 53 ms
Inference Time: 53 ms
Inference Time: 53 ms
Inference Time: 52 ms
Inference Time: 54 ms
Inference Time: 52 ms
Inference Time: 52 ms
Inference Time: 53 ms
Inference Time: 52 ms
Inference Time: 51 ms
Inference Time: 53 ms
Inference Time: 52 ms
Inference Time: 52 ms
Inference Time: 52 ms
Inference Time: 51 ms
13.1 - inference | batch=5, size=128x128: 52.2 ± 0.8 ms
2023-09-16 01:13:45.599243: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:13:47.300704: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 3643 ms
Training Time: 82 ms
Training Time: 81 ms
Training Time: 82 ms
Training Time: 83 ms
Training Time: 84 ms
Training Time: 81 ms
Training Time: 82 ms
Training Time: 82 ms
Training Time: 82 ms
Training Time: 81 ms
Training Time: 82 ms
Training Time: 82 ms
Training Time: 82 ms
Training Time: 82 ms
Training Time: 82 ms
Training Time: 82 ms
Training Time: 82 ms
Training Time: 83 ms
Training Time: 81 ms
Training Time: 83 ms
Training Time: 82 ms
13.2 - training | batch=1, size=128x128: 82.0 ± 0.7 ms
14/19. ICNet
2023-09-16 01:13:51.935487: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:51.935619: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:51.935675: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:51.935762: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:51.935824: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:13:51.935854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:13:52.244352: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:13:52.589237: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 2384 ms
Inference Time: 112 ms
Inference Time: 118 ms
Inference Time: 120 ms
Inference Time: 118 ms
Inference Time: 115 ms
Inference Time: 116 ms
Inference Time: 116 ms
Inference Time: 112 ms
Inference Time: 116 ms
Inference Time: 117 ms
Inference Time: 119 ms
Inference Time: 117 ms
Inference Time: 118 ms
Inference Time: 117 ms
Inference Time: 118 ms
Inference Time: 119 ms
Inference Time: 119 ms
Inference Time: 115 ms
Inference Time: 118 ms
Inference Time: 117 ms
Inference Time: 117 ms
14.1 - inference | batch=5, size=1024x1536: 117 ± 2 ms
2023-09-16 01:14:00.527180: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:14:01.451875: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 3562 ms
Training Time: 410 ms
Training Time: 394 ms
Training Time: 413 ms
Training Time: 406 ms
Training Time: 403 ms
Training Time: 394 ms
Training Time: 412 ms
Training Time: 394 ms
Training Time: 413 ms
Training Time: 392 ms
Training Time: 410 ms
Training Time: 406 ms
Training Time: 422 ms
Training Time: 406 ms
Training Time: 411 ms
Training Time: 374 ms
Training Time: 392 ms
Training Time: 385 ms
Training Time: 413 ms
Training Time: 414 ms
Training Time: 407 ms
14.2 - training | batch=10, size=1024x1536: 403 ± 11 ms
15/19. PSPNet
2023-09-16 01:14:26.256758: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:26.256842: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:26.256871: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:26.256923: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:26.256958: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:26.256973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:14:26.673789: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:14:27.041583: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 10691 ms
Inference Time: 230 ms
Inference Time: 229 ms
Inference Time: 234 ms
Inference Time: 229 ms
Inference Time: 229 ms
Inference Time: 229 ms
Inference Time: 231 ms
Inference Time: 227 ms
Inference Time: 235 ms
Inference Time: 236 ms
Inference Time: 229 ms
Inference Time: 229 ms
Inference Time: 233 ms
Inference Time: 234 ms
Inference Time: 235 ms
Inference Time: 241 ms
Inference Time: 228 ms
Inference Time: 229 ms
Inference Time: 231 ms
Inference Time: 231 ms
Inference Time: 230 ms
15.1 - inference | batch=5, size=720x720: 231 ± 3 ms
2023-09-16 01:14:45.300386: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:14:46.123581: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 5854 ms
Training Time: 128 ms
Training Time: 126 ms
Training Time: 125 ms
Training Time: 126 ms
Training Time: 125 ms
Training Time: 126 ms
Training Time: 125 ms
Training Time: 124 ms
Training Time: 126 ms
Training Time: 125 ms
Training Time: 126 ms
Training Time: 126 ms
Training Time: 126 ms
Training Time: 127 ms
Training Time: 126 ms
Training Time: 125 ms
Training Time: 125 ms
Training Time: 125 ms
Training Time: 125 ms
Training Time: 125 ms
Training Time: 126 ms
15.2 - training | batch=1, size=512x512: 125.6 ± 0.8 ms
16/19. DeepLab
2023-09-16 01:14:54.647838: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:54.647981: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:54.648043: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:54.648139: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:54.648215: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:14:54.648246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:14:55.486111: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:14:56.103411: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 1426 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 48 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 50 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 50 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 49 ms
Inference Time: 49 ms
16.1 - inference | batch=2, size=512x512: 49.1 ± 0.6 ms
2023-09-16 01:15:01.135335: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:15:02.648263: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 2878 ms
Training Time: 78 ms
Training Time: 78 ms
Training Time: 78 ms
Training Time: 78 ms
Training Time: 78 ms
Training Time: 78 ms
Training Time: 79 ms
Training Time: 78 ms
Training Time: 79 ms
Training Time: 78 ms
Training Time: 79 ms
Training Time: 78 ms
Training Time: 79 ms
Training Time: 78 ms
Training Time: 78 ms
Training Time: 79 ms
Training Time: 79 ms
Training Time: 79 ms
Training Time: 79 ms
Training Time: 79 ms
Training Time: 82 ms
16.2 - training | batch=1, size=384x384: 78.6 ± 0.9 ms
17/19. Pixel-RNN
2023-09-16 01:15:06.435202: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.435363: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.435612: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.435715: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.435784: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.435814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:15:06.619004: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.619087: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.619114: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.619166: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.619195: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:15:06.619209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:15:19.205765: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:15:19.218428: W tensorflow/c/c_api.cc:305] Operation '{name:'conv2d_out_logits/biases/Adam_1/Assign' id:47369 op device:{requested: '', assigned: ''} def:{{{node conv2d_out_logits/biases/Adam_1/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](conv2d_out_logits/biases/Adam_1, conv2d_out_logits/biases/Adam_1/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-09-16 01:15:20.533055: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:15:23.932621: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 4019 ms
Inference Time: 394 ms
Inference Time: 381 ms
Inference Time: 394 ms
Inference Time: 385 ms
Inference Time: 386 ms
Inference Time: 391 ms
Inference Time: 377 ms
Inference Time: 382 ms
Inference Time: 371 ms
Inference Time: 375 ms
Inference Time: 373 ms
Inference Time: 416 ms
Inference Time: 373 ms
Inference Time: 391 ms
Inference Time: 376 ms
Inference Time: 378 ms
Inference Time: 418 ms
Inference Time: 427 ms
Inference Time: 418 ms
Inference Time: 415 ms
Inference Time: 408 ms
17.1 - inference | batch=50, size=64x64: 392 ± 17 ms
2023-09-16 01:16:00.780439: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 17809 ms
Training Time: 2783 ms
Training Time: 2690 ms
Training Time: 2772 ms
Training Time: 2720 ms
17.2 - training | batch=10, size=64x64: 2741 ± 38 ms
18/19. LSTM-Sentiment
2023-09-16 01:16:16.892938: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:16:16.893023: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:16:16.893052: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:16:16.893111: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:16:16.893145: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:16:16.893160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:16:17.096621: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:16:17.105792: W tensorflow/c/c_api.cc:305] Operation '{name:'Variable_1/Adam_1/Assign' id:351 op device:{requested: '', assigned: ''} def:{{{node Variable_1/Adam_1/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](Variable_1/Adam_1, Variable_1/Adam_1/Initializer/zeros)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-09-16 01:16:17.144163: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:16:17.480118: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 743 ms
Inference Time: 448 ms
Inference Time: 452 ms
Inference Time: 436 ms
Inference Time: 434 ms
Inference Time: 445 ms
Inference Time: 445 ms
Inference Time: 442 ms
Inference Time: 448 ms
Inference Time: 448 ms
Inference Time: 449 ms
Inference Time: 448 ms
Inference Time: 446 ms
Inference Time: 449 ms
Inference Time: 447 ms
Inference Time: 442 ms
Inference Time: 440 ms
Inference Time: 452 ms
Inference Time: 455 ms
Inference Time: 439 ms
Inference Time: 448 ms
Inference Time: 452 ms
18.1 - inference | batch=100, size=1024x300: 446 ± 5 ms
2023-09-16 01:16:32.039514: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Training Time: 1554 ms
Training Time: 1441 ms
Training Time: 1414 ms
Training Time: 1400 ms
Training Time: 1390 ms
Training Time: 1392 ms
Training Time: 1374 ms
Training Time: 1378 ms
Training Time: 1416 ms
Training Time: 1389 ms
Training Time: 1384 ms
Training Time: 1415 ms
Training Time: 1381 ms
Training Time: 1410 ms
Training Time: 1365 ms
Training Time: 1387 ms
Training Time: 1394 ms
Training Time: 1381 ms
Training Time: 1359 ms
Training Time: 1368 ms
Training Time: 1431 ms
18.2 - training | batch=10, size=1024x300: 1393 ± 21 ms
19/19. GNMT-Translation
2023-09-16 01:17:01.992496: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:17:01.992623: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:17:01.992679: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:17:01.992762: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:17:01.992823: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:761] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-09-16 01:17:01.992849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1911] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24020 MB memory: -> device: 0, name: Radeon RX 7900 XTX, pci bus id: 0000:2d:00.0
2023-09-16 01:17:02.196492: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:17:02.209058: W tensorflow/c/c_api.cc:305] Operation '{name:'index_to_string/table_init' id:13 op device:{requested: '', assigned: ''} def:{{{node index_to_string/table_init}} = InitializeTableFromTextFileV2[_has_manual_control_dependencies=true, delimiter="\t", key_index=-1, offset=0, value_index=-2, vocab_size=-1](index_to_string, index_to_string/table_init/asset_filepath)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-09-16 01:17:02.238052: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-09-16 01:17:02.429881: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
Inference Time: 988 ms
Inference Time: 122 ms
Inference Time: 121 ms
Inference Time: 121 ms
Inference Time: 122 ms
Inference Time: 124 ms
Inference Time: 122 ms
Inference Time: 121 ms
Inference Time: 122 ms
Inference Time: 122 ms
Inference Time: 124 ms
Inference Time: 128 ms
Inference Time: 124 ms
Inference Time: 121 ms
Inference Time: 122 ms
Inference Time: 122 ms
Inference Time: 121 ms
Inference Time: 122 ms
Inference Time: 124 ms
Inference Time: 125 ms
Inference Time: 121 ms
Inference Time: 123 ms
19.1 - inference | batch=1, size=1x20: 123 ± 2 ms
Device Inference Score: 23681
Device Training Score: 14143
Device AI Score: 37824
For more information and results, please visit http://ai-benchmark.com/alpha