Last active
November 28, 2018 17:22
Pipeline analysis of radix-2^25.5 interleaved carry ripple modulo 2^255-19
Iterations: 100
Instructions: 4200
Total Cycles: 1406
Dispatch Width: 4
IPC: 2.99
Block RThroughput: 13.0
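The derived figures in this summary follow from the raw counts. A small sketch of the arithmetic, using only the numbers printed above (the variable names are illustrative and are not part of the llvm-mca output):

```python
# Raw counts copied from the summary above.
iterations = 100
instructions = 4200
total_cycles = 1406
block_rthroughput = 13.0  # reciprocal-throughput estimate reported for the block

# IPC is instructions retired per simulated cycle.
ipc = instructions / total_cycles

# Per-iteration figures for the simulated block.
instructions_per_iteration = instructions // iterations
cycles_per_iteration = total_cycles / iterations

if __name__ == "__main__":
    print(f"IPC: {ipc:.2f}")
    print(f"Instructions per iteration: {instructions_per_iteration}")
    print(f"Cycles per iteration: {cycles_per_iteration:.2f}")
    print(f"Cycles above reported RThroughput: {cycles_per_iteration - block_rthroughput:.2f}")
```

The simulated run therefore takes about one cycle per iteration more than the reported Block RThroughput figure.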
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[1] [2] [3] [4] [5] [6] Instructions:
1 1 1.00 vpsrlq $26, %ymm0, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm1, %ymm1
1 7 0.50 * vmovdqa (%rip), %ymm13
1 1 0.33 vpand %ymm13, %ymm0, %ymm0
1 1 1.00 vpsrlq $25, %ymm5, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm6, %ymm6
1 7 0.50 * vmovdqa (%rip), %ymm12
1 1 0.33 vpand %ymm12, %ymm5, %ymm5
1 1 1.00 vpsrlq $25, %ymm1, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm2, %ymm2
1 1 0.33 vpand %ymm12, %ymm1, %ymm1
1 1 1.00 vpsrlq $26, %ymm6, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm7, %ymm7
1 1 0.33 vpand %ymm13, %ymm6, %ymm6
1 1 1.00 vpsrlq $26, %ymm2, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm3, %ymm3
1 1 0.33 vpand %ymm13, %ymm2, %ymm2
1 1 1.00 vpsrlq $25, %ymm7, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm8, %ymm8
1 1 0.33 vpand %ymm12, %ymm7, %ymm7
1 1 1.00 vpsrlq $25, %ymm3, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm4, %ymm4
1 1 0.33 vpand %ymm12, %ymm3, %ymm3
1 1 1.00 vpsrlq $26, %ymm8, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm9, %ymm9
1 1 0.33 vpand %ymm13, %ymm8, %ymm8
1 1 1.00 vpsrlq $26, %ymm4, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm5, %ymm5
1 1 0.33 vpand %ymm13, %ymm4, %ymm4
1 1 1.00 vpsrlq $25, %ymm9, %ymm15
1 1 1.00 vpsllq $4, %ymm15, %ymm14
1 1 0.50 vpaddq %ymm14, %ymm0, %ymm0
1 1 0.50 vpaddq %ymm15, %ymm15, %ymm14
1 1 0.50 vpaddq %ymm15, %ymm14, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm0, %ymm0
1 1 0.33 vpand %ymm12, %ymm9, %ymm9
1 1 1.00 vpsrlq $25, %ymm5, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm6, %ymm6
1 1 0.33 vpand %ymm12, %ymm5, %ymm5
1 1 1.00 vpsrlq $26, %ymm0, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm1, %ymm1
1 1 0.33 vpand %ymm13, %ymm0, %ymm0
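For readers who want to follow what the 42 instructions compute, here is a scalar sketch of one 64-bit lane of the carry chain in Python. It rests on a reading of the listing rather than on anything the source states: %ymm0 through %ymm9 are taken to hold limbs z0 through z9, %ymm13 and %ymm12 are taken to hold the 26-bit and 25-bit masks (the two loads from (%rip) do not show the constants), and the names `carry_ripple`, `to_int`, `OFFSETS` and `CARRY_ORDER` are hypothetical.

```python
# Scalar model of one 64-bit lane of the vectorized carry chain.
# Assumption: limb i has weight 2^OFFSETS[i]; even limbs hold 26 bits,
# odd limbs hold 25 bits (radix 2^25.5).
P = 2**255 - 19
OFFSETS = [0, 26, 51, 77, 102, 128, 153, 179, 204, 230]  # ceil(25.5 * i)

# Order in which the listing carries out of each limb: two chains
# (0->1->2->3->4->5->6 and 5->6->7->8->9->0->1) interleaved step by step.
CARRY_ORDER = (0, 5, 1, 6, 2, 7, 3, 8, 4, 9, 5, 0)


def limb_bits(i):
    """Bit width of limb i."""
    return 26 if i % 2 == 0 else 25


def carry_ripple(limbs):
    """Return a carried copy of the ten limbs; the input is left untouched."""
    z = list(limbs)
    for i in CARRY_ORDER:
        bits = limb_bits(i)
        carry = z[i] >> bits          # vpsrlq $25 or $26
        if i == 9:
            # 2^255 = 19 (mod p): vpsllq $4 gives 16c, the vpaddq pair gives 3c.
            z[0] += 19 * carry
        else:
            z[i + 1] += carry         # vpaddq into the next limb
        z[i] &= (1 << bits) - 1       # vpand with the 25- or 26-bit mask
    return z


def to_int(limbs):
    """Recombine the limbs into a single integer."""
    return sum(limb << off for limb, off in zip(limbs, OFFSETS))


if __name__ == "__main__":
    z = [(1 << 40) + i for i in range(10)]
    out = carry_ripple(z)
    print(to_int(out) % P == to_int(z) % P)
```

Python integers are unbounded, so the model only matches the vector code while every limb stays below 2^64. In this model limbs 1 and 6 receive a final carry after being masked and may end up slightly above 25 and 26 bits respectively.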
Dynamic Dispatch Stall Cycles:
RAT - Register unavailable: 0
RCU - Retire tokens unavailable: 0
SCHEDQ - Scheduler full: 1145
LQ - Load queue full: 0
SQ - Store queue full: 0
GROUP - Static restrictions on the dispatch group: 0
Dispatch Logic - number of cycles where we saw N instructions dispatched:
[# dispatched], [# cycles]
0, 22 (1.6%)
2, 190 (13.5%)
3, 956 (68.0%)
4, 238 (16.9%)
Schedulers - number of cycles where we saw N instructions issued:
[# issued], [# cycles]
0, 3 (0.2%)
1, 1 (0.1%)
2, 204 (14.5%)
3, 1001 (71.2%)
4, 197 (14.0%)
Scheduler's queue usage:
SBPortAny, 54/54
Retire Control Unit - number of cycles where we saw N instructions retired:
[# retired], [# cycles]
0, 7 (0.5%)
1, 303 (21.6%)
2, 2 (0.1%)
3, 887 (63.1%)
4, 6 (0.4%)
5, 103 (7.3%)
7, 97 (6.9%)
14, 1 (0.1%)
Register File statistics:
Total number of mappings created: 4200
Max number of mappings used: 67
Resources:
[0] - SBDivider
[1] - SBFPDivider
[2] - SBPort0
[3] - SBPort1
[4] - SBPort4
[5] - SBPort5
[6.0] - SBPort23
[6.1] - SBPort23
Resource pressure per iteration:
[0] [1] [2] [3] [4] [5] [6.0] [6.1]
- - 14.01 12.99 - 13.00 - 2.00
Resource pressure by instruction:
[0] [1] [2] [3] [4] [5] [6.0] [6.1] Instructions:
- - 1.00 - - - - - vpsrlq $26, %ymm0, %ymm15
- - - 0.96 - 0.04 - - vpaddq %ymm15, %ymm1, %ymm1
- - - - - - - 1.00 vmovdqa (%rip), %ymm13
- - - 0.03 - 0.97 - - vpand %ymm13, %ymm0, %ymm0
- - 1.00 - - - - - vpsrlq $25, %ymm5, %ymm15
- - - 0.03 - 0.97 - - vpaddq %ymm15, %ymm6, %ymm6
- - - - - - - 1.00 vmovdqa (%rip), %ymm12
- - 0.96 0.03 - 0.01 - - vpand %ymm12, %ymm5, %ymm5
- - 1.00 - - - - - vpsrlq $25, %ymm1, %ymm15
- - - 0.01 - 0.99 - - vpaddq %ymm15, %ymm2, %ymm2
- - 0.01 0.97 - 0.02 - - vpand %ymm12, %ymm1, %ymm1
- - 1.00 - - - - - vpsrlq $26, %ymm6, %ymm15
- - - 1.00 - - - - vpaddq %ymm15, %ymm7, %ymm7
- - - 0.02 - 0.98 - - vpand %ymm13, %ymm6, %ymm6
- - 1.00 - - - - - vpsrlq $26, %ymm2, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm3, %ymm3
- - 0.01 0.01 - 0.98 - - vpand %ymm13, %ymm2, %ymm2
- - 1.00 - - - - - vpsrlq $25, %ymm7, %ymm15
- - - 0.99 - 0.01 - - vpaddq %ymm15, %ymm8, %ymm8
- - 0.01 0.02 - 0.97 - - vpand %ymm12, %ymm7, %ymm7
- - 1.00 - - - - - vpsrlq $25, %ymm3, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm4, %ymm4
- - - 0.02 - 0.98 - - vpand %ymm12, %ymm3, %ymm3
- - 1.00 - - - - - vpsrlq $26, %ymm8, %ymm15
- - - 0.98 - 0.02 - - vpaddq %ymm15, %ymm9, %ymm9
- - - 0.03 - 0.97 - - vpand %ymm13, %ymm8, %ymm8
- - 1.00 - - - - - vpsrlq $26, %ymm4, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm5, %ymm5
- - - 0.02 - 0.98 - - vpand %ymm13, %ymm4, %ymm4
- - 1.00 - - - - - vpsrlq $25, %ymm9, %ymm15
- - 1.00 - - - - - vpsllq $4, %ymm15, %ymm14
- - - 0.97 - 0.03 - - vpaddq %ymm14, %ymm0, %ymm0
- - - 0.98 - 0.02 - - vpaddq %ymm15, %ymm15, %ymm14
- - - 0.03 - 0.97 - - vpaddq %ymm15, %ymm14, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm0, %ymm0
- - - 0.03 - 0.97 - - vpand %ymm12, %ymm9, %ymm9
- - 1.00 - - - - - vpsrlq $25, %ymm5, %ymm15
- - - 0.03 - 0.97 - - vpaddq %ymm15, %ymm6, %ymm6
- - 0.02 0.01 - 0.97 - - vpand %ymm12, %ymm5, %ymm5
- - 1.00 - - - - - vpsrlq $26, %ymm0, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm1, %ymm1
- - - 0.97 - 0.03 - - vpand %ymm13, %ymm0, %ymm0
Timeline view:
0123456789
Index 0123456789
[0,0] DeER . . . . vpsrlq $26, %ymm0, %ymm15
[0,1] D=eER. . . . vpaddq %ymm15, %ymm1, %ymm1
[0,2] DeeeeeeeER. . . vmovdqa (%rip), %ymm13
[0,3] D=======eER . . vpand %ymm13, %ymm0, %ymm0
[0,4] .DeE------R . . vpsrlq $25, %ymm5, %ymm15
[0,5] .D=eE-----R . . vpaddq %ymm15, %ymm6, %ymm6
[0,6] .DeeeeeeeER . . vmovdqa (%rip), %ymm12
[0,7] .D=======eER . . vpand %ymm12, %ymm5, %ymm5
[0,8] . DeE------R . . vpsrlq $25, %ymm1, %ymm15
[0,9] . D=eE-----R . . vpaddq %ymm15, %ymm2, %ymm2
[0,10] . D======eER . . vpand %ymm12, %ymm1, %ymm1
[0,11] . D=eE-----R . . vpsrlq $26, %ymm6, %ymm15
[0,12] . D=eE----R . . vpaddq %ymm15, %ymm7, %ymm7
[0,13] . D====eE-R . . vpand %ymm13, %ymm6, %ymm6
[0,14] . D=eE----R . . vpsrlq $26, %ymm2, %ymm15
[0,15] . D==eE---R . . vpaddq %ymm15, %ymm3, %ymm3
[0,16] . D===eE-R . . vpand %ymm13, %ymm2, %ymm2
[0,17] . D=eE---R . . vpsrlq $25, %ymm7, %ymm15
[0,18] . D==eE--R . . vpaddq %ymm15, %ymm8, %ymm8
[0,19] . D====eER . . vpand %ymm12, %ymm7, %ymm7
[0,20] . D=eE--R . . vpsrlq $25, %ymm3, %ymm15
[0,21] . D====eER . . vpaddq %ymm15, %ymm4, %ymm4
[0,22] . D====eER . . vpand %ymm12, %ymm3, %ymm3
[0,23] . D====eER . . vpsrlq $26, %ymm8, %ymm15
[0,24] . .D====eER . . vpaddq %ymm15, %ymm9, %ymm9
[0,25] . .D====eER . . vpand %ymm13, %ymm8, %ymm8
[0,26] . .D====eER . . vpsrlq $26, %ymm4, %ymm15
[0,27] . .D=====eER. . vpaddq %ymm15, %ymm5, %ymm5
[0,28] . . D====eER. . vpand %ymm13, %ymm4, %ymm4
[0,29] . . D====eER. . vpsrlq $25, %ymm9, %ymm15
[0,30] . . D=====eER . vpsllq $4, %ymm15, %ymm14
[0,31] . . D======eER . vpaddq %ymm14, %ymm0, %ymm0
[0,32] . . D====eE-R . vpaddq %ymm15, %ymm15, %ymm14
[0,33] . . D=====eER . vpaddq %ymm15, %ymm14, %ymm15
[0,34] . . D======eER . vpaddq %ymm15, %ymm0, %ymm0
[0,35] . . D====eE--R . vpand %ymm12, %ymm9, %ymm9
[0,36] . . D====eE-R . vpsrlq $25, %ymm5, %ymm15
[0,37] . . D=====eER . vpaddq %ymm15, %ymm6, %ymm6
[0,38] . . D=====eER . vpand %ymm12, %ymm5, %ymm5
[0,39] . . D======eER. vpsrlq $26, %ymm0, %ymm15
[0,40] . . D======eER vpaddq %ymm15, %ymm1, %ymm1
[0,41] . . D=====eE-R vpand %ymm13, %ymm0, %ymm0
Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage
[0] [1] [2] [3]
0. 1 1.0 1.0 0.0 vpsrlq $26, %ymm0, %ymm15
1. 1 2.0 0.0 0.0 vpaddq %ymm15, %ymm1, %ymm1
2. 1 1.0 1.0 0.0 vmovdqa (%rip), %ymm13
3. 1 8.0 0.0 0.0 vpand %ymm13, %ymm0, %ymm0
4. 1 1.0 1.0 6.0 vpsrlq $25, %ymm5, %ymm15
5. 1 2.0 0.0 5.0 vpaddq %ymm15, %ymm6, %ymm6
6. 1 1.0 1.0 0.0 vmovdqa (%rip), %ymm12
7. 1 8.0 0.0 0.0 vpand %ymm12, %ymm5, %ymm5
8. 1 1.0 0.0 6.0 vpsrlq $25, %ymm1, %ymm15
9. 1 2.0 0.0 5.0 vpaddq %ymm15, %ymm2, %ymm2
10. 1 7.0 0.0 0.0 vpand %ymm12, %ymm1, %ymm1
11. 1 2.0 0.0 5.0 vpsrlq $26, %ymm6, %ymm15
12. 1 2.0 0.0 4.0 vpaddq %ymm15, %ymm7, %ymm7
13. 1 5.0 0.0 1.0 vpand %ymm13, %ymm6, %ymm6
14. 1 2.0 0.0 4.0 vpsrlq $26, %ymm2, %ymm15
15. 1 3.0 0.0 3.0 vpaddq %ymm15, %ymm3, %ymm3
16. 1 4.0 0.0 1.0 vpand %ymm13, %ymm2, %ymm2
17. 1 2.0 0.0 3.0 vpsrlq $25, %ymm7, %ymm15
18. 1 3.0 0.0 2.0 vpaddq %ymm15, %ymm8, %ymm8
19. 1 5.0 0.0 0.0 vpand %ymm12, %ymm7, %ymm7
20. 1 2.0 0.0 2.0 vpsrlq $25, %ymm3, %ymm15
21. 1 5.0 2.0 0.0 vpaddq %ymm15, %ymm4, %ymm4
22. 1 5.0 1.0 0.0 vpand %ymm12, %ymm3, %ymm3
23. 1 5.0 2.0 0.0 vpsrlq $26, %ymm8, %ymm15
24. 1 5.0 0.0 0.0 vpaddq %ymm15, %ymm9, %ymm9
25. 1 5.0 3.0 0.0 vpand %ymm13, %ymm8, %ymm8
26. 1 5.0 0.0 0.0 vpsrlq $26, %ymm4, %ymm15
27. 1 6.0 0.0 0.0 vpaddq %ymm15, %ymm5, %ymm5
28. 1 5.0 1.0 0.0 vpand %ymm13, %ymm4, %ymm4
29. 1 5.0 0.0 0.0 vpsrlq $25, %ymm9, %ymm15
30. 1 6.0 0.0 0.0 vpsllq $4, %ymm15, %ymm14
31. 1 7.0 0.0 0.0 vpaddq %ymm14, %ymm0, %ymm0
32. 1 5.0 0.0 1.0 vpaddq %ymm15, %ymm15, %ymm14
33. 1 6.0 0.0 0.0 vpaddq %ymm15, %ymm14, %ymm15
34. 1 7.0 0.0 0.0 vpaddq %ymm15, %ymm0, %ymm0
35. 1 5.0 1.0 2.0 vpand %ymm12, %ymm9, %ymm9
36. 1 5.0 1.0 1.0 vpsrlq $25, %ymm5, %ymm15
37. 1 6.0 0.0 0.0 vpaddq %ymm15, %ymm6, %ymm6
38. 1 6.0 2.0 0.0 vpand %ymm12, %ymm5, %ymm5
39. 1 7.0 0.0 0.0 vpsrlq $26, %ymm0, %ymm15
40. 1 7.0 0.0 0.0 vpaddq %ymm15, %ymm1, %ymm1
41. 1 6.0 0.0 1.0 vpand %ymm13, %ymm0, %ymm0