In a recent paper, a method is described for calculating the loop-carried dependencies (LCD), if any, of a assembly level loop, as follows:
OSACA can detect LCDs by creating a DAG of a code comprising two back-to-back copies of the loop body. It can thus analyze all possible paths from each vertex of the first kernel section and detect cyclic LCDs if there exists a dependency chain from one instruction form to its corresponding duplicate.
However, I don't think this is sufficient to catch all loop carried dependencies. In particular, you can have a case where a dependency is loop carried, but where no vertex (instruction) in the second iteration depends on the same vertex1 in the first. Rather, some other vertex in the second depends on the first, and in some subsequent iteration, the cycle is completed.
An example:
add eax, 1 ; A (deps: E-previous)
add eax, 1 ; B (deps: A)
add eax, 1 ; C (deps: B)
; swap eax, ebx (exploded version of xchg eax, ebx)
mov ecx, eax ; D (deps: C)
mov eax, ebx ; E (deps: F-previous)
mov ebx, ecx ; F (deps: D)
Is there a loop carried dependecy here? Yes, there is: the chain of add
instructions in all even iterations depends on the result of the chain in the preceeding even iteration, and the same is true for odd iterations. In particular, there is an intra-iteration chain A->B->C->D->F
, but this chain only depends on E
from the previous iteration which doens't appear in the chain. The other chain is simply E
, which depends on F
from the previous iteration. So the dependency is carried, but it takes two iterations for any dependency to flow back to the same vertex.
We see then that there are two equivalent carried dependecy chains of length 3 (if we assume the mov
are eliminated with zero latency), but each takes 2 iterations, so the total LCD length is 1.5. This also shows you can have non-integer LCD lengths when measured per iteration!
I would show the OSACA generated graphs here, but I couldn't get it to run, so you'll have to use your imagination. If anyone wants to contribute ASCII art, I'll include it :).
1 By same I mean "its corresponding dulicate" as in the quoted portion of the paper.
Thanks for the comments, Jan!
I created this issue to discuss the reg dependency case which I guess is a better place than a gist to discuss it, as I could not e.g., notify you by username here.