Relation-Rule Based Callgraph Inference

2024/07/23 Leonard Ritter, Duangle GbR

A relational event graph is described with unpredicated rules relating events, connecting a product of n sources and m conditions to a single sink (also called the goal). The format is as follows:

```
Y :- X[1], X[2], ..., X[n], c[1], c[2], ..., c[m].
```

means "when all events X[1]..X[n] have happened, and all conditions c[1]..c[m] have been met, then the goal Y will happen". Because the right-hand side is a product, its arguments are commutative and associative, so the rule makes no demands on the order in which the sources are called or the conditions are evaluated.

The resulting graph is a directed hypergraph of the B-graph class, with each rule constituting a B-arc.

The task is now to manifest the relational event graph as a callgraph (as a precursor to a control flow graph) that satisfies all rules.

We recognize the callgraph as a transitive reduction of the hypergraph, which means all source edges of a rule are reducible to one (and only one) edge that connects to a callgraph path through all sources of the rule.

Because the callgraph is initially incomplete, we must start from trivially reducible rules with at most one source. Reducing them adds edges to the callgraph, which in turn allows us to reduce more complex rules, until a fixpoint is reached at which no further rules can be reduced. See this example (conditions omitted for clarity):

```
(1) A.
(2) B :- A.
(3) B :- A, B, C.
(4) C :- A, B.
(5) D :- A, B, C.
// iteration 1: trivially reducible rules
(1) A.
(2) B :- A.
// iteration 2
(4) C :- B. // :- A
// iteration 3
(3) B :- C. // :- B :- A
(5) D :- C. // :- B :- A
```

If a rule reduction leads to more than one possible callgraph edge, there exists an ambiguity in the program that the user must resolve by adding more edges.

If one or more rules remain undecidable once the fixpoint is reached, the user should be warned, since those rules would never be applied.
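The fixpoint iteration described above, including the ambiguity and undecidability checks, can be sketched in Python. This is a minimal sketch, not the author's implementation, and it rests on a few assumptions. A rule is a `(goal, sources)` pair with conditions omitted, and a fact is a rule with no sources. "A callgraph path through all sources" is approximated by an ordering of the sources in which each one reaches the next over existing callgraph edges. That check is brute-forced over permutations, so it only suits small rules. The goal is connected to the last source of such an ordering, the closest source. Every rule is judged against the callgraph as it stood at the start of the iteration, which reproduces the iterations of the example. All function names are hypothetical.

```python
from itertools import permutations


def reaches(edges, src, dst):
    """True if dst is reachable from src over one or more callgraph edges."""
    stack, seen = [src], {src}
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def path_ends(edges, sources):
    """Sources that can end a callgraph path visiting every source of a rule."""
    ends = set()
    for order in permutations(sources):
        if all(reaches(edges, a, b) for a, b in zip(order, order[1:])):
            ends.add(order[-1])
    return ends


def infer_callgraph(rules):
    """Reduce (goal, sources) rules to callgraph edges until a fixpoint.

    Rules are numbered from 1 as in the example. Returns the edges
    (caller -> set of callees), the rules reduced per iteration
    (rule -> caller, None for facts), the ambiguous rules, and the
    rules left unreduced.
    """
    edges = {}
    pending = dict(enumerate(rules, start=1))
    iterations, ambiguous = [], set()
    while True:
        reduced = {}
        for idx, (_goal, sources) in pending.items():
            if not sources:                   # a fact: a root of the callgraph
                reduced[idx] = None
                continue
            ends = path_ends(edges, sources)  # judged on this iteration's snapshot
            if len(ends) == 1:
                reduced[idx] = ends.pop()     # connect at the closest source
            elif len(ends) > 1:
                ambiguous.add(idx)            # the user must add edges to resolve it
        if not reduced:
            break                             # fixpoint: nothing else reduces
        for idx, caller in reduced.items():
            goal, _sources = pending.pop(idx)
            if caller is not None:
                edges.setdefault(caller, set()).add(goal)
        iterations.append(reduced)
    return edges, iterations, ambiguous, pending


rules = [("A", []), ("B", ["A"]), ("B", ["A", "B", "C"]),
         ("C", ["A", "B"]), ("D", ["A", "B", "C"])]
edges, iterations, ambiguous, pending = infer_callgraph(rules)
print(iterations)  # [{1: None, 2: 'A'}, {4: 'B'}, {3: 'C', 5: 'C'}]
```

Rules with more than one possible closest source end up in `ambiguous`. Whatever is still in `pending` when the loop stops is what the user should be warned about.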

The resulting callgraph must still be converted to a CFG. The primary transformation required here is to bring all calls into topological order. Noteworthy are these caveats:

- Branching rules may have non-exclusive conditions, implying that all outgoing edges of an event must be called in arbitrary order. Where we can prove outgoing edges to be exclusive, an if/switch optimization can be performed.
- Likewise, rules are merged non-exclusively, meaning that all incoming edges of an event E exist in an OR relationship and must be serialized in arbitrary order, with E as the final event. Where we can prove incoming edges to be exclusive, a phi-node optimization can be performed.
- Because sources are not required to dominate the reduced rules, we cannot guarantee at compile time that all source events of a rule will always have happened; we can only verify that a goal is connected to all sources. An additional transformation of the callgraph is required, in which non-dominating sources are structured into optional arguments on the dominant path, then tested and destructured where the rule conditions are checked. Once destructured, subsequent events with the same dependency can reuse the work. This mechanism saves the user from having to implement sum type encodings to work around dominance issues.
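To make the last caveat concrete, here is a minimal Python sketch. It illustrates the idea and is not the author's design. Event D is governed by a hypothetical rule `D :- A, B` with the condition `b > 3`. D is reached on two paths, and only one of them passes through B. B's value therefore travels as an optional field of a hypothetical environment `Env` along the dominant path. It is tested and destructured in `event_d`, where the rule condition is checked. The names `Env`, `event_b`, `event_d` and `run` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Env:
    """Hypothetical environment carried along the dominant path to event D."""
    a: int
    b: Optional[int] = None  # B does not dominate D, so its value is optional


def event_b(env: Env) -> Env:
    env.b = env.a * 2  # B happened on this path: fill in its optional slot
    return env


def event_d(env: Env) -> Optional[int]:
    """Hypothetical rule D :- A, B with the condition b > 3."""
    if env.b is None:  # test: B never happened on this path, so D cannot fire
        return None
    b = env.b          # destructure once; later events depending on B reuse it
    if b > 3:          # the rule's condition
        return env.a + b
    return None


def run(a: int, via_b: bool) -> Optional[int]:
    env = Env(a)
    if via_b:          # B lies on only one of the paths that lead to D
        env = event_b(env)
    return event_d(env)


print(run(2, True))   # 6
print(run(2, False))  # None
```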

Lowering Path

(Subject to ongoing research)

It appears I need to go this route:

1. Reduce the data dependency hypergraph to get the control graph:
   - Collapse all determinable B-arcs by connecting them directly to a suitable linear control graph path, which leads through all sources and has no cycles. This path always ends on the closest source, which is where we connect.
   - TODO: Ambiguous cases exist where multiple B-arc edges lie on suitable paths. We need to find a heuristic to make a good choice.
   - TODO: Undecidable cases exist where no B-arc edge lies on a suitable path. However, once the branching events have sequenced their calls, such a path does exist. Therefore, we need to sequence branches ASAP.
2. Using the data dependency hypergraph and the control graph, apply lambda lifting to all sources on non-dominant paths, because they would violate lexical scope. Each dependency on a non-dominant path is optional, so such events are translated into optional arguments for subsequent events.
3. Lower both graphs to Control Flow Form, where each event constitutes a continuation closure. Branching events produce sequences of conditional returning calls, favoring acyclic paths over cyclic ones. Merging events represent closure entry points.
4. Lower Control Flow Form to SSA?

Ideally, we would have a single form that could do it all.

The DRIL hypergraph is like a CFF with multiple conditional calls per node, in sequence, since conditions can be conjunctions. We prefer acyclical (returning) calls to be executed before the cyclical (interminable) ones, and among cyclical calls, we prefer cycles that reenter further back over cycles that reenter close by. Arguably, each call could also be a thread invocation, which would at least make scheduling fair (sequential calls are not).
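The call-ordering preference can be sketched as a sort. This assumes a hypothetical encoding in which `path` lists the events from the entry to the current node. A call counts as cyclical exactly when its target already lies on that path, and the target's position on the path is its reentry point. That is a simplification: a call to a node off the path may still reach a cycle later. `order_calls` is an illustrative name.

```python
def order_calls(path, targets):
    """Order one node's conditional calls by the stated preference.

    path: events from the entry up to the current node (hypothetical encoding).
    A target already on the path reenters it, i.e. the call is cyclical.
    """
    position = {event: i for i, event in enumerate(path)}
    returning = [t for t in targets if t not in position]  # acyclical calls first
    cyclical = sorted((t for t in targets if t in position),
                      key=lambda t: position[t])           # furthest back first
    return returning + cyclical


print(order_calls(["A", "B", "C", "D"], ["B", "E", "A", "F"]))  # ['E', 'F', 'A', 'B']
```

`sorted` is stable, so calls of the same kind keep their declared order.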

DRIL data dependencies are mapped by closures sourcing values that aren't on their control path, which can be fixed by creating a control path. But we also have order-dependency constraints, where we merely depend on a control path having been executed.

```
node(args...) -> cont:
    after <node>
    <op> <arg> | <other_node.arg> ...
    set
        [<boolean_condition>] node args... -> cont
        ...
```

Negated Events

(Subject to ongoing research)

We might also interpret the graph as functional flow, i.e., we start from the goals and work backwards to the facts. This isn't compatible with the event interpretation, since each function can then produce multiple results (as a generator), but it gives us a better understanding of what a negated event is. In the functional interpretation, a function either produces one or more results (indicating that it succeeded) or none, in which case it failed, implying that the event failed. This requires stratification, since the event's conditions must be evaluated before we know whether it succeeded.

In effect, an event can succeed (when any of its rules apply) or fail (when none of its rules apply), and we can define default flow.
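A small Python sketch of the functional interpretation follows. It assumes each rule is encoded as a generator of results, which is a hypothetical encoding; `evaluate`, `on_event`, the facts and both rules are illustrative. An event succeeds if any rule yields a result and fails otherwise. The event is evaluated in full before its failure is acted on, which is the stratification requirement. Reading "default flow" as the branch taken on failure is an assumption.

```python
from typing import Callable, Iterator, List

Rule = Callable[[], Iterator[int]]  # hypothetical: a rule yields zero or more results


def evaluate(rules: List[Rule]) -> List[int]:
    """Run every rule of an event to completion and collect all of its results."""
    return [result for rule in rules for result in rule()]


def on_event(rules: List[Rule],
             succeeded: Callable[[List[int]], str],
             failed: Callable[[], str]) -> str:
    # Stratification: the event is evaluated completely before we decide whether
    # it failed, so its negation only becomes observable once it has settled.
    results = evaluate(rules)
    return succeeded(results) if results else failed()


FACTS = [1, 4, 9]


def big_rule() -> Iterator[int]:       # Big :- Fact(x), x > 5.
    return (x for x in FACTS if x > 5)


def negative_rule() -> Iterator[int]:  # Negative :- Fact(x), x < 0.
    return (x for x in FACTS if x < 0)


print(on_event([big_rule], lambda r: f"big: {r}", lambda: "no big"))  # big: [9]
print(on_event([negative_rule], lambda r: f"negative: {r}",
               lambda: "default flow"))                              # default flow
```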

Merge Events

(Subject to ongoing research)

The semantics of merging events (continuations with multiple in-edges, which have a non-exclusive OR relationship) need to be defined. Merging events must be sync points, analogous to how branching events are (non-exclusive) forks of execution flow.

Let's model rules for an interpreter to gain more clarity on this question. The interpreter works in iterations. Each iteration, the state of an event changes when at least one of its input states has changed. We model changed events as bits that are zero at the beginning of an iteration, and are turned on as events are updated, based on the set bits of the previous iteration.
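The change-bit scheme can be sketched for the merge-free case. The sketch assumes one rule per goal, so the merge question does not arise yet. Events are numbered bits. An event is recomputed only when one of its input bits was set in the previous iteration, and it reads the previous iteration's states. Several pieces are assumptions for illustration: the encoding of rules as `goal -> (sources, value function)`, the function `interpret`, and the `max_iterations` guard. The guard is there because rules whose values keep changing would otherwise never settle.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rules = Dict[str, Tuple[List[str], Callable[..., int]]]  # goal -> (sources, value fn)


def interpret(facts: Dict[str, int], rules: Rules, max_iterations: int = 1000):
    """Iterate until no event changes; one rule per goal (merges are left out)."""
    names = list(facts) + [goal for goal in rules if goal not in facts]
    bit = {name: 1 << i for i, name in enumerate(names)}
    state: Dict[str, Optional[int]] = {name: facts.get(name) for name in names}
    changed = sum(bit[name] for name in facts)  # facts count as changed initially
    iterations = 0
    while changed:
        if iterations == max_iterations:
            raise RuntimeError("no fixed state reached")
        iterations += 1
        previous = dict(state)  # read only the previous iteration's states
        now = 0                 # all change bits start at zero
        for goal, (sources, fn) in rules.items():
            if not changed & sum(bit[s] for s in sources):
                continue        # no input changed last iteration: skip
            values = [previous[s] for s in sources]
            new = fn(*values) if all(v is not None for v in values) else None
            if new != state[goal]:
                state[goal] = new
                now |= bit[goal]  # turn this event's change bit on
        changed = now
    return state, iterations


state, iterations = interpret(
    {"A": 2},
    {"B": (["A"], lambda a: a + 1),
     "C": (["A", "B"], lambda a, b: a * b),
     "D": (["C"], lambda c: c - 1)})
print(state, iterations)  # {'A': 2, 'B': 3, 'C': 6, 'D': 5} 4
```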

To begin with, it seems necessary to require that merge events are only activated once all linear events have reached a fixed state. Even then, the data of a merge event remains ambiguous.

One prerequisite of our events is that each event is linear, i.e. has a single optional state: Some definitive value or None.

But multiple rules may apply different values to the same merge event, meaning we need to merge those values somehow.

Our 4 options are:

1. We arbitrarily choose a value.
2. The event doesn't happen when there is more than one possible value (XOR behavior).
3. An aggregate function f decides how the values are to be combined. The controller logic is described by the following Scopes pseudocode:
   ```
   fold (x = empty) for y in source-events
       if (empty? x) y
       elseif (empty? y) x
       else (f x y)
   ```
4. The event must not take any arguments (void fold operator).
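For readers unfamiliar with Scopes, here is a line-by-line Python transcription of the aggregate-function fold. `None` stands in for `empty`, which assumes events never carry `None` as a value. `merge_event` is a hypothetical name. Since merges are serialized in arbitrary order, `f` should be commutative and associative for the result to be independent of that order.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def merge_event(source_values: Iterable[Optional[T]],
                f: Callable[[T, T], T]) -> Optional[T]:
    """Python transcription of the Scopes fold; None stands in for `empty`."""
    x: Optional[T] = None    # (x = empty)
    for y in source_values:  # for y in source-events
        if x is None:
            x = y            # (empty? x) -> y
        elif y is None:
            pass             # (empty? y) -> x, unchanged
        else:
            x = f(x, y)      # both present: let f combine them
    return x


print(merge_event([3, None, 5], max))  # 5
print(merge_event([None, None], max))  # None: no source event happened
```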

To evaluate:

(1) is surprising to the user. Even when deterministic, its behavior is difficult to predict and subject to subtle behavioral change when the program is altered.

(2) might be surprising to the user since it alters Datalog semantics. The :- operator is expected to describe a non-exclusive implication, so we would need to use a different operator such as -:-, standing for iff (if and only if). It requires the user to define a separate rule for every possible conjunctive case: where an event should merge a set of 16 different events, 65535 (2^16 - 1) rules would have to be defined to cover every case. The complexity can be reduced by chaining 15 events, each of which defines three rules (a alone, b alone, and both a and b), but that would still require the definition of 45 rules!

(3) could be quite powerful, and forms an implicit implementation of (2), but adds complexity to the system. This is only permissible when there is no other way.

(4) is predictable and stable under program changes, but limits expressiveness. Due to relaxed dominance, subsequent rules may still depend on both the merge event and one of the events it merged, but even so, values from different branches remain irreconcilable. There simply is no way to merge data from two different events into a single one. We cannot even build a flow that combines booleans from two events.
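The rule counts in evaluation (2) can be checked by enumeration. This sketch models each exclusive rule as the non-empty combination of source events it covers, which is a simplification that ignores conditions. The chained variant is counted as three rules per binary merge. Function names are hypothetical.

```python
from itertools import combinations


def iff_rules(sources):
    """One exclusive rule per non-empty combination of sources that happened."""
    return [subset
            for k in range(1, len(sources) + 1)
            for subset in combinations(sources, k)]


def chained_rule_count(n):
    """n events merged through a chain of n - 1 binary merges, 3 rules each."""
    return 3 * (n - 1)


events = [f"E{i}" for i in range(16)]
print(len(iff_rules(events)))  # 65535 == 2**16 - 1
print(chained_rule_count(16))  # 45
print(iff_rules(["a", "b"]))   # [('a',), ('b',), ('a', 'b')]
```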

Let's explore what an implementation of (3) could look like. We could type the merge event as a prefabricated aggregate class (min, max, count, sum).
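One possible shape for such prefabricated aggregate classes is sketched below in Python, under stated assumptions. Each class pairs a `lift` step with a `combine` step. The `lift` step lets `count` map every value to 1. The `combine` step must be commutative and associative, because incoming edges are merged in arbitrary order. `Aggregate`, `MIN`, `MAX`, `COUNT` and `SUM` are hypothetical names. A merge with no source events yields `None`, meaning the merge event does not happen; whether `count` should yield 0 instead is left open.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical prefabricated aggregate class for typing a merge event."""
    name: str
    lift: Callable[[Any], Any]          # turns one source value into a partial result
    combine: Callable[[Any, Any], Any]  # must be commutative and associative

    def merge(self, source_values: Iterable[Optional[Any]]) -> Optional[Any]:
        acc: Optional[Any] = None       # None: no source event has happened yet
        for value in source_values:
            if value is None:           # this source event did not happen
                continue
            part = self.lift(value)
            acc = part if acc is None else self.combine(acc, part)
        return acc


MIN = Aggregate("min", lambda v: v, min)
MAX = Aggregate("max", lambda v: v, max)
COUNT = Aggregate("count", lambda _: 1, operator.add)
SUM = Aggregate("sum", lambda v: v, operator.add)

values = [4, None, 7, 1]
print([agg.merge(values) for agg in (MIN, MAX, COUNT, SUM)])  # [1, 7, 3, 12]
```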
