Tiler linearisation

The 'linearisation' step of tiling is more complex than might seem at first sight. This results from the work that is done between emitting the code for each tile, which sometimes uses the tree structure directly. That work consists of the following things:

Ensuring operand values are loaded into registers
- inserting loads
Ensuring a register is assigned for the result of an operation,
- Exceptions: 'fixed' registers, e.g. rsp, r13 (cu), r14 (frame), r15 (tc)
- Other exceptions: instructions that only work with specific registers (e.g. arglist values)
Ensuring registers are available for computation by spilling
- insertingn stores
Ensuring values that will be invalidated by an instruction (e.g. call) are stored to memory
- inserting stores
Inserting labels for conditionals
Inserting conditional jumps after if, any, all, etc.
- mostly a side-effect from the postorder implementation of compilation; we could add the 'if tile' as postorder in some cases, inorder in other cases

The reason for this step is simply that a definitive order of operations is required for dividing instructions into basic blocks, and that basic blocks are a required concept for correct management of values and registers. So these concepts should best be considered together.

Demands on the linear format

The linear format should be able to express both the tiles whose code is to be emitted and the 'intermediate' segments that glue them together. Notably, these pieces have become intermediate segments primarily because they couldn't be expressed as postorder calculations in the first place. If I could design a tile-specific way to automatically order the tiles, that could potentially be an elegant solution; but at present I doubt whether this is practical. (Consider also variadic nodes like ALL and ANY, and nodes that specify the order of their operands in memory like ARGLIST.

At the very least, the linear format should contain simple tiles. A 'simple' tile consists of a code-emitting function and a path-description that tells the compiler how to obtain the values it uses in the tree. This is convenient when developing code generation 'emit' functions because these values are passed to the function; but it is also essential for proper value management. Otherwise it would be impossible to know when a value is last used and thus can be freed. Note that such a tile instance is intrinsically coupled to its location in the tree, which is currently implemented by tree-tagging. In the linear format, this coupling must be preserved.

While all tiles correspond to machine code, not all machine code is generated (currently) by simple tiles. For instance, loads, stores, jumps and labels tilingare all generated by special routine calls during tree traversal. At the risk of repeating myself, this is for two reasons:

Register and memory management are treated independently from and posterior to selection of instructions (as it should be), so the tiler has no idea of memory traffic.
Labels and jumps have to be added before, after and inbetween operations, and can't be added by the tiler as 'regular' tiles, because the tiler only works with postorder tiles.

Nevertheless, it would be highly desirable to treat such machine code generation routines as if they were in fact simple tiles, because this greatly simplifies the code generator. All complexity can then be moved out of the code generator into separate stages, which can be tested and validated in isolation. Thus, we'd like to have pseudotiles, which represent machine code that doesn't fit into tiles.

Like the regular tiles, pseudotiles take arguments; unlike regular tiles, these arguments are not bound to the tree. (The tree does not store which registers are in use, for instance). (That is actually not currently true; the tree 'info' nodes store 'value' structures, which do contain the current (mutable!) register and memory locations. But since these may change (e.g. as a result of a value move), this won't do for the pseudotile that has to implement the move.....

In fact, it won't do to point tiles directly to these tree-tagged values anyway; unless we decouple the value-descriptors from the tree in general. That would be doable with a struct like:

struct MVMJitValueDescriptor {
    MVMint32 node;
    enum { IN_MEMORY, GPR, FPR } type;
    MVMint32 location;
    MVMint8   size; /* nr of bytes in this value */
};

It may also be necessary to keep a linked list of such descriptors in a hash table for the register allocator, implementation too simple to sketch out. And similarily, it may be necessary to store value lifetime into the value descriptor, but it might be better to store this externally, too.

Anyway, it is such value descriptors that are of primary interest to the tiles, these value descriptors can be completely decoupled from the tree itself.

Tile emit functions currently take the following function signature:

void (*emit)(MVMThreadContext *tc, MVMJitCompiler *compiler,
             MVMJitExprTree *tree, MVMint32 node,
             MVMJitExprValue **values, MVMJitExprNode *args);

The MVMJitExprValue array-of-pointers can become an array-of-MVMJitValueDescriptor values as described above. The MVMMJitExprNode array can remain as it is, and can be used to pass arbitrary pointers and constants. Given that we would tightly couple tiles and value descriptors, it would seem suitable to store them togeter:

 struct MVMJitTile {
      void (*emit)(....);
      MVMint32 node;
      MVMjitValueDescriptor args[8];  /* how many arguments are enough?*/
      MVMJitExprNode        params[8]; /* and how many params?
 };

I'd like to think such tiles can be used to represent all possible machine-code segments we could generate, but I may be wrong.

Tiles interact with register assigment and allocation

On a non-ideal architecture like amd64, many operations require the use of specific registers. E.g. div requires the use of both rax and rcx registers. call results will always be stored within the rax registers, and ARGLIST must place its argument values in a specific order of registers. (Although ARGLIST is a platform-specific special case, which we can probably not handle by generic logic).

I'd like for there to be some facility to expose the registers used by a tile to the register allocator. I propose a flags value for each tile, which would be generated from compile-time constants, and which would be optional:

(tile: idiv (div reg reg) reg 4 (use: x64_rax x64_rcx yield: x64_rax))

Which would translate into the following expression (for the flags field):

MVM_JIT_FLAG_USE(MVM_JIT_X64_RAX)|MVM_JIT_FLAG_USE(MVM_JIT_X64_RCX)|MVM_JIT_FLAG_YIELD(MVM_JIT_X64_RAX)

Which is information which we could then read to inform the register allocator and assignment logic.

bdw/tilin.md

Tiler linearisation

Demands on the linear format

Tiles interact with register assigment and allocation