FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs
- the classic paper
- computes LUT mappings through maximum flow
- produces depth-optimal designs
- but needs major area recovery, e.g. flow-pack
- implemented in the Yosys `flowmap` pass
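As a rough illustration of what depth-optimal labelling computes, here is a sketch in Python that replaces FlowMap's max-flow feasibility test with exhaustive K-feasible cut enumeration, which is only viable for tiny networks. The network, node names, and choice of K are all made up for the example:

```python
from itertools import product

def topo_order(fanins):
    """Topological order of a network given as node -> list of fanins."""
    order, seen = [], set()
    def visit(v):
        if v not in seen:
            seen.add(v)
            for u in fanins[v]:
                visit(u)
            order.append(v)
    for v in fanins:
        visit(v)
    return order

def k_feasible_cuts(fanins, K):
    """All cuts of each node with at most K leaves, merged bottom-up."""
    cuts = {}
    for v in topo_order(fanins):
        if not fanins[v]:
            cuts[v] = [frozenset([v])]
            continue
        found = {frozenset([v])}  # the trivial cut, used by fanouts
        for combo in product(*(cuts[u] for u in fanins[v])):
            merged = frozenset().union(*combo)
            if len(merged) <= K:
                found.add(merged)
        cuts[v] = list(found)
    return cuts

def depth_optimal_labels(fanins, K):
    """label(v) = depth of v in the best K-LUT mapping (unit delay model)."""
    cuts = k_feasible_cuts(fanins, K)
    labels = {}
    for v in topo_order(fanins):
        if not fanins[v]:
            labels[v] = 0  # primary input
        else:
            labels[v] = min(max(labels[u] for u in cut) + 1
                            for cut in cuts[v] if cut != frozenset([v]))
    return labels

# hypothetical network: 2-input gates over primary inputs a..d
network = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b"], "g2": ["c", "d"],
    "g3": ["g1", "g2"], "g4": ["g3", "a"],
}
labels = depth_optimal_labels(network, K=3)
```

Here one 3-LUT can absorb g3 and g4 together via the cut {a, b, g2}, so g4's label stays at 2 rather than 3.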
- relaxing the requirement that each logic gate maps to exactly one LUT allows area recovery
- duplicates logic to produce more LUT mapping opportunities
- implemented in Yosys `flowmap`, but without area-recovery heuristics it produces strictly worse results
- Boolean resynthesis of promising (K+1)-input cones can turn them into K-input cones
- extension of FlowMap from the unit delay model to the edge delay model
- edge flow can be used to produce a more routable design
- fewer large LUTs, which pack more easily into fracturable LUTs
- careful cut selection can reduce memory and runtime, because cut-enumeration mapping otherwise has to explore all cuts.
- the design can then be delay-relaxed to reduce area costs.
- keeping only a few cuts per node, ranked by heuristics, and using cut enumeration to produce more when needed.
- this is what ABC's `if` mapper uses.
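A minimal sketch of that pruning idea, assuming a made-up network representation (node -> list of fanins) and a simple (depth, cut size) ranking in place of real cost functions like area flow and edge flow:

```python
from itertools import product

def topo(fanins):
    """Topological order of a network given as node -> list of fanins."""
    order, seen = [], set()
    def visit(v):
        if v not in seen:
            seen.add(v)
            for u in fanins[v]:
                visit(u)
            order.append(v)
    for v in fanins:
        visit(v)
    return order

def priority_cuts(fanins, K, C):
    """Cut enumeration that keeps only the C best cuts per node.
    Pruning bounds memory and runtime, but sacrifices the
    depth-optimality guarantee."""
    cuts, depth = {}, {}
    for v in topo(fanins):
        if not fanins[v]:
            cuts[v], depth[v] = [frozenset([v])], 0
            continue
        cand = set()
        for combo in product(*(cuts[u] for u in fanins[v])):
            merged = frozenset().union(*combo)
            if len(merged) <= K:
                cand.add(merged)
        # rank by (cut depth, leaf count): a stand-in for real heuristics
        ranked = sorted(cand,
                        key=lambda c: (1 + max(depth[u] for u in c), len(c)))
        best = ranked[:C]
        depth[v] = 1 + max(depth[u] for u in best[0])
        cuts[v] = best + [frozenset([v])]  # keep the trivial cut for fanouts
    return cuts, depth

# hypothetical example network
network = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b"], "g2": ["c", "d"],
    "g3": ["g1", "g2"], "g4": ["g3", "a"],
}
cuts, depth = priority_cuts(network, K=3, C=2)
```

Each node stores at most C ranked cuts plus its trivial cut, so the mapper's memory use stays linear in the network size regardless of how many cuts exist in total.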
- it's possible to map relatively efficiently if you work incrementally
- depth-optimality is not guaranteed
- needs area-recovery heuristics
- the original MIG paper
- a majority gate can act as an AND or OR gate, but is strictly more powerful
- mainly used as AND/OR in this paper though
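That claim is easy to check exhaustively; a tiny sketch (the function name is mine):

```python
def maj(a, b, c):
    """Three-input majority: true when at least two inputs are true."""
    return (a & b) | (a & c) | (b & c)

# tying one input to a constant specialises MAJ into AND or OR
for a in (0, 1):
    for b in (0, 1):
        assert maj(a, b, 0) == (a & b)  # MAJ(a, b, 0) = a AND b
        assert maj(a, b, 1) == (a | b)  # MAJ(a, b, 1) = a OR b
```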
- optimised networks depend heavily on the structure of the input network, and might not be globally optimal (a problem called "structural bias")
- functional reduction looks for equivalent mappings of the same function and uses "choice nodes" to encode these
- one mapping of a function might be useful in one area, but another mapping might be better in a different area
- reduces structural bias because the choice nodes might encode a more optimal mapping of the design
- a precomputed library of known-optimal networks for N-input functions can be used to rewrite subgraphs into smaller ones
- handy source of choice nodes: these are known to be identical implementations
- does require effectively performing LUT mapping though
- slow, global optimisation methods to complement the fast, local algebraic optimisation
- deliberately inserting logic errors that don't overlap between functions, which lets them be simplified
- this can reshape ripple-carry structures into carry-lookahead adders
- careful application of the MIG rewrite rules to push up variables with long arrival times
- not the first XMG paper, but it's the best I've seen
- XOR doesn't map too well to majority gates (or AIG)
- XORs come up often enough that they're worth handling specially (e.g. adders, parity functions)
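To see why XOR is awkward in majority logic, here is one (not necessarily minimal) majority-gate decomposition of 2-input XOR, checked exhaustively; even with free inverters it costs three majority gates:

```python
def maj(a, b, c):
    """Three-input majority gate."""
    return (a & b) | (a & c) | (b & c)

def xor_via_maj(x, y):
    # x XOR y = NOT MAJ( MAJ(x, y, 0), MAJ(NOT x, NOT y, 0), 1 )
    #         = NOT ((x AND y) OR (NOT x AND NOT y))   i.e. NOT XNOR
    both = maj(x, y, 0)               # x AND y
    neither = maj(1 - x, 1 - y, 0)    # NOT x AND NOT y
    return 1 - maj(both, neither, 1)  # invert the OR of the two

for x in (0, 1):
    for y in (0, 1):
        assert xor_via_maj(x, y) == x ^ y
```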
- there don't seem to be any papers on global XMG optimisation like the MIG one
- given an arbitrary cell library, you can find the optimum-delay solution for an N-input function
- reduces the infinite space of implementations to a permutation (P) class, then looks up or computes the delay-optimal network
- remember MIG rewriting?
- a naive approach for 5-input functions stores 2^2^5 ≈ 4.3 billion entries in its database, one for each truth table
- you can reduce this by negating inputs, permuting inputs, or negating the output to reach another network
- this is called NPN equivalence
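A brute-force sketch of NPN canonicalisation for 3-input functions, representing a truth table as an 8-bit integer; function and variable names are mine, and real tools use much faster canonical-form algorithms than trying every transform:

```python
from itertools import permutations, product

N = 3  # brute force over all n! * 2^n * 2 transforms is fine at this size

def npn_transform(tt, perm, in_neg, out_neg):
    """Apply an input permutation, input negations, and an output negation
    to a truth table stored as an integer (bit r = f evaluated on row r)."""
    out = 0
    for row in range(1 << N):
        x = [(row >> i) & 1 for i in range(N)]
        src = 0
        for i in range(N):
            src |= (x[perm[i]] ^ in_neg[i]) << i
        out |= (((tt >> src) & 1) ^ out_neg) << row
    return out

def npn_canonical(tt):
    """Smallest truth table reachable by any NPN transform: a class label."""
    return min(npn_transform(tt, perm, in_neg, out_neg)
               for perm in permutations(range(N))
               for in_neg in product((0, 1), repeat=N)
               for out_neg in (0, 1))
```

AND3 (0x80), NAND3 (0x7f), and OR3 (0xfe) all land in the same class, and the 256 possible 3-input functions collapse to just 14 NPN classes.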