FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs
- the classic paper
- computes LUT mappings through maximum flow
- produces depth-optimal designs
- but needs major area recovery, e.g. flow-pack
- implemented in the Yosys `flowmap` pass
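As a rough illustration of what depth-optimal labelling computes, here is a sketch in Python that replaces FlowMap's max-flow feasibility test with exhaustive K-feasible cut enumeration, which is only viable for tiny networks. The network, node names, and choice of K are all made up for the example:

```python
from itertools import product

def topo_order(fanins):
    """Topological order of a network given as node -> list of fanins."""
    order, seen = [], set()
    def visit(v):
        if v not in seen:
            seen.add(v)
            for u in fanins[v]:
                visit(u)
            order.append(v)
    for v in fanins:
        visit(v)
    return order

def k_feasible_cuts(fanins, K):
    """All cuts of each node with at most K leaves, merged bottom-up."""
    cuts = {}
    for v in topo_order(fanins):
        if not fanins[v]:
            cuts[v] = [frozenset([v])]
            continue
        found = {frozenset([v])}  # the trivial cut, used by fanouts
        for combo in product(*(cuts[u] for u in fanins[v])):
            merged = frozenset().union(*combo)
            if len(merged) <= K:
                found.add(merged)
        cuts[v] = list(found)
    return cuts

def depth_optimal_labels(fanins, K):
    """label(v) = depth of v in the best K-LUT mapping (unit delay model)."""
    cuts = k_feasible_cuts(fanins, K)
    labels = {}
    for v in topo_order(fanins):
        if not fanins[v]:
            labels[v] = 0  # primary input
        else:
            labels[v] = min(max(labels[u] for u in cut) + 1
                            for cut in cuts[v] if cut != frozenset([v]))
    return labels

# hypothetical network: 2-input gates over primary inputs a..d
network = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b"], "g2": ["c", "d"],
    "g3": ["g1", "g2"], "g4": ["g3", "a"],
}
labels = depth_optimal_labels(network, K=3)
```

Here one 3-LUT can absorb g3 and g4 together via the cut {a, b, g2}, so g4's label stays at 2 rather than 3.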
- relaxing the requirement that each logic gate maps to exactly one LUT allows area recovery
- duplicates logic to produce more LUT mapping opportunities
- implemented in Yosys `flowmap`, but without area-recovery heuristics it produces strictly worse results
- Boolean resynthesis of promising (K+1)-input cones can turn them into K-input cones
- extension of FlowMap from the unit delay model to the edge delay model
- edge flow can be used to produce a more routable design
- fewer large LUTs, which pack more easily into fracturable LUTs
- careful cut selection can reduce memory and runtime, because cut-enumeration mapping otherwise has to explore all cuts.
- the design can then be delay-relaxed to reduce area costs.
- keeping only a few cuts per node, ranked by heuristics, and using cut enumeration to produce more when needed.
- this is what ABC's `if` mapper uses.
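A minimal sketch of that pruning idea, assuming a made-up network representation (node -> list of fanins) and a simple (depth, cut size) ranking in place of real cost functions like area flow and edge flow:

```python
from itertools import product

def topo(fanins):
    """Topological order of a network given as node -> list of fanins."""
    order, seen = [], set()
    def visit(v):
        if v not in seen:
            seen.add(v)
            for u in fanins[v]:
                visit(u)
            order.append(v)
    for v in fanins:
        visit(v)
    return order

def priority_cuts(fanins, K, C):
    """Cut enumeration that keeps only the C best cuts per node.
    Pruning bounds memory and runtime, but sacrifices the
    depth-optimality guarantee."""
    cuts, depth = {}, {}
    for v in topo(fanins):
        if not fanins[v]:
            cuts[v], depth[v] = [frozenset([v])], 0
            continue
        cand = set()
        for combo in product(*(cuts[u] for u in fanins[v])):
            merged = frozenset().union(*combo)
            if len(merged) <= K:
                cand.add(merged)
        # rank by (cut depth, leaf count): a stand-in for real heuristics
        ranked = sorted(cand,
                        key=lambda c: (1 + max(depth[u] for u in c), len(c)))
        best = ranked[:C]
        depth[v] = 1 + max(depth[u] for u in best[0])
        cuts[v] = best + [frozenset([v])]  # keep the trivial cut for fanouts
    return cuts, depth

# hypothetical example network
network = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b"], "g2": ["c", "d"],
    "g3": ["g1", "g2"], "g4": ["g3", "a"],
}
cuts, depth = priority_cuts(network, K=3, C=2)
```

Each node stores at most C ranked cuts plus its trivial cut, so the mapper's memory use stays linear in the network size regardless of how many cuts exist in total.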
- it's possible to map relatively efficiently if you work incrementally
- depth-optimality is not guaranteed
- needs area-recovery heuristics
- the original MIG paper
- a majority gate can act as an AND or OR gate, but is strictly more powerful
- mainly used as AND/OR in this paper though
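That claim is easy to check exhaustively; a tiny sketch (the function name is mine):

```python
def maj(a, b, c):
    """Three-input majority: true when at least two inputs are true."""
    return (a & b) | (a & c) | (b & c)

# tying one input to a constant specialises MAJ into AND or OR
for a in (0, 1):
    for b in (0, 1):
        assert maj(a, b, 0) == (a & b)  # MAJ(a, b, 0) = a AND b
        assert maj(a, b, 1) == (a | b)  # MAJ(a, b, 1) = a OR b
```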
- optimised networks depend heavily on the structure of the input network, and might not be globally optimal (a problem called "structural bias")
- functional reduction looks for equivalent mappings of the same function and uses "choice nodes" to encode these
- one mapping of a function might be useful in one area, but another mapping might be better in a different area
- reduces structural bias because the choice nodes might encode a more optimal mapping of the design
- a precomputed library of known-optimal networks for N-input functions can be used to rewrite subgraphs into smaller ones
- handy source of choice nodes: these are known to be identical implementations
- does require effectively performing LUT mapping though
- slow, global optimisation methods to complement the fast, local algebraic optimisation
- deliberately inserting logic errors that don't overlap between functions, which lets them be simplified
- this can reshape ripple-carry structures into carry-lookahead adders
- careful application of the MIG rewrite rules to push up variables with long arrival times
- not the first XMG paper, but it's the best I've seen
- XOR doesn't map too well to majority gates (or AIG)
- XORs come up often enough that they're worth handling specially (e.g. adders, parity functions)
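To see why XOR is awkward in majority logic, here is one (not necessarily minimal) majority-gate decomposition of 2-input XOR, checked exhaustively; even with free inverters it costs three majority gates:

```python
def maj(a, b, c):
    """Three-input majority gate."""
    return (a & b) | (a & c) | (b & c)

def xor_via_maj(x, y):
    # x XOR y = NOT MAJ( MAJ(x, y, 0), MAJ(NOT x, NOT y, 0), 1 )
    #         = NOT ((x AND y) OR (NOT x AND NOT y))   i.e. NOT XNOR
    both = maj(x, y, 0)               # x AND y
    neither = maj(1 - x, 1 - y, 0)    # NOT x AND NOT y
    return 1 - maj(both, neither, 1)  # invert the OR of the two

for x in (0, 1):
    for y in (0, 1):
        assert xor_via_maj(x, y) == x ^ y
```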
- there don't seem to be any papers on global XMG optimisation like the MIG one
- given an arbitrary cell library, you can find the optimum-delay solution for an N-input function
- reduces the infinite space of implementations to a permutation (P) class, then looks up or computes the delay-optimal network
- remember MIG rewriting?
- a naive approach for 5-input functions stores 2^2^5 ≈ 4.3 billion entries in its database, one for each truth table
- you can reduce this by negating inputs, permuting inputs, or negating the output to reach another network
- this is called NPN equivalence
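A brute-force sketch of NPN canonicalisation for 3-input functions, representing a truth table as an 8-bit integer; function and variable names are mine, and real tools use much faster canonical-form algorithms than trying every transform:

```python
from itertools import permutations, product

N = 3  # brute force over all n! * 2^n * 2 transforms is fine at this size

def npn_transform(tt, perm, in_neg, out_neg):
    """Apply an input permutation, input negations, and an output negation
    to a truth table stored as an integer (bit r = f evaluated on row r)."""
    out = 0
    for row in range(1 << N):
        x = [(row >> i) & 1 for i in range(N)]
        src = 0
        for i in range(N):
            src |= (x[perm[i]] ^ in_neg[i]) << i
        out |= (((tt >> src) & 1) ^ out_neg) << row
    return out

def npn_canonical(tt):
    """Smallest truth table reachable by any NPN transform: a class label."""
    return min(npn_transform(tt, perm, in_neg, out_neg)
               for perm in permutations(range(N))
               for in_neg in product((0, 1), repeat=N)
               for out_neg in (0, 1))
```

AND3 (0x80), NAND3 (0x7f), and OR3 (0xfe) all land in the same class, and the 256 possible 3-input functions collapse to just 14 NPN classes.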