Ravenslofty · April 24, 2020 15:03
diff --git a/alm_sim.v b/alm_sim.v
 // The core logic primitive of the Cyclone V/10GX is the Adaptive Logic Module
 // (ALM). Each ALM is made up of an 8-input, 2-output look-up table, covered
 // in this file, connected to combinational outputs, a carry chain, and four
 // D flip-flops (which are covered as MISTRAL_FF in dff_sim.v).
 //
 // The ALM is vertically symmetric, so I find it helps to think in terms of
 // half-ALMs, as that's predominantly the unit that synth_intel_alm uses.
 //
 // ALMs are quite flexible, having multiple modes.
 //
 // Normal (combinational) mode
 // ---------------------------
 // The ALM can implement:
 // - a single 6-input function (with the other inputs usable for flip-flop access)
 // - two 5-input functions that share two inputs
 // - a 5-input and a 4-input function that share one input
 // - a 5-input and a 3-or-less-input function that share no inputs
 // - two 4-or-less-input functions that share no inputs
 //
 // Normal-mode functions are represented as MISTRAL_ALUTN cells with N inputs.
 // It would be possible to represent a normal mode function as a single cell -
 // the vendor cyclone{v,10gx}_lcell_comb cell does exactly that - but I felt
 // it was more user-friendly to print out the specific function sizes
 // separately.
 //
 // With the exception of MISTRAL_ALUT6, you can think of two normal-mode cells
 // fitting inside a single ALM.
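 //
 // As a rough sketch of what such a cell looks like (not necessarily the
 // actual cell definition), a 4-input normal-mode function can be modelled as
 // a truth table indexed by its inputs. The module name alut4_sketch and its
 // port names are assumptions made for this example.
 module alut4_sketch(input A, B, C, D, output Q);
 // Truth table: bit i holds the output for the input value i = {D, C, B, A}.
 parameter [15:0] LUT = 16'h0000;
 // Shift the selected truth-table bit down to bit 0; the assignment to the
 // one-bit output keeps only that bit.
 assign Q = LUT >> {D, C, B, A};
 endmodule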
 //
 // Extended (7-input) mode
 // -----------------------
 // The ALM can also fit a 7-input function made of two 5-input functions that
 // share four inputs, multiplexed by another input.
 //
 // Because this can't accept arbitrary 7-input functions, Yosys can't handle
 // it, so it doesn't have a cell, but I would likely call it MISTRAL_ALUT7(E?)
 // if it did, and it would take up a full ALM.
 //
 // It might be possible to add an extraction pass to examine all ALUT5 cells
 // that feed into ALUT3 cells to see if they can be combined into an extended
 // ALM, but I don't think it will be worth it.
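 //
 // The structure described above might be sketched as follows, assuming the
 // four shared inputs are A-D, the two unshared inputs are E0 and E1, and the
 // multiplexing input is S. The module name alut7e_sketch and all port names
 // are hypothetical; the real hardware input mapping may differ.
 module alut7e_sketch(input A, B, C, D, E0, E1, S, output Q);
 parameter [31:0] LUT0 = 32'h00000000;
 parameter [31:0] LUT1 = 32'h00000000;
 // Two 5-input functions that share A, B, C and D.
 wire q0 = LUT0 >> {E0, D, C, B, A};
 wire q1 = LUT1 >> {E1, D, C, B, A};
 // The seventh input selects between the two function outputs.
 assign Q = S ? q1 : q0;
 endmodule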
 //
 // Arithmetic mode
 // ---------------
 // In arithmetic mode, each half-ALM uses its carry chain to perform fast addition
 // of two four-input functions that share three inputs. Oddly, the result of
 // one of the functions is inverted before being added (you can see this as
 // the dot on a full-adder input of Figure 1-8 in the Handbook).
 //
 // The cell for an arithmetic-mode half-ALM is MISTRAL_ALM_ARITH. One idea
 // I've had (or rather was suggested by mwk) is that functions that feed into
 // arithmetic-mode cells could be packed directly into the arithmetic-mode
 // cell as a function, which reduces the number of ALMs needed.
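 //
 // A minimal behavioural sketch of an arithmetic-mode half-ALM, assuming the
 // three shared inputs are A, B and C, the unshared inputs are D0 and D1, and
 // the carry chain enters at CI and leaves at CO. The module and port names
 // are assumptions made for this example.
 module arith_half_alm_sketch(input A, B, C, D0, D1, CI, output SO, CO);
 parameter [15:0] LUT0 = 16'h0000;
 parameter [15:0] LUT1 = 16'h0000;
 // Two 4-input functions that share A, B and C.
 wire q0 = LUT0 >> {D0, C, B, A};
 wire q1 = LUT1 >> {D1, C, B, A};
 // The second function's result is inverted before being added.
 assign {CO, SO} = q0 + !q1 + CI;
 endmodule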
 //
 // Shared arithmetic mode
 // ----------------------
 // Shared arithmetic mode looks a lot like arithmetic mode, but here the
 // output of every other four-input function goes to the input of the adder
 // the next bit along. The functions can therefore implement an addition
 // whose result feeds into the carry chain, so three bits can be added per
 // ALM, as opposed to two in arithmetic mode.
 //
 // Shared arithmetic mode doesn't currently have a cell, but I intend to add
 // it as MISTRAL_ALM_SHARED, and have it occupy a full ALM. Because it adds
 // three bits per cell, it makes addition shorter and use fewer ALMs, but
 // I don't know enough to tell whether it's more efficient to use shared
 // arithmetic mode to shorten the carry chain, or plain arithmetic mode with
 // the functions packed in.
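 //
 // The arithmetic behind adding three operands with one carry chain can be
 // sketched as a 3:2 compression followed by an ordinary addition. This only
 // illustrates the arithmetic and is not a model of the ALM hardware; the
 // module name shared_arith_sketch, its ports and the parameter W are
 // assumptions made for this example.
 module shared_arith_sketch #(parameter W = 4) (
     input  [W-1:0] A, B, C,
     output [W+1:0] SUM
 );
 // Per-bit sum and carry of the three operands, computed by plain functions.
 wire [W-1:0] s  = A ^ B ^ C;
 wire [W-1:0] cy = (A & B) | (A & C) | (B & C);
 // Each carry bit is worth twice its sum bit, so it is added one bit along.
 assign SUM = {2'b00, s} + {1'b0, cy, 1'b0};
 endmodule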
diff --git a/dff_sim.v b/dff_sim.v
 // The four D flip-flops (DFFs) in a Cyclone V/10GX Adaptive Logic Module (ALM)
 // act as one-bit memory cells that can be placed very flexibly (wherever there's
 // an ALM); each flop is represented by a MISTRAL_FF cell.
 //
 // The flops in these chips are rather flexible in some ways, but in practice
 // quite crippled by FPGA standards.
 //
 // What the flops can do
 // ---------------------
 // The core flop acts as a single-bit memory that initialises to zero at chip
 // reset. It takes in data on the rising edge of CLK if ENA is high,
 // and outputs it to Q. The ENA (clock enable) pin can therefore be used to
 // capture the input only if a condition is true.
 //
 // The data itself is zero if SCLR (synchronous clear) is high, else it comes
 // from SDATA (synchronous data) if SLOAD (synchronous load) is high, or DATAIN
 // if SLOAD is low.
 //
 // If ACLR (asynchronous clear) is low then Q is forced to zero, regardless of
 // the synchronous inputs or CLK edge. This is most often used for an FPGA-wide
 // power-on reset.
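 //
 // The behaviour described so far might be sketched as below. The port names
 // come from the text above; the module name ff_sketch is an assumption made
 // to avoid clashing with the real MISTRAL_FF cell.
 module ff_sketch(
     input DATAIN, CLK, ACLR, ENA, SCLR, SLOAD, SDATA,
     output reg Q
 );
 // The flop initialises to zero.
 initial Q = 1'b0;
 always @(posedge CLK, negedge ACLR) begin
     // Active-low asynchronous clear overrides everything else.
     if (!ACLR) Q <= 1'b0;
     // Data is only captured when the clock enable is high.
     else if (ENA) begin
         if (SCLR) Q <= 1'b0;
         else if (SLOAD) Q <= SDATA;
         else Q <= DATAIN;
     end
 end
 endmodule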
 //
 // An asynchronous set that sets Q to one can be emulated by inverting the input
 // and output of the flop, resulting in ACLR forcing Q to zero, which then gets
 // inverted to produce one. Likewise, logic can operate on the falling edge of
 // CLK if CLK is inverted before being passed as an input.
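 //
 // A minimal sketch of the asynchronous-set emulation, assuming an active-low
 // set input. The module name ff_aset_sketch and the port names D and ASET_N
 // are hypothetical.
 module ff_aset_sketch(input D, CLK, ASET_N, output Q);
 // The underlying flop only has an asynchronous clear and initialises to zero.
 reg q_n = 1'b0;
 always @(posedge CLK, negedge ASET_N) begin
     if (!ASET_N) q_n <= 1'b0;
     // The input is inverted on the way in...
     else q_n <= ~D;
 end
 // ...and on the way out, so a cleared flop reads as one.
 assign Q = ~q_n;
 endmodule
 // Note that Q is one before the first clock edge, because the inverted
 // output of a zero-initialised flop is one.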
 //
 // What the flops *can't* do
 // -------------------------
 // The trickiest limitation of these flops is the lack of configurable
 // initialisation state. For example, it isn't possible to implement a flop with
 // asynchronous clear that initialises to one, because the hardware initialises
 // to zero. Likewise, you can't emulate a flop with asynchronous set that
 // initialises to zero, because the inverters mean the flop initialises to one.
 //
 // If the input design requires one of these cells (which appears to be rare
 // in practice) then synth_intel_alm will fail to synthesize the design where
 // other Yosys synthesis scripts might succeed.
 //
 // This stands in notable contrast to e.g. Xilinx flip-flops, which have
 // configurable initialisation state and native synchronous/asynchronous
 // set/clear (although not at the same time), which means they can generally
 // implement a much wider variety of logic.
diff --git a/mem_sim.v b/mem_sim.v
 // The MLAB
 // --------
 // In addition to Logic Array Blocks (LABs) that contain ten Adaptive Logic
 // Modules (ALMs, see alm_sim.v), the Cyclone V/10GX also contains
 // Memory/Logic Array Blocks (MLABs) that can either act as ten ALMs, or use
 // the memory that the ALMs use to store look-up table data for general
 // purposes, producing a 32 address by 20-bit block of memory. MLABs are spread
 // out around the chip, so they can be placed near where they are needed,
 // whereas a deep but narrow memory such as the M10K memory block is
 // comparatively limited in placement.
 //
 // MLABs are used mainly for shallow but wide memories, such as CPU register
 // files (which have perhaps 32 comparatively wide registers of 16 or 32 bits)
 // or shift registers (by using the output of the Nth bit as input for the N+1th
 // bit).
 //
 // Oddly, instead of providing a single 32 address by 20-bit cell, Quartus asks
 // synthesis tools to build MLABs out of 32 address by 1-bit cells, and tries
 // to put these cells in the same MLAB during cell placement. Because of this,
 // a MISTRAL_MLAB cell represents one of these 32 address by 1-bit cells, and
 // 20 of them represent a physical MLAB.
 //
 // How the MLAB works
 // ------------------
 // MLABs are poorly documented, so the following information is based mainly
 // on the simulation model and my knowledge of how memories like these work.
 // Additionally, note that the ports of MISTRAL_MLAB are the ones auto-generated
 // by the Yosys `memory_bram` pass, and it doesn't make sense to me to use
 // `techmap` just for the sake of renaming the cell ports.
 //
 // The MLAB can be initialised to any value, but unfortunately Quartus only
 // allows memory initialisation from a file. Since Yosys doesn't preserve input
 // file information, or write the contents of an `initial` block to a file,
 // Yosys can't currently initialise the MLAB in a way Quartus will accept.
 //
 // The MLAB takes in data from A1DATA at the rising edge of CLK1, and if A1EN
 // is high, writes it to the address in A1ADDR. A1EN can therefore be used to
 // conditionally write data to the MLAB.
 //
 // Simultaneously, the MLAB reads data from B1ADDR, and outputs it to B1DATA,
 // asynchronous to CLK1 and ignoring A1EN. If a synchronous read is needed
 // then the output can be fed to embedded flops. Presently, Yosys assumes
 // Quartus will pack external flops into the MLAB, but this is an assumption
 // that needs testing.
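 //
 // The write and read behaviour above might be sketched as follows for one
 // 32 address by 1-bit cell. The port names come from the text; the module
 // name mlab_sketch and the all-zero initial contents are assumptions.
 module mlab_sketch(
     input [4:0] A1ADDR, input A1DATA, A1EN, CLK1,
     input [4:0] B1ADDR, output B1DATA
 );
 reg [31:0] mem = 32'b0;
 // Synchronous write, gated by the write enable.
 always @(posedge CLK1)
     if (A1EN) mem[A1ADDR] <= A1DATA;
 // Asynchronous read, independent of CLK1 and A1EN.
 assign B1DATA = mem[B1ADDR];
 endmodule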
 //
 // If data is simultaneously read from and written to an address, the read
 // data is undefined, so write-to-read data pass-through using multiplexers
 // is implemented by Yosys if needed.
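 //
 // As a rough illustration of write-to-read pass-through (not the logic Yosys
 // actually emits), a registered read port can select the incoming write data
 // whenever both ports address the same location. The module name
 // mlab_passthrough_sketch and all port names are hypothetical.
 module mlab_passthrough_sketch(
     input CLK, WEN, WDATA,
     input [4:0] WADDR, RADDR,
     output reg RDATA
 );
 reg [31:0] mem = 32'b0;
 always @(posedge CLK) begin
     if (WEN) mem[WADDR] <= WDATA;
     // On an address collision, forward the write data instead of relying on
     // the undefined memory output.
     RDATA <= (WEN && WADDR == RADDR) ? WDATA : mem[RADDR];
 end
 endmodule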