Last active
April 24, 2020 15:03
// The core logic primitive of the Cyclone V/10GX is the Adaptive Logic Module
// (ALM). Each ALM is made up of an 8-input, 2-output look-up table, covered
// in this file, connected to combinational outputs, a carry chain, and four
// D flip-flops (which are covered as MISTRAL_FF in mem_sim.v).
//
// The ALM is vertically symmetric, so I find it helps to think in terms of
// half-ALMs, as that's predominantly the unit that synth_intel_alm uses.
//
// ALMs are quite flexible, having multiple modes.
//
// Normal (combinational) mode
// ---------------------------
// The ALM can implement:
// - a single 6-input function (with the other inputs usable for flip-flop access)
// - two 5-input functions that share two inputs
// - a 5-input and a 4-input function that share one input
// - a 5-input and a 3-or-fewer-input function that share no inputs
// - two 4-or-fewer-input functions that share no inputs
//
// Normal-mode functions are represented as MISTRAL_ALUTN cells with N inputs.
// It would be possible to represent a normal-mode function as a single cell -
// the vendor cyclone{v,10gx}_lcell_comb cell does exactly that - but I felt
// it was more user-friendly to print out the specific function sizes
// separately.
//
// With the exception of MISTRAL_ALUT6, you can think of two normal-mode cells
// fitting inside a single ALM.
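//
// As a minimal sketch of how a normal-mode function can be modelled (assuming
// the truth table is held in a parameter, here called LUT; the module name
// alut4_sketch is hypothetical and this is not the MISTRAL_ALUT4 definition),
// a 4-input function simply indexes its truth table with its inputs:
```verilog
module alut4_sketch #(
    // Truth table: bit i is the output for input value i = {D, C, B, A}.
    parameter [15:0] LUT = 16'h0000
) (
    input  A, B, C, D,
    output Q
);
    // Shift the selected truth-table bit down to bit 0; assigning to the
    // 1-bit output keeps only that bit.
    assign Q = LUT >> {D, C, B, A};
endmodule
```
// For example, LUT = 16'h8000 gives a 4-input AND, and LUT = 16'h6996 gives
// a 4-input XOR.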
//
// Extended (7-input) mode
// -----------------------
// The ALM can also fit a 7-input function made of two 5-input functions that
// share four inputs, multiplexed by another input.
//
// Because this can't accept arbitrary 7-input functions, Yosys can't handle
// it, so it doesn't have a cell, but I would likely call it MISTRAL_ALUT7(E?)
// if it did, and it would take up a full ALM.
//
// It might be possible to add an extraction pass to examine all ALUT5 cells
// that feed into ALUT3 cells to see if they can be combined into an extended
// ALM, but I don't think it will be worth it.
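//
// A rough sketch of the shape of function that extended mode accepts, as I
// read the description above. The module name, port names and parameter names
// are all hypothetical; no such cell exists.
```verilog
module alut7e_sketch #(
    parameter [31:0] LUT0 = 32'h0000_0000,  // truth table of the first 5-input function
    parameter [31:0] LUT1 = 32'h0000_0000   // truth table of the second 5-input function
) (
    input  A, B, C, D,  // the four inputs shared by both functions
    input  E0,          // fifth input of the first function
    input  E1,          // fifth input of the second function
    input  S,           // seventh input: selects which function drives Q
    output Q
);
    wire q0 = LUT0 >> {E0, D, C, B, A};
    wire q1 = LUT1 >> {E1, D, C, B, A};

    // Only functions that decompose into this mux-of-two-LUTs form fit,
    // which is why arbitrary 7-input functions are not accepted.
    assign Q = S ? q1 : q0;
endmodule
```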
//
// Arithmetic mode
// ---------------
// In arithmetic mode, each half-ALM uses its carry chain to perform fast addition
// of two four-input functions that share three inputs. Oddly, the result of
// one of the functions is inverted before being added (you can see this as
// the dot on a full-adder input of Figure 1-8 in the Handbook).
//
// The cell for an arithmetic-mode half-ALM is MISTRAL_ALM_ARITH. One idea
// I've had (or rather was suggested by mwk) is that functions that feed into
// arithmetic-mode cells could be packed directly into the arithmetic-mode
// cell as a function, which reduces the number of ALMs needed.
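//
// A minimal sketch of an arithmetic-mode half-ALM under the description
// above. The module and port names are hypothetical, and which of the two
// functions gets inverted is an assumption (here, the second one).
```verilog
module alm_arith_sketch #(
    parameter [15:0] LUT0 = 16'h0000,  // truth table of the first 4-input function
    parameter [15:0] LUT1 = 16'h0000   // truth table of the second 4-input function
) (
    input  A, B, C,  // the three inputs shared by both functions
    input  D0,       // fourth input of the first function
    input  D1,       // fourth input of the second function
    input  CI,       // carry in from the previous bit
    output SO,       // sum out
    output CO        // carry out to the next bit
);
    wire q0 = LUT0 >> {D0, C, B, A};
    wire q1 = LUT1 >> {D1, C, B, A};

    // Full adder with one operand inverted. Logical NOT (!) is used rather
    // than bitwise NOT (~) so the inversion stays 1 bit wide when the
    // expression is widened to the 2-bit {CO, SO} result.
    assign {CO, SO} = q0 + !q1 + CI;
endmodule
```
// Because of the inversion, a plain A + B adder would program the second
// function as NOT B so that the inverted result is B again.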
//
// Shared arithmetic mode
// ----------------------
// Shared arithmetic mode looks a lot like arithmetic mode, but here the
// output of every other four-input function goes to the input of the adder
// the next bit along. The functions can therefore implement part of the
// addition themselves before feeding the carry chain, which means that three
// bits can be added per ALM, as opposed to two in arithmetic mode.
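//
// The arithmetic idea can be sketched as a three-operand addition in which
// one function per bit computes a sum and another computes a carry that is
// added one bit along. This only illustrates the principle; it is not a model
// of the ALM wiring, and the module name, ports and W parameter are
// hypothetical.
```verilog
module shared_arith_sketch #(
    parameter W = 4  // operand width in bits
) (
    input  [W-1:0] x, y, z,
    output [W+1:0] sum  // two extra bits so x + y + z cannot overflow
);
    // Per-bit functions: the 3-input XOR is the sum bit, the 3-input
    // majority is the carry bit.
    wire [W-1:0] s = x ^ y ^ z;
    wire [W-1:0] c = (x & y) | (x & z) | (y & z);

    // The carry of bit i has weight 2^(i+1), so it is fed to the adder
    // "the next bit along"; a single carry chain then finishes the sum.
    assign sum = {2'b00, s} + {1'b0, c, 1'b0};
endmodule
```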
//
// Shared arithmetic mode doesn't currently have a cell, but I intend to add
// it as MISTRAL_ALM_SHARED, and have it occupy a full ALM. Because it adds
// three bits per cell, it makes addition shorter and uses fewer ALMs, but
// I don't know enough to tell whether it's more efficient to use shared
// arithmetic mode to shorten the carry chain, or plain arithmetic mode with
// the functions packed in.
//
// The four D flip-flops (DFFs) in a Cyclone V/10GX Adaptive Logic Module (ALM)
// act as one-bit memory cells that can be placed very flexibly (wherever there's
// an ALM); each flop is represented by a MISTRAL_FF cell.
//
// The flops in these chips are rather flexible in some ways, but in practice
// quite crippled by FPGA standards.
//
// What the flops can do
// ---------------------
// The core flop acts as a single-bit memory that initialises to zero at chip
// reset. It takes in data on the rising edge of CLK if ENA is high,
// and outputs it to Q. The ENA (clock enable) pin can therefore be used to
// capture the input only if a condition is true.
//
// The data itself is zero if SCLR (synchronous clear) is high, else it comes
// from SDATA (synchronous data) if SLOAD (synchronous load) is high, or DATAIN
// if SLOAD is low.
//
// If ACLR (asynchronous clear) is low then Q is forced to zero, regardless of
// the synchronous inputs or CLK edge. This is most often used for an FPGA-wide
// power-on reset.
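//
// The behaviour described so far can be sketched as follows. The port names
// are the ones used in the text; the module name ff_sketch is hypothetical
// and this is an illustration rather than the MISTRAL_FF definition.
```verilog
module ff_sketch(
    input      DATAIN, CLK, ACLR, ENA, SCLR, SLOAD, SDATA,
    output reg Q
);
    // The hardware always initialises to zero.
    initial Q = 1'b0;

    always @(posedge CLK or negedge ACLR) begin
        if (!ACLR)
            Q <= 1'b0;            // active-low asynchronous clear wins over everything
        else if (ENA) begin       // clock enable gates all synchronous updates
            if (SCLR)
                Q <= 1'b0;        // synchronous clear
            else if (SLOAD)
                Q <= SDATA;       // synchronous load
            else
                Q <= DATAIN;      // normal data path
        end
    end
endmodule
```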
//
// An asynchronous set that sets Q to one can be emulated by inverting the input
// and output of the flop, resulting in ACLR forcing Q to zero, which then gets
// inverted to produce one. Likewise, logic can operate on the falling edge of
// CLK if CLK is inverted before being passed as an input.
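//
// A minimal sketch of the asynchronous-set emulation, written behaviourally
// so that it stands alone. The module name, the ASET_N port and the q_n
// register are hypothetical names chosen for illustration.
```verilog
module ff_aset_sketch(
    input  D, CLK,
    input  ASET_N,  // active-low asynchronous set, wired to the flop's ACLR
    output Q
);
    // The underlying flop stores the inverted value and, like the hardware,
    // initialises to zero.
    reg q_n;
    initial q_n = 1'b0;

    always @(posedge CLK or negedge ASET_N) begin
        if (!ASET_N)
            q_n <= 1'b0;  // the flop's clear forces the stored value to zero...
        else
            q_n <= ~D;    // inverter on the flop input
    end

    // ...and the inverter on the flop output turns that zero into a one.
    assign Q = ~q_n;
endmodule
```
// Note that Q is one at start-up here, because the stored value initialises
// to zero and is then inverted.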
//
// What the flops *can't* do
// -------------------------
// The trickiest limitation is the lack of configurable initialisation state.
// For example, it isn't possible to implement a flop with
// asynchronous clear that initialises to one, because the hardware initialises
// to zero. Likewise, you can't emulate a flop with asynchronous set that
// initialises to zero, because the inverters mean the flop initialises to one.
//
// If the input design requires one of these cells (which appears to be rare
// in practice) then synth_intel_alm will fail to synthesize the design where
// other Yosys synthesis scripts might succeed.
//
// This stands in notable contrast to e.g. Xilinx flip-flops, which have
// configurable initialisation state and native synchronous/asynchronous
// set/clear (although not at the same time), which means they can generally
// implement a much wider variety of logic.
//
// The MLAB
// --------
// In addition to Logic Array Blocks (LABs) that contain ten Adaptive Logic
// Modules (ALMs, see alm_sim.v), the Cyclone V/10GX also contains
// Memory/Logic Array Blocks (MLABs) that can act as either ten ALMs, or utilise
// the memory the ALMs use to store the look-up table data for general usage,
// producing a 32-address by 20-bit block of memory. MLABs are spread out
// around the chip, so they can be placed near where they are needed, unlike
// deep but narrow memories such as the M10K memory block, which are
// comparatively limited in placement.
//
// MLABs are used mainly for shallow but wide memories, such as CPU register
// files (which have perhaps 32 registers, each comparatively wide at 16 or
// 32 bits) or shift registers (by using the output of the Nth bit as input
// for the (N+1)th bit).
//
// Oddly, instead of providing a single 32-address by 20-bit cell, Quartus asks
// synthesis tools to build MLABs out of 32-address by 1-bit cells, and tries
// to put these cells in the same MLAB during cell placement. Because of this,
// a MISTRAL_MLAB cell represents one of these 32-address by 1-bit cells, and
// 20 of them represent a physical MLAB.
//
// How the MLAB works
// ------------------
// MLABs are poorly documented, so the following information is based mainly
// on the simulation model and my knowledge of how memories like these work.
// Additionally, note that the ports of MISTRAL_MLAB are the ones auto-generated
// by the Yosys `memory_bram` pass, and it doesn't make sense to me to use
// `techmap` just for the sake of renaming the cell ports.
//
// The MLAB can be initialised to any value, but unfortunately Quartus only
// allows memory initialisation from a file. Since Yosys doesn't preserve input
// file information, or write the contents of an `initial` block to a file,
// Yosys can't currently initialise the MLAB in a way Quartus will accept.
//
// The MLAB takes in data from A1DATA at the rising edge of CLK1, and if A1EN
// is high, writes it to the address in A1ADDR. A1EN can therefore be used to
// conditionally write data to the MLAB.
//
// Simultaneously, the MLAB reads data from the address in B1ADDR, and outputs
// it to B1DATA, asynchronous to CLK1 and ignoring A1EN. If a synchronous read
// is needed then the output can be fed to embedded flops. Presently, Yosys
// assumes Quartus will pack external flops into the MLAB, but this is an
// assumption that needs testing.
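//
// The write and read behaviour described above can be sketched as a 32-address
// by 1-bit memory. The port names are the ones used in the text; the module
// name mlab_sketch and the all-zero initial contents are assumptions made for
// illustration.
```verilog
module mlab_sketch(
    input  [4:0] A1ADDR,  // write address
    input        A1DATA,  // write data
    input        A1EN,    // write enable
    input        CLK1,    // write clock
    input  [4:0] B1ADDR,  // read address
    output       B1DATA   // read data
);
    // 32 one-bit words; zero is an arbitrary choice of initial contents.
    reg [31:0] mem = 32'b0;

    // Synchronous, conditional write.
    always @(posedge CLK1)
        if (A1EN) mem[A1ADDR] <= A1DATA;

    // Asynchronous read: independent of CLK1 and A1EN.
    assign B1DATA = mem[B1ADDR];
endmodule
```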
//
// If data is simultaneously read from and written to an address, the read
// data is undefined, so write-to-read data pass-through using multiplexers
// is implemented by Yosys if needed.
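//
// One way such a pass-through could look, assuming a synchronous read built
// from flops on the read output. This is a sketch of the general technique,
// not the logic Yosys actually emits; the module name and the internal
// register names are hypothetical.
```verilog
module mlab_bypass_sketch(
    input  [4:0] A1ADDR,
    input        A1DATA,
    input        A1EN,
    input        CLK1,
    input  [4:0] B1ADDR,
    output       B1DATA
);
    reg [31:0] mem = 32'b0;

    reg rd_raw    = 1'b0;  // registered memory output
    reg wr_data_q = 1'b0;  // write data from the same cycle
    reg collide_q = 1'b0;  // set when the read and write addresses matched

    always @(posedge CLK1) begin
        if (A1EN) mem[A1ADDR] <= A1DATA;
        rd_raw    <= mem[B1ADDR];
        wr_data_q <= A1DATA;
        collide_q <= A1EN && (A1ADDR == B1ADDR);
    end

    // On a collision the memory output is not trusted; the multiplexer
    // passes the freshly written data through instead.
    assign B1DATA = collide_q ? wr_data_q : rd_raw;
endmodule
```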