@soronpo
Last active September 29, 2021 12:49
//This is a DFiant-equivalent design of https://github.com/tommythorn/silice-examples/blob/master/cpu.ice
import DFiant.*

class CPU(using DFC) extends DFDesign:
  val leds = DFBits(8) <> OUT
  val rf = DFBits(32).X(16) <> VAR init Vector(
    0, 1, 1, 0, 100, 0, 1
  ).padTo(16, 0).map(_.toBits(32))

  // A add, B blt
  val code = DFBits(32).X(32) const Vector(
    h"32'0A556", // r5 = r5 + r6
    h"32'0A312", // r3 = r1 + r2
    h"32'0A120", // r1 = r2
    h"32'0A230", // r2 = r3
    h"32'0B034"  // if r3 < r4: pc = 0
  ).padTo(32, h"32'0")

  object Insn extends DFFields:
    val brtarget = DFUInt(16) <> FIELD
    val opcode = DFBits(4) <> FIELD
    val rd, rs, rt = DFBits(4) <> FIELD

  val pc = DFUInt(32) <> VAR init 0
  val insn = Insn <> VAR //bubble init
  val wb_addr = DFBits(4) <> VAR //bubble init
  val wb_data = DFBits(32) <> VAR //bubble init

  insn := code(pc.bits(4, 0)).pipe.as(Insn)
  val rs_data = rf(insn.rs).pipe
  val rt_data = rf(insn.rt).pipe
  val rs_data_fw = if (insn.rs == wb_addr && wb_data.isValid) wb_data else rs_data
  val rt_data_fw = if (insn.rt == wb_addr && wb_data.isValid) wb_data else rt_data

  if (insn.opcode == h"A" && insn.rd != h"0") wb_addr := insn.rd
  else wb_addr := ?
  wb_data := rs_data_fw.uint + rt_data_fw.uint
  rf(wb_addr) := wb_data

  if (insn.opcode == h"B" && rs_data_fw.uint < rt_data_fw.uint && insn.brtarget.isValid)
    pc := insn.brtarget
    //flush by forcing bubbles in the pipeline
    insn := ?
    wb_data := ?
    wb_addr := ?
  else pc := pc + 1

  if (sim.inSimulation)
    val cycle = DFUInt(32) <> VAR init 0
    if (cycle >= 80) sim.finish()
    cycle := cycle + 1
    if (wb_addr.isValid)
      sim.report(msg"$cycle WB $pc:$insn $rs_data_fw,$rt_data_fw $wb_data -> r$wb_addr")
    else
      sim.report(msg"$cycle WB $pc:$insn $rs_data_fw,$rt_data_fw")
end CPU
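
To make the encoding of the `code` entries concrete, here is a minimal plain-Scala sketch (no DFiant dependency) that splits a 32-bit instruction word into the same fields as `Insn`. It assumes the fields are packed most-significant-first in declaration order, which is inferred from the comments next to the program words rather than from DFiant documentation. `Decoded`, `decode`, `program`, and `decoded` are hypothetical names used only for illustration.

```scala
// Plain-Scala illustration of the Insn layout; not DFiant code.
final case class Decoded(brtarget: Int, opcode: Int, rd: Int, rs: Int, rt: Int)

// Assumed layout (MSB-first, in declaration order):
// [31:16] brtarget, [15:12] opcode, [11:8] rd, [7:4] rs, [3:0] rt
def decode(word: Int): Decoded =
  Decoded(
    brtarget = (word >>> 16) & 0xFFFF,
    opcode   = (word >>> 12) & 0xF,
    rd       = (word >>> 8) & 0xF,
    rs       = (word >>> 4) & 0xF,
    rt       = word & 0xF
  )

// The five program words from the design, decoded in order.
val program: Vector[Int] = Vector(0x0A556, 0x0A312, 0x0A120, 0x0A230, 0x0B034)
val decoded: Vector[Decoded] = program.map(decode)
```

Under that assumption, `decode(0x0A312)` gives opcode `0xA` with rd = 3, rs = 1, rt = 2, matching the comment `r3 = r1 + r2`, and `decode(0x0B034)` gives opcode `0xB` with rs = 3, rt = 4 and a branch target of 0.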
@soronpo
Author

soronpo commented Sep 13, 2021

A few notes:

  • DFiant is strongly typed and distinguishes between DFBits (bit aggregation) and DFUInt (unsigned arithmetic). Converting between the two is simple (the design uses .uint and .bits for this), so the program counter is a DFUInt while each register in the register file is a DFBits.
  • Vectors (arrays) are created by a composition of a basic type and a dimension. E.g. DFBits(32).X(16) is a 16-element vector of 32-bit dataflow values.
  • The original Silice design contained 32 registers, but its 4-bit register address fields can only address 16 of them, so the DFiant design uses just 16 registers.
  • The Insn object uses the struct fields capability (DFFields) to decode the instruction.
  • DFiant has a dataflow abstraction, so every dataflow value carries implicit ready-valid signaling where required. An uninitialized dataflow value contains a (stall) bubble, which implicitly prevents an illegal value from being committed to state. Assigning ? to a dataflow variable explicitly makes it a bubble. Any logical or arithmetic operation on a bubble always yields a bubble, unless the value is guarded by an isValid condition in an if-statement.
  • This design clearly separates the simulation aspect from the design itself. For synthesis, the simulation code is dropped from the generated RTL.
  • DFiant automatically balances pipelined value access, so no values from various stages are accessed without balancing.
  • pc := pc + 1 is equivalent to pc := pc.prev + 1. Because pc has not been assigned another value earlier in the scope, reading it yields its previous dataflow value, so the assignment forms an incrementing accumulator.
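
As a rough software analogy for the bubble and accumulation notes above, the following plain-Scala sketch models a dataflow token as an `Option`, where `None` stands in for a bubble. This is only an illustration of the described semantics, not how DFiant implements them; `Token`, `add`, `forward`, and `pcTrace` are hypothetical names.

```scala
// Software analogy only: a token is either a value (Some) or a bubble (None).
type Token[A] = Option[A]

// Arithmetic on a bubble yields a bubble: the result is defined only
// when both operands are defined.
def add(a: Token[Long], b: Token[Long]): Token[Long] =
  for { x <- a; y <- b } yield x + y

// Forwarding mux in the spirit of rs_data_fw: isDefined plays the role of
// isValid, so a bubble in wbData falls back to the register-file value.
def forward(rs: Int, wbAddr: Token[Int], wbData: Token[Long], rsData: Token[Long]): Token[Long] =
  if (wbAddr.contains(rs) && wbData.isDefined) wbData else rsData

// `pc := pc + 1` read as `pc.prev + 1`, starting from the init value 0:
// each entry is the previous entry plus one.
val pcTrace: List[Int] = List.iterate(0, 5)(prev => prev + 1)
```

Here `add(Some(2L), None)` is `None` (the bubble propagates), while `forward(3, Some(3), None, Some(7L))` is `Some(7L)` because the validity guard keeps the bubble out of the result.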

@tommythorn

tommythorn commented Sep 14, 2021

Thank you for this example.

The implicit valid/ready everywhere is convenient and your pipeline abstraction is definitely an improvement over Chisel. How well this is optimized would be my first question.

I think

> DFiant automatically balances pipelined value access, so no values from various stages are accessed without balancing.

is what had me confused at first. For example, in line 40 insn comes from several stages up (I'd have to count them, as it's not obvious) and is combined (indirectly) with rs_data, which is at a different distance. I can see how this can work, but it certainly feels a bit like magic, and the effective pipeline depth isn't explicit.

@soronpo
Author

soronpo commented Sep 14, 2021

> How well this is optimized would be my first question.

That's a good question, but currently I have no answer since there is some work to be done. The goal is to only generate the handshaking signals if they are required. The generated code is not only optimized for logic (and possible performance characteristics), but also for readability.

> is what had me confused at first. For example, in line 40 insn comes from several stages up (I'd have to count them, as it's not obvious) and is combined (indirectly) with rs_data, which is at a different distance. I can see how this can work, but it certainly feels a bit like magic, and the effective pipeline depth isn't explicit.

What's cool about DFiant is that pipelining is just a constraint. The compiler can automatically add pipeline stages if I tell it to. Balancing is different: DFiant keeps your code correct if you add .pipe tags and forget to balance them yourself. Another cool thing is that you can print out the code after the balancing stage. So if I have code like (x - x.prev) * (y - y.prev).pipe, and x and y come from the same path, then the implicit join of the arithmetic operation forces the compiler to balance the pipe to maintain correctness. When you print the code after the balancing stage, it looks like (x - x.prev).pipe * (y - y.prev).pipe. Notice the difference between .prev and .pipe: .prev is part of the function being computed, whereas .pipe is just a constraint.
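
The distinction between .prev and .pipe can be sketched with a toy plain-Scala model. This is not DFiant's API or its balancing algorithm; it only illustrates the idea under simple assumptions: a flow is a sequence of token values plus a latency tag, `prev` (with an assumed initial value of 0) changes the values, `pipe` changes only the latency, and joining two flows takes the larger latency, as if the faster path were delayed. `Flow` and all names below are hypothetical.

```scala
// Toy model: per-token values plus a latency tag in cycles.
final case class Flow(values: Vector[Int], latency: Int) {
  // prev is functional: token i sees the value of token i-1 (0 for the first).
  def prev: Flow = copy(values = (0 +: values).take(values.length))
  // pipe is a timing constraint only: same values, one more cycle of latency.
  def pipe: Flow = copy(latency = latency + 1)
  // A join is balanced: the result arrives with the slower operand.
  private def join(that: Flow)(f: (Int, Int) => Int): Flow =
    Flow(values.zip(that.values).map(f.tupled), latency.max(that.latency))
  def -(that: Flow): Flow = join(that)(_ - _)
  def *(that: Flow): Flow = join(that)(_ * _)
}

val x = Flow(Vector(1, 4, 9), latency = 0)
val y = Flow(Vector(2, 3, 5), latency = 0)

// As written by the user, with a pipe on only one operand...
val asWritten = (x - x.prev) * (y - y.prev).pipe
// ...and as it would be printed after balancing.
val balanced = (x - x.prev).pipe * (y - y.prev).pipe
```

In this model `asWritten` and `balanced` are equal (values 2, 3, 10 at latency 1), which mirrors the point that adding .pipe never changes what is computed, whereas dropping a .prev would.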
