//This is a DFiant-equivalent design of https://github.com/tommythorn/silice-examples/blob/master/cpu.ice
import DFiant.*

class CPU(using DFC) extends DFDesign:
  val leds = DFBits(8) <> OUT
  val rf = DFBits(32).X(16) <> VAR init Vector(
    0, 1, 1, 0, 100, 0, 1
  ).padTo(16, 0).map(_.toBits(32))
  // A add, B blt
  val code = DFBits(32).X(32) const Vector(
    h"32'0A556", // r5, r5, r6
    h"32'0A312", // r3 = r1 + r2
    h"32'0A120", // r1 = r2
    h"32'0A230", // r2 = r3
    h"32'0B034"  // if r3 < r4: pc = 0
  ).padTo(32, h"32'0")
  object Insn extends DFFields:
    val brtarget = DFUInt(16) <> FIELD
    val opcode = DFBits(4) <> FIELD
    val rd, rs, rt = DFBits(4) <> FIELD
  val pc = DFUInt(32) <> VAR init 0
  val insn = Insn <> VAR //bubble init
  val wb_addr = DFBits(4) <> VAR //bubble init
  val wb_data = DFBits(32) <> VAR //bubble init
  insn := code(pc.bits(4, 0)).pipe.as(Insn)
  val rs_data = rf(insn.rs).pipe
  val rt_data = rf(insn.rt).pipe
  val rs_data_fw = if (insn.rs == wb_addr && wb_data.isValid) wb_data else rs_data
  val rt_data_fw = if (insn.rt == wb_addr && wb_data.isValid) wb_data else rt_data
  if (insn.opcode == h"A" && insn.rd != h"0") wb_addr := insn.rd
  else wb_addr := ?
  wb_data := rs_data_fw.uint + rt_data_fw.uint
  rf(wb_addr) := wb_data
  if (insn.opcode == h"B" && rs_data_fw.uint < rt_data_fw.uint && insn.brtarget.isValid)
    pc := insn.brtarget
    //flush by forcing bubbles in the pipeline
    insn := ?
    wb_data := ?
    wb_addr := ?
  else pc := pc + 1
  if (sim.inSimulation)
    val cycle = DFUInt(32) <> VAR init 0
    if (cycle >= 80) sim.finish()
    cycle := cycle + 1
    if (wb_addr.isValid)
      sim.report(msg"$cycle WB $pc:$insn $rs_data_fw,$rt_data_fw $wb_data -> r$wb_addr")
    else
      sim.report(msg"$cycle WB $pc:$insn $rs_data_fw,$rt_data_fw")
end CPU

> How well this is optimized would be my first question.

That's a good question, but currently I have no answer, since there is still some work to be done. The goal is to generate the handshaking signals only when they are required. The generated code is optimized not only for logic (and possible performance characteristics) but also for readability.

> is what had me confused at first. For example, in line 40 `insn` comes from several stages up (I'd have to count them as it's not obvious) and is combined (indirectly) with `rs_data`, which is a different distance. I can see how this can work, but it certainly feels a bit magic and the effective pipeline depth isn't explicit.

What's cool about DFiant is that pipelining is just a constraint. The compiler can automatically add pipeline stages if I tell it to. Balancing is different, since DFiant just keeps your code correct if you add `.pipe` tags and forget to balance it yourself. Another cool thing is that you can print out the code after the balancing stage. So if I have code like `(x - x.prev) * (y - y.prev).pipe`, and `x` and `y` come from the same path, then the implicit join of the arithmetic operation forces the compiler to balance the pipe to maintain correctness. When you print the code after the balancing stage, it would look like `(x - x.prev).pipe * (y - y.prev).pipe`. Notice that there is a difference between `.prev` and `.pipe`: `.prev` is part of the function, whereas `.pipe` is just a constraint.
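
To make the balancing example concrete, here is a minimal sketch written in the same DFiant style as the CPU design above. The design name, port directions, and widths are assumptions made for illustration; it reuses only constructs that already appear in this gist (`DFDesign`, `DFUInt`, `.prev`, `.pipe`) and is meant as a sketch of the constraint, not a verified build.

```scala
import DFiant.*

// Hypothetical illustration only: the name and widths are assumed, not taken from the gist.
class Balanced(using DFC) extends DFDesign:
  val x = DFUInt(16) <> IN
  val y = DFUInt(16) <> IN
  // `.prev` is part of the function itself: it reads the previous value of x / y.
  // `.pipe` is only a constraint: it marks where a pipeline stage may be added
  // without changing what the expression computes.
  val res = (x - x.prev) * (y - y.prev).pipe
  // Since x and y come from the same path, the implicit join at `*` forces the
  // compiler to balance the pipeline. Printing the design after the balancing
  // stage would show the equivalent of:
  //   (x - x.prev).pipe * (y - y.prev).pipe
end Balanced
```

The second form is what the compiler derives on its own; the source only states the `.pipe` constraint, and the tool keeps the two operand paths aligned.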

Thank you for this example.

The implicit valid/ready everywhere is convenient and your pipeline abstraction is definitely an improvement over Chisel. How well this is optimized would be my first question.

I think

is what had me confused at first. For example, in line 40 `insn` comes from several stages up (I'd have to count them as it's not obvious) and is combined (indirectly) with `rs_data`, which is a different distance. I can see how this can work, but it certainly feels a bit magic and the effective pipeline depth isn't explicit.