@lovely-error
Last active September 22, 2025 09:43

Dataflow description language

  1. no global mem, every task and process can only access its local state
  2. communication happens through pipes
    1. when targeting fpgas, a pipe's size is the size of a single block ram
    2. customisation of lowering is possible in a separate constraint file
  3. autoclocking
    1. we want to clock proc fsms at higher freqs to not keep seqvs starved most of the time
    2. can still be clocked from one main clock, with each item having an internal clock-enable counter (???)
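
A rough Python model of the clock-enable idea above, not DDL and not a lowering. Assumptions: each item gets a fixed integer divider of the main clock, and the names `ClockEnable`, `run`, `proc_fsm` and `seqv` are made up for illustration.

```python
class ClockEnable:
    """Asserts an enable pulse once every `period` ticks of the main clock."""

    def __init__(self, period: int):
        assert period >= 1
        self.period = period
        self.count = 0

    def tick(self) -> bool:
        # Called on every main-clock edge; True means "this item advances now".
        enabled = self.count == 0
        self.count = (self.count + 1) % self.period
        return enabled


def run(main_cycles: int, periods: dict) -> dict:
    """Count how often each item advances over `main_cycles` main-clock edges."""
    enables = {name: ClockEnable(p) for name, p in periods.items()}
    steps = {name: 0 for name in periods}
    for _ in range(main_cycles):
        for name, ce in enables.items():
            if ce.tick():
                steps[name] += 1
    return steps


# A process fsm enabled on every edge, a sequence enabled on every 4th edge.
print(run(12, {"proc_fsm": 1, "seqv": 4}))
```

Everything shares one physical clock here; only the enable counters differ, which is the property the note is after.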

Pieces

basic items

  1. process
    1. may contain loops
    2. may contain state
    3. can be lowered to a state machine (never pipeline)
    4. multiport memory can be implemented as 1R1W
    5. may stop (reach terminal state)
    6. must not be nested
    7. may contain div
    8. may contain operations with unpredictable latency
  2. sequence
    1. must not contain loops/state
      1. "inplace" mutation can be expressed as comb logic and is thus not considered state
    2. must not contain TDM memory
    3. must not contain div with runtime divisor
    4. can be lowered to (auto) pipeline (head-fsm-tail-pipeline or just pipeline)
      1. should only perform auto pipelining on builtins with predictable latency (mul, simd ops) (no dynamic div!)
    5. head gathers inputs (first stage, blocking reads), tail computes with it (nonblocking reads are allowed)
      1. head pushes data to tail only when every sink has a slot for push
    6. used to express pipelineable logic
      1. either explicitly by ||| stage separators
        1. this shouldn't exist. pipelining should be automatic and timing driven.
          1. idk if autopiping is doable, even via hacks to vendor synth tools
          2. yosys doesn't provide timing info?
      2. or implicitly for @maps or muls
    7. blocking reads from pipes can only be in first stage
    8. pipeline fires when all buffer sinks have slots (in presence of blocking sends)
    9. is_valid bit at each stage
  3. function
    1. parameters can be either
      1. direct (by value) T
      2. indirect (by reference)
        1. out T "write only"
        2. inout T "read write"
      3. name(arg1:T, arg2:out T, arg3:inout T)
    2. must express only combinational logic
    3. can be used in both sequences and processes
  4. clock
    1. globally defined statements for io procs
    2. clock clk_name: 133Mhz
    3. @wait_cycles(n) waits n cycles in io process
    4. @switch_to(clk_iden) switches to another clock at runtime
    5. this construct makes clock derivation less problematic
  5. for in
    1. for i in 0..n
      1. n may be dynamic in processes
      2. n must be static in sequences
    2. for k in array_ref
      1. iterate over items in array
  6. pin
    1. chip IO stuff
    2. default must be specified for out and inout
    3. pin in pin out pin inout
    4. @read_pin
    5. @write_pin
  7. io process
    1. user defined period
    2. loop gets invoked at every specified kth time point
    3. runtime switchable poll rate
      1. allows to negotiate faster rate on links
  8. bridge process
    1. connection to verilog modules
  9. struct
    1. the structs
  10. union
    1. the unions
  11. enum
    1. enums
  12. graph
    1. specify connectivity between seqvs and processes
    2. arbitrary connections, cycles are ok
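
A minimal Python sketch of the "is_valid bit at each stage" idea for sequences. This is a behavioural model, not DDL. Assumptions: every stage takes exactly one cycle, the head inserts a bubble (is_valid = 0) whenever its blocking read has nothing to consume, and `run_pipeline` with its arguments is a hypothetical name.

```python
from collections import deque


def run_pipeline(inputs, stages, cycles):
    """Shift (is_valid, value) pairs through `stages`, one stage per cycle.

    `inputs` has one entry per cycle: a value, or None when the head has
    nothing to read on that cycle (a bubble enters the pipeline instead).
    """
    regs = [(False, None)] * len(stages)  # one output register per stage
    out = []
    source = deque(inputs)
    for _ in range(cycles):
        # A valid item sitting in the last register leaves the pipeline.
        valid, value = regs[-1]
        if valid:
            out.append(value)
        # Shift from the back so each stage sees last cycle's upstream value.
        for i in range(len(stages) - 1, 0, -1):
            v, x = regs[i - 1]
            regs[i] = (v, stages[i](x) if v else None)
        # Head: consume an input if present, otherwise insert a bubble.
        item = source.popleft() if source else None
        if item is None:
            regs[0] = (False, None)
        else:
            regs[0] = (True, stages[0](item))
    return out


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline([1, None, 2], stages, cycles=6))
```

Bubbles travel through the stages like data but are never emitted, so downstream stages need no stall logic of their own.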

backpressure

  1. procs (fsm) can wait arbitrarily long until slot is available in buffer sinks
  2. seqvs can wait arbitrarily long until all buffer sinks with blocking pushes have space (reduces runrate)
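
The firing rule for seqvs can be sketched in Python as below. This is a toy model under stated assumptions: a sink is described only by its free slot count and whether the push into it is blocking, the consumer drains one item every `drain_every` cycles, and `can_fire` / `simulate` are hypothetical names.

```python
def can_fire(sinks):
    """`sinks` is a list of (free_slots, is_blocking_push), one per output pipe.

    Only sinks written with a blocking push can hold the pipeline back;
    pushes into streams never stall.
    """
    return all(free > 0 for free, blocking in sinks if blocking)


def simulate(cycles, capacity, drain_every):
    """Count how often a sequence with one buffer sink gets to fire."""
    occupancy = 0
    fires = 0
    for cycle in range(cycles):
        # Consumer removes one item every `drain_every` cycles.
        if cycle % drain_every == 0 and occupancy > 0:
            occupancy -= 1
        if can_fire([(capacity - occupancy, True)]):
            occupancy += 1
            fires += 1
    return fires


# A slow consumer caps the run rate of the sequence feeding it.
print(simulate(cycles=9, capacity=2, drain_every=3))
```

With a capacity-2 sink drained every third cycle the sequence fires 4 times in 9 cycles instead of 9, which is the run rate reduction mentioned above.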

types

  1. iN
    1. arbitrary width unsigned integer
    2. signed ints are in two's complement format
    3. same as [i1;N]
    4. shr, shl, plus, neg, etc.
  2. [T;n]
    1. arrays of length n of T items
    2. @map(item, fun_ref) enables simd operations
    3. annotations #[impl(...)] to request particular impl
      1. can be one of lutram, bram, bkram
      2. lutram: ram built from registers
      3. bram: ram built from block ram
      4. bkram: should synthesise as banked ram (one bram = one bank) with conflict minimisation
      5. seqvs must not block on memory accesses, so lowering depends on the inferred number of ports
  3. pipes
    1. monodirectional fifo
    2. single producer & single consumer
    3. guaranteed reads and test reads
    4. any pipe is either buffer or stream
      1. buffer
        1. producer stalls when no slots available
        2. @try_rcv -> (T, i1) , if data item present, consume it
        3. @rcv -> T , blocking read
        4. @send blocking send
      2. stream
        1. old values get dropped on overfill
        2. @try_rcv -> (T, i1) , if data item present, consume it
        3. @rcv -> T , blocking read
        4. @send non blocking send
    5. <X> in T read only pipe of Ts (X can be either buffer or stream)
    6. <X> out T write only pipe of Ts (X can be either buffer or stream)
    7. used to connect sequences and processes
    8. @mk_pipe(capacity) used for creation of both buffers and streams

unresolved issues

  1. how do we clock graphs?
    1. non io entities are all clocked from the same source (fmax?)
    2. different clocking for main logic doesn't seem reasonable, but it looks like a must for io procs
  2. how do we lower io procs?
    1. is serdes vs non serdes different?
    2. i want them to be able to switch poll rate to allow for negotiation of transfer rates
  3. how should we do cdc for io procs?

problematic cases

  1. there's a psram block in gw1nr. it is possible to make a MEM2RW and do burst loads from psram on one port, leaving the other port for other tasks. it is unclear how to do the same thing in ddl.
    1. io proc as a psram controller; two stream pipes of depth one, one for sending commands and another for sending back data; a few clock cycles slower than an impl in verilog

examples

nesting by indentation instead of brackets

process STM (arg1: i1) -- only direct parameters, () may be omitted
   mut state: MyEnum
   mut mem: [i1;32]
   let some_const: i1

   init 
      -- init exprs go into init block
      -- it will be called only once
      state = MyEnum::Uninit
      mem = @zeroed()
      some_const = 0

   loop label
      match state
         Pattern1 =>
               continue label
      Pattern2 => -- match also accepts nonincreasing indentation
            -- some_action
            break label
        

-- only direct parameters, must contain parameters
sequence Exmpl (arg1: buffer in i1, arg2: buffer in i4, arg3: buffer out i4, arg4: buffer out i8)

   -- stage 1
   let val1 = @rcv(arg1) -- only first stage can contain blocking reads

   |||

   -- stage 2
   let (val2, is_valid) = @try_rcv(arg2) -- can only contain nonblocking reads

   |||

   -- stage 3
   let res: i4 = val1 * val2 -- multiplication may extend the pipeline
   @send(arg3, res) -- blocking send; the pipeline head has already checked that the sink has a free slot, so this never actually stalls


function name (arg1: i1, arg2: inout [i8;8])
   arg2[0] += arg1 -- this will be lowered differently based on whether it is used in process or sequence


pin out led_enable: i1 = 0
pin in data_pin: i1
clock ex1 = 12*10**6

io process LedBlinker (arg1: stream out i1)

   mut counter: i32

   init
      counter = 12*10**6
      @switch_to(ex1)

   loop
      counter -= 1
      if counter == 0
      then
         counter = 12*10**6
         led_enable = !led_enable
         let smth = @read_pin(data_pin)
         @send(arg1, smth) -- non blocking send, because sink is a stream