Creative Commons Attribution 4.0 International License.
Clock domain crossing (CDC) is a well-understood topic that is well documented (e.g. [Golson2014]) and widely supported by EDA tools. Even so, many designers still get it wrong, or at least do it poorly.
CDC spans many areas and this gist covers one that is often forgotten, perhaps because it is rarely
the responsibility of RTL designers: timing constraints. Too often I have seen in practice that CDC paths
simply get ignored in SDCs, either through an explicit set_false_path or through
set_clock_groups -asynchronous (see Discouraged SDC Techniques).
While this can be accepted in cases where the applied CDC mechanisms were diligently analyzed, it is
always far better to spend the time devising the correct CDC timing constraints. One reason is better guidance
to synthesis and P&R tools (e.g. to place CDC flops close to each other). The other reason is more complete
STA coverage: instead of turning a blind eye through false paths, STA can legitimately time the paths
and make sure all timing assumptions do indeed hold.
The point of Encouraged SDC Techniques is to:
- Minimize load on CDC flops. This will make synthesis and P&R tools choose fast flops and place them close to each other (i.e. short wire, less capacitance).
- Minimize delay between CDC flops. This is to prevent physical implementation tools from inserting extra buffers on the paths between CDC flops.
There are only two CDC path types: An individual, single-bit path, and a multi-bit path. There is no difference whether the latter is a parallel bus or a set of individual signals that need to be kept in phase. Single-bit paths are solved using synchronizers. Multi-bit paths use some kind of signaling between the clock domains to indicate stability of the path in the source domain.
Single-bit 2FF synchronizer handles single-bit CDC paths.
Single bit 2FF synchronizer [Golson2014].
Multi-bit path with stability guard (a.k.a. handshake mechanism) passes a multi-bit path directly from the source domain to the receiving domain, where the reception is conditioned/guarded by stability of the logic state in the source domain. The conditioning is achieved by an associated signaling/handshake mechanism, which uses a single-bit synchronizer in both the forward and reverse direction. For examples of some handshake schemes see [sv_hadskake_comps].
Multi bit synchronization scheme [Golson2014].
CDC timing constraints primarily target the single-bit paths and hence 2FF synchronizers. Their intent is to make sure the paths between individual FF stages are short and minimally loaded. In multi-bit paths, the signaling mechanism is merely a use of two synchronizers, one in each direction; the parallel data path then only needs to be constrained to be fast enough.
It is quite common to see CDC paths being (intentionally) ignored, either through set_false_path
or set_clock_groups -asynchronous (without -allow_paths). In either case, the paths become invisible
to the tools. No one then really knows what timing looks like on those paths, and the implementation
tools have no incentive to minimize their load and delay.
    # use of `set_false_path`
    # -----------------------
    pt_shell> create_clock -name CLKA -period 10 clkA
    pt_shell> create_clock -name CLKB -period 10 clkB
    pt_shell> set_false_path -from CLKA -to CLKB
    pt_shell> set_false_path -from CLKB -to CLKA
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0
    No constrained paths.

    # use of `set_clock_groups -asynchronous`
    # -----------------------------------------
    pt_shell> create_clock -name CLKA -period 10 clkA
    pt_shell> create_clock -name CLKB -period 10 clkB
    pt_shell> set_clock_groups -name grp -group CLKA -group CLKB -asynchronous
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0
    No constrained paths.
There is no difference in the end effect of the two methods. Using set_clock_groups
expresses the intent more clearly and avoids the need to specify two exceptions (one in
each direction) between every pair of clock domains.
Another option is to relax the timing of CDC paths through set_max_delay infinity. This time the
paths remain visible to the tools, but the constraint removes all incentive to optimize them.
    pt_shell> create_clock -name CLKA -period 10 clkA
    pt_shell> create_clock -name CLKB -period 10 clkB
    pt_shell> set_max_delay infinity -from CLKA -to CLKB
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: CLKB
    Path Type: max

    Point                                        Incr       Path
    --------------------------------------------------------------
    cdc_rdy/src/CK (dffrx1)                      0.00       0.00 r
    cdc_rdy/src/Q (dffrx1) <-                    3.00       3.00 r
    cdc_rdy/st0/D (dffrx1)                       0.00       3.00 r
    data arrival time                                       3.00

    max_delay                                   0.inf      0.inf
    clock reconvergence pessimism                0.00      0.inf
    library setup time                          -0.70      0.inf
    data required time                                     0.inf
    --------------------------------------------------------------
    data required time                                     0.inf
    data arrival time                                      -3.00
    --------------------------------------------------------------
    slack (MET)                                            0.inf
Note
set_max_delay relaxes timing only for the setup/max checks and leaves the hold/min checks
unaffected. This often goes unnoticed as long as hold timing is not violated. When it does
cause timing violations, these are usually false and call for using set_min_delay.
See Encouraged SDC Techniques for more details.
Avoiding path exceptions (like the ones above) may then seem a far better approach to some. Some may even make CDC clocks intentionally "synchronous". This again helps to make the paths visible to the tools. It also gives the tools some incentives; however, these incentives may not be the best ones.
For example, this leaves a full clkB period for timing between the st0 and st1 flops of the 2FF
synchronizer. This would allow implementation tools to place them further apart and maybe even put
buffers on the path. It may also not represent the worst-case timing conditions between the two
clock domains.
Another problem may be a difference in clock latency between the source and receiving domains. It may cause too pessimistic timing for either setup or hold checks, leading to either timing violations or attempts to fix it by buffering the path.
    pt_shell> create_clock -name CLKA -period 10 clkA
    pt_shell> create_clock -name CLKB -period 10 clkB
    pt_shell> set_clock_latency 8 CLKA
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: CLKB
    Path Type: max

    Point                                        Incr       Path
    --------------------------------------------------------------
    clock CLKA (rise edge)                       0.00       0.00
    clock network delay (ideal)                  8.00       8.00
    cdc_rdy/src/CK (dffrx1)                      0.00       8.00 r
    cdc_rdy/src/Q (dffrx1) <-                    3.00      11.00 r
    cdc_rdy/st0/D (dffrx1)                       0.00      11.00 r
    data arrival time                                      11.00

    clock CLKB (rise edge)                      10.00      10.00
    clock network delay (ideal)                  0.00      10.00
    clock reconvergence pessimism                0.00      10.00
    cdc_rdy/st0/CK (dffrx1)                                10.00 r
    library setup time                          -0.70       9.30
    data required time                                      9.30
    --------------------------------------------------------------
    data required time                                      9.30
    data arrival time                                     -11.00
    --------------------------------------------------------------
    slack (VIOLATED)                                       -1.70
The encouraged method is to keep the CDC paths visible to timing engines and to create path exceptions on them that steer implementation tools toward the desired properties (i.e. small delay and load).
The core idea is using set_max_delay with just enough delay, the value of which needs to be
determined experimentally for each technology, design and sometimes even the specific CDC path.
set_max_delay overrides the setup/max path delay, which is normally derived from the periods and
waveforms of the clocks involved in the timing path. It thus effectively reduces (or extends) the
clock-derived path delay, as if overriding the period spec. Here are examples to illustrate it:
    # effect of reducing path delay
    # -----------------------------
    pt_shell> create_clock -name CLKA -period 10 clkA
    pt_shell> create_clock -name CLKB -period 10 clkB
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: CLKB
    Path Type: max

    Point                                        Incr       Path
    --------------------------------------------------------------
    clock CLKA (rise edge)                       0.00       0.00
    clock network delay (ideal)                  0.00       0.00
    cdc_rdy/src/CK (dffrx1)                      0.00       0.00 r
    cdc_rdy/src/Q (dffrx1) <-                    3.00       3.00 r
    cdc_rdy/st0/D (dffrx1)                       0.00       3.00 r
    data arrival time                                       3.00

    clock CLKB (rise edge)                      10.00      10.00
    clock network delay (ideal)                  0.00      10.00
    clock reconvergence pessimism                0.00      10.00
    cdc_rdy/st0/CK (dffrx1)                                10.00 r
    library setup time                          -0.70       9.30
    data required time                                      9.30
    --------------------------------------------------------------
    data required time                                      9.30
    data arrival time                                      -3.00
    --------------------------------------------------------------
    slack (MET)                                             6.30

    pt_shell> set_max_delay 4.0 -from CLKA -to CLKB
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: CLKB
    Path Type: max

    Point                                        Incr       Path
    --------------------------------------------------------------
    cdc_rdy/src/CK (dffrx1)                      0.00       0.00 r
    cdc_rdy/src/Q (dffrx1) <-                    3.00       3.00 r
    cdc_rdy/st0/D (dffrx1)                       0.00       3.00 r
    data arrival time                                       3.00

    max_delay                                    4.00       4.00
    clock reconvergence pessimism                0.00       4.00
    library setup time                          -0.70       3.30
    data required time                                      3.30
    --------------------------------------------------------------
    data required time                                      3.30
    data arrival time                                      -3.00
    --------------------------------------------------------------
    slack (MET)                                             0.30

    # effect of extending path delay
    # ------------------------------
    pt_shell> create_clock -name CLKA -period 10 clkA
    pt_shell> create_clock -name CLKB -period 16 clkB
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: CLKB
    Path Type: max

    Point                                        Incr       Path
    --------------------------------------------------------------
    clock CLKA (rise edge)                      30.00      30.00
    clock network delay (ideal)                  0.00      30.00
    cdc_rdy/src/CK (dffrx1)                      0.00      30.00 r
    cdc_rdy/src/Q (dffrx1) <-                    3.00      33.00 r
    cdc_rdy/st0/D (dffrx1)                       0.00      33.00 r
    data arrival time                                      33.00

    clock CLKB (rise edge)                      32.00      32.00
    clock network delay (ideal)                  0.00      32.00
    clock reconvergence pessimism                0.00      32.00
    cdc_rdy/st0/CK (dffrx1)                                32.00 r
    library setup time                          -0.70      31.30
    data required time                                     31.30
    --------------------------------------------------------------
    data required time                                     31.30
    data arrival time                                     -33.00
    --------------------------------------------------------------
    slack (VIOLATED)                                       -1.70

    pt_shell> set_max_delay 4.0 -from CLKA -to CLKB
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: CLKB
    Path Type: max

    Point                                        Incr       Path
    --------------------------------------------------------------
    cdc_rdy/src/CK (dffrx1)                      0.00       0.00 r
    cdc_rdy/src/Q (dffrx1) <-                    3.00       3.00 r
    cdc_rdy/st0/D (dffrx1)                       0.00       3.00 r
    data arrival time                                       3.00

    max_delay                                    4.00       4.00
    clock reconvergence pessimism                0.00       4.00
    library setup time                          -0.70       3.30
    data required time                                      3.30
    --------------------------------------------------------------
    data required time                                      3.30
    data arrival time                                      -3.00
    --------------------------------------------------------------
    slack (MET)                                             0.30
Keep in mind the path delay exception applies only to the clock component of the timing path and that there still remains the physical constraint (i.e. the setup time, here) of the receiving end-point and clock attributes such as clock uncertainty.
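As a rough worked example of that point, the required time under a max delay exception is the max_delay value minus the endpoint setup time and minus any setup clock uncertainty. The sketch below is plain Tcl arithmetic using the numbers from the report above (4.0 max delay, 0.70 setup time, 3.00 arrival); UNCERT_SETUP is a hypothetical value that does not appear in the original reports and is there only to show the effect.

```tcl
# Values from the set_max_delay 4.0 report above.
set MAX_DELAY    4.0
set SETUP_TIME   0.7
set ARRIVAL      3.0
# Hypothetical setup uncertainty, for illustration only.
set UNCERT_SETUP 0.5

# No uncertainty: reproduces the 3.30 required time and 0.30 slack of the report.
set req0 [expr {$MAX_DELAY - $SETUP_TIME}]
puts [format "required %.2f, slack %.2f" $req0 [expr {$req0 - $ARRIVAL}]]

# With uncertainty: the same 4.0 exception no longer closes timing.
set req1 [expr {$MAX_DELAY - $SETUP_TIME - $UNCERT_SETUP}]
puts [format "required %.2f, slack %.2f" $req1 [expr {$req1 - $ARRIVAL}]]
```

The second line prints a required time of 2.80 and a slack of -0.20, which is why the max delay value has to be chosen with the setup time and uncertainty of the receiving flop in mind.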
There are some caveats to watch for:
- Clock latency of CDC clocks is factored into the max delay timing, unless using
  -ignore_clock_latency with set_max/min_delay:

    pt_shell> set_clock_latency 2.0 CLKA
    pt_shell> set_clock_latency 1.0 CLKB
    pt_shell> set_max_delay 4.0 -from CLKA -to CLKB
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: CLKB
    Path Type: max

    Point                                        Incr       Path
    --------------------------------------------------------------
    clock network delay (ideal)                  2.00       2.00  <--- !!!
    cdc_rdy/src/CK (dffrx1)                      0.00       2.00 r
    cdc_rdy/src/Q (dffrx1) <-                    3.00       5.00 r
    cdc_rdy/st0/D (dffrx1)                       0.00       5.00 r
    data arrival time                                       5.00

    max_delay                                    4.00       4.00
    clock network delay (ideal)                  1.00       5.00  <--- !!!
    clock reconvergence pessimism                0.00       5.00
    library setup time                          -0.70       4.30
    data required time                                      4.30
    --------------------------------------------------------------
    data required time                                      4.30
    data arrival time                                      -5.00
    --------------------------------------------------------------
    slack (VIOLATED)                                       -0.70

    pt_shell> set_max_delay 4.0 -from CLKA -to CLKB -ignore_clock_latency
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: **default**
    Path Type: max

    Point                                        Incr       Path
    --------------------------------------------------------------
    cdc_rdy/src/CK (dffrx1)                      0.00       0.00 r
    cdc_rdy/src/Q (dffrx1) <-                    3.00       3.00 r
    cdc_rdy/st0/D (dffrx1)                       0.00       3.00 r
    data arrival time                                       3.00

    max_delay                                    4.00       4.00
    clock reconvergence pessimism                0.00       4.00
    library setup time                          -0.70       3.30
    data required time                                      3.30
    --------------------------------------------------------------
    data required time                                      3.30
    data arrival time                                      -3.00
    --------------------------------------------------------------
    slack (MET)                                             0.30
  Note: Until recently (v21.x) Cadence Innovus counted in the clock latency for min/max delay
  exceptions irrespective of -ignore_clock_latency. Users then had to avoid the switch and factor
  the actual clock latency into the delay value.

- Hold/min timing is not changed by set_max_delay. Therefore the max delay path exception (on the
  crossing path, i.e. -from src -to st0) shall be followed by a set_min_delay. The min delay
  exception shall also be applied with the -ignore_clock_latency option, and its amount
  determined experimentally.

- Clock uncertainty is factored into the path delay calculation. Unless care is taken, its hold
  component will (falsely) reduce the hold margin and call for hold fixing. This is something we
  want to avoid, and hence we compensate for it with a negative set_min_delay (in the amount of
  the hold uncertainty constraint):

    pt_shell> set_clock_uncertainty 3.0 CLKB
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0 -delay_type min

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: CLKB
    Path Type: min

    Point                                        Incr       Path
    --------------------------------------------------------------
    clock CLKA (rise edge)                       0.00       0.00
    clock network delay (ideal)                  0.00       0.00
    cdc_rdy/src/CK (dffrx1)                      0.00       0.00 r
    cdc_rdy/src/Q (dffrx1) <-                    3.00       3.00 r
    cdc_rdy/st0/D (dffrx1)                       0.00       3.00 r
    data arrival time                                       3.00

    clock CLKB (rise edge)                       0.00       0.00
    clock network delay (ideal)                  0.00       0.00
    clock reconvergence pessimism                0.00       0.00
    clock uncertainty                            3.00       3.00  <-- !!!
    cdc_rdy/st0/CK (dffrx1)                                 3.00 r
    library hold time                            0.30       3.30
    data required time                                      3.30
    --------------------------------------------------------------
    data required time                                      3.30
    data arrival time                                      -3.00
    --------------------------------------------------------------
    slack (VIOLATED)                                       -0.30

    pt_shell> set_min_delay -3.0 -from cdc_rdy/src -to cdc_rdy/st0
    pt_shell> report_timing -from cdc_rdy/src -to cdc_rdy/st0 -delay_type min

    Startpoint: cdc_rdy/src (rising edge-triggered flip-flop clocked by CLKA)
    Endpoint: cdc_rdy/st0 (rising edge-triggered flip-flop clocked by CLKB)
    Path Group: CLKB
    Path Type: min

    Point                                        Incr       Path
    --------------------------------------------------------------
    ...
    data arrival time                                       3.00

    min_delay                                   -3.00      -3.00  <-- !!!
    clock reconvergence pessimism                0.00      -3.00
    clock uncertainty                            3.00       0.00  <-- !!!
    library hold time                            0.30       0.30
    data required time                                      0.30
    --------------------------------------------------------------
    data required time                                      0.30
    data arrival time                                      -3.00
    --------------------------------------------------------------
    slack (MET)                                             2.70

  Note that the clock uncertainty affects both parts of the CDC path, i.e. the crossing path
  -from src -to st0 and the synchronizing path -from st0 -to st1.
There are some other practical aspects to consider and discuss:
- As we discussed in Ignoring CDC Paths, set_clock_groups -asynchronous is a good way to indicate
  how we think about the mutual interaction of CDC clocks. It has the -allow_paths option, which
  prevents the paths between groups from being ignored. Hence
  set_clock_groups -group CLKA -group CLKB -asynchronous -allow_paths ultimately has no effect on
  timing between clkA and clkB, but it tells SDC readers that we intentionally kept the CDC paths
  enabled.
- It is good practice to use set_max_delay 0.0 -from CLKA -to CLKB as a safe default. In case we
  missed a synchronizer or there were paths with no synchronization, this constraint would trigger
  a timing violation for us to notice.
- It may be a good idea to use set_max_load and/or set_max_transition to make implementation tools
  choose meaningful flop cells and places in the floorplan. However, these constraints vary among
  tools. Typically, synthesis tools limit these constraints to designs and ports, while STA and
  P&R tools support many different objects. Hence we recommend avoiding these constraints, or
  using them selectively only for certain tools.
- One way to avoid inserting buffers into CDC paths may be set_dont_touch <net>. With proper
  set_max/min_delay, this constraint is not really necessary.
- In practice, CDC implementation and checking is usually the responsibility of RTL designers.
  Unless instructed, physical implementation engineers then hardly know where all the CDC paths
  are and hence what to constrain. Poor naming choices make it even more difficult.
It is advised to devise an STA script that lists all paths crossing between every pair of clock domains (e.g. [Solvnet2] for PrimeTime) and to review those paths. Well-defined CDC schemes have a limited number of crossings. Seeing hundreds to thousands of paths between asynchronous clocks usually indicates a poor CDC implementation.
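A minimal sketch of such a listing script follows. It is not the script from [Solvnet2]; it assumes a PrimeTime-like Tcl shell with the design linked and clocks defined, and MAX_PATHS is an arbitrary, hypothetical cap. Paths disabled by set_false_path or by set_clock_groups -asynchronous (without -allow_paths) are invisible to the timer and will not show up, so a script like this is best run before such exceptions are applied.

```tcl
# Sketch only: lists timed paths between every ordered pair of clocks.
# MAX_PATHS is a hypothetical reporting cap, not a recommended value.
set MAX_PATHS 1000

foreach_in_collection src_clk [all_clocks] {
    foreach_in_collection dst_clk [all_clocks] {
        set src_name [get_object_name $src_clk]
        set dst_name [get_object_name $dst_clk]
        # Skip paths that stay within one clock domain.
        if {$src_name eq $dst_name} { continue }

        # Setup paths launched by src_clk and captured by dst_clk.
        set paths [get_timing_paths -from $src_clk -to $dst_clk -max_paths $MAX_PATHS]
        set n [sizeof_collection $paths]
        if {$n == 0} { continue }

        puts "$src_name -> $dst_name: $n path(s)"
        foreach_in_collection p $paths {
            set sp [get_object_name [get_attribute $p startpoint]]
            set ep [get_object_name [get_attribute $p endpoint]]
            puts "    $sp -> $ep"
        }
    }
}
```

Each reported start/end point pair can then be checked against the list of known synchronizers and handshake data paths; anything left over is a candidate for a missing CDC mechanism.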
With all the above, CDC constraints for our sample circuit would then look as follows:
    # clock definition
    create_clock -name CLKA -period 10 clkA
    create_clock -name CLKB -period 12 clkB -waveform {9 3}
    set_clock_latency 3.5 CLKA
    set UNCERT 0.5
    set_clock_uncertainty ${UNCERT} {CLKA CLKB}

    # (optional) async. clock groups
    set_clock_groups -group CLKA -group CLKB -asynchronous -allow_paths

    # safe defaults for not otherwise overridden crossing paths
    set_max_delay 0.0 -from CLKA -to CLKB -ignore_clock_latency;
    set_max_delay 0.0 -from CLKB -to CLKA -ignore_clock_latency;

    # synchronizer naming pattern for extra filtering
    set patt *

    # constrain single-bit 2FF synchronizers
    # (ideally synchronizers are hierarchical modules and their instances have
    # some unique naming convention; e.g. `cdc_` prefix)
    foreach_in_collection c [get_cells -hierarchical cdc_* -filter full_name=~*${patt}*] {
        set src_ff [get_cells -hierarchical src* -filter full_name=~[get_object_name $c]*];
        set st0_ff [get_cells -hierarchical st0* -filter full_name=~[get_object_name $c]*];
        set st1_ff [get_cells -hierarchical st1* -filter full_name=~[get_object_name $c]*];

        # max_delay constraint is to cover/override setup checks
        set_max_delay 4.0 -from ${src_ff} -to ${st0_ff} -ignore_clock_latency; # crossing path
        set_max_delay 4.0 -from ${st0_ff} -to ${st1_ff} -ignore_clock_latency; # synchronizing path

        # min_delay constraint is to cover/override hold checks
        # (Using a negative delay to compensate clock uncertainty. We would
        # ideally want the tool not to buffer the path to fix hold timing.)
        set_min_delay [expr - ${UNCERT}] -from ${src_ff} -to ${st0_ff} -ignore_clock_latency; # crossing path
    }

    # constrain multi-bit crossing paths
    # (these are less likely to follow a naming convention)
    set_max_delay 7.0 -from FF*A -to FF*B -ignore_clock_latency
    set_min_delay [expr - ${UNCERT}] -from FF*A -to FF*B -ignore_clock_latency
[Solvnet1] Synopsys, Constraining Paths Between Asynchronous Clock Domains, Synopsys SolveNet Doc Id: 2410687, https://solvnet.synopsys.com/retrieve/2410687.html
[Solvnet2] Synopsys, How Can I Easily Report Paths Which Cross Clock Domains?, Synopsys SolveNet Doc Id: 025077, https://solvnet.synopsys.com/retrieve/025077.html
[PTUG1] Synopsys PrimeTime User Guide, version M-2016.12, section Using Multiple Clocks, https://solvnet.synopsys.com/dow_retrieve/M-2016.12/ptolh/Default.htm#ptug/ptug6_Using_Multiple_Clocks.html
[PTUG2] Synopsys PrimeTime User Guide, version M-2016.12, section Timing Paths and Exceptions, https://solvnet.synopsys.com/dow_retrieve/M-2016.12/ptolh/Default.htm#ptug/ptug7_Timing_Exceptions.html
[Golson2014] Golson, Steve. Synchronization and Metastability. Synopsys Users Group (SNUG) Silicon Valley 2014, http://trilobyte.com/pdf/golson_snug14.pdf
[sv_hadskake_comps] Brabec, T.: Handshake Protocols, https://github.com/brabect1/sv_handshake_comps/blob/master/doc/handshake.rst