Achieving Timing Closure for Large-Scale FPGAs Using Pblock Constraints

Environment: Vivado 2023.2

FPGA Model: XCVU47P

FPGA Project Overview:

The FPGA's main control logic accesses the on-chip HBM through four AXI interfaces (up to 16 groups are supported);

Main clock domain frequency: 250MHz

HBM interface clock frequency: 450MHz

Problem:

As the design grew and used more logic resources and BRAM, it could no longer achieve timing closure.
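Before trying fixes, it helps to see where the failing paths are. The following is a minimal sketch, assuming the implemented design (or a routed checkpoint) is open in the Vivado Tcl console; the report file names are illustrative and not from the original project.

```tcl
# Summarize setup and hold slack for every clock domain.
report_timing_summary -max_paths 10 -file timing_summary.rpt

# List the ten worst setup paths that have negative slack.
report_timing -setup -max_paths 10 -slack_lesser_than 0 -file worst_setup_paths.rpt

# Report placement and routing congestion, which often accompanies
# timing failures when logic is packed into one area of the device.
report_design_analysis -congestion -file congestion.rpt
```

If the worst paths cluster in one module or one region of the device, that points toward a placement problem rather than a purely logical one.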

Exploration of Ideas:

Attempt 1: Change asynchronous reset to synchronous reset.

Modification Method: The original RTL design used asynchronous resets. Following Xilinx FPGA design guidelines, a Python script was used to convert every asynchronous reset in the RTL to a synchronous reset.

Result: After modifying the RTL, timing still could not be closed.
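One way to confirm that the conversion took effect is to count flip-flop primitives in the synthesized netlist. This is a sketch rather than the check used in the original project; it assumes the synthesized or implemented design is open. Vendor IP may still contain some asynchronous-reset flip-flops even after the user RTL has been converted.

```tcl
# FDCE/FDPE are the flip-flops with asynchronous clear/preset;
# FDRE/FDSE are the variants with synchronous reset/set.
set async_ffs [get_cells -hierarchical -quiet -filter {REF_NAME == FDCE || REF_NAME == FDPE}]
set sync_ffs  [get_cells -hierarchical -quiet -filter {REF_NAME == FDRE || REF_NAME == FDSE}]

puts "Asynchronous-reset flip-flops: [llength $async_ffs]"
puts "Synchronous-reset flip-flops:  [llength $sync_ffs]"
```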

Attempt 2: Balance the usage of BRAM and URAM.

Modification Method: The XCVU47P has a total of 960 UltraRAM (URAM) blocks. The synthesis results showed that Vivado inferred very few URAMs on its own, while BRAM utilization exceeded 50%. We therefore forced some of the modules that used BRAM to use URAM instead.

Result: Across different versions of the RTL, this method helped somewhat with timing closure, but in most cases timing still could not be closed.
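The original does not show how the URAM mapping was forced. For inferred memories, one common approach is the ram_style synthesis attribute. The sketch below assumes the attribute is applied from a synthesis XDC file; the cell name u_chip_top/u_buf/mem_reg is hypothetical and must be replaced with a real inferred memory in the design. The report file names are illustrative.

```tcl
# In a synthesis XDC file: request UltraRAM for one inferred memory.
# Hypothetical instance name. The RTL equivalent is placing
# (* ram_style = "ultra" *) on the memory array declaration.
set_property RAM_STYLE ultra [get_cells u_chip_top/u_buf/mem_reg]

# From the Tcl console after synthesis: check how memory usage is
# split between block RAM and UltraRAM, per module and per memory.
report_utilization -hierarchical -file utilization_hier.rpt
report_ram_utilization -file ram_utilization.rpt
```

Memories instantiated through IP or macros usually have their own parameter for selecting the memory primitive, so the attribute above applies only to memories inferred from RTL arrays.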

Solution:

The XCVU47P is a very large device. After implementation, we observed that the placer had concentrated resource usage in the lower-left corner of the chip. Our RTL design also divides naturally into two large modules, chip_top and tester, connected by an AXI-Stream bus. Of the two, only chip_top uses the HBM interface.
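The observation above can be reproduced from the Tcl console. This is a sketch assuming the implemented design is open in the Vivado GUI and that the two modules are instantiated as u_chip_top and u_tester (the instance names used in the pblock constraints); the report file names are illustrative.

```tcl
# Resource usage of each of the two top-level modules.
report_utilization -cells [get_cells u_chip_top] -file util_chip_top.rpt
report_utilization -cells [get_cells u_tester]   -file util_tester.rpt

# Color the primitives of each module in the Device view to see
# where the placer put them.
highlight_objects -color red  [get_cells -hierarchical -filter {NAME =~ u_chip_top/* && IS_PRIMITIVE}]
highlight_objects -color blue [get_cells -hierarchical -filter {NAME =~ u_tester/* && IS_PRIMITIVE}]
```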

Constraint Method Used:

Use pblock constraints to confine the chip_top module that uses HBM to the lower left corner of the chip, adjacent to the required HBM module. The tester module is constrained directly above the chip_top.

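# Region for the tester module: the clock regions directly above chip_top.
create_pblock pblock_u_tester

# Note: -quiet hides the error if the instance name u_tester does not match.
add_cells_to_pblock [get_pblocks pblock_u_tester] [get_cells -quiet [list u_tester]]

resize_pblock [get_pblocks pblock_u_tester] -add {CLOCKREGION_X0Y5:CLOCKREGION_X3Y7}

# Region for chip_top: the lower-left clock regions, next to the HBM.
create_pblock pblock_chip_top

add_cells_to_pblock [get_pblocks pblock_chip_top] [get_cells -quiet [list u_chip_top]]

resize_pblock [get_pblocks pblock_chip_top] -add {CLOCKREGION_X0Y0:CLOCKREGION_X3Y4}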

After adding the above constraints to the XDC file, timing closure was achieved.
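The result can be checked from the Tcl console after routing. The following is a minimal sketch, assuming the routed design is open; it is not taken from the original project, and the report file name is illustrative.

```tcl
# Worst setup and hold slack of the routed design.
# Non-negative values mean timing is met.
set wns [get_property SLACK [get_timing_paths -setup -max_paths 1 -nworst 1]]
set whs [get_property SLACK [get_timing_paths -hold  -max_paths 1 -nworst 1]]
puts "Worst setup slack: $wns ns"
puts "Worst hold slack:  $whs ns"

# Utilization inside each pblock, to confirm that neither region is over-filled.
report_utilization -pblocks [get_pblocks {pblock_chip_top pblock_u_tester}] -file util_pblocks.rpt
```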

The distribution of device resources is shown in the attached image.


Summary of Timing Closure Methods for This Project:

1. When BRAM usage exceeds 50%, forcing some memories to map to URAM instead of BRAM helps with timing closure;

2. After some modules have been moved to URAM, if BRAM usage keeps growing and climbs back to around 40%, forcing URAM usage no longer helps;

3. Adding pblock constraints that manually confine the RTL modules to sensible regions of the device can achieve timing closure.

Question: Is Vivado's place-and-route algorithm simply not smart enough to achieve timing closure on large-scale FPGAs without manual floorplanning?
