Notes about AMD Alveo U200 Data Center Accelerator Card (Active)

0. Introduction

This contains notes about using an AMD Alveo™ U200 Data Center Accelerator Card (Active).

The card was second hand and has been installed in slot 2 of an HP Z6 G4 workstation, with the 8-pin aux power cable connected. This is a PCIe3 x16 slot connected to the CPU. The slot 2 BIOS settings were the following, with Hot Plug enabled since other FPGA boards had previously been used in the slot:

Slot 2 PCI Express x16
	Disable
	*Enable
Slot 2 Option Rom Download
	Disable
	*Enable
Slot 2 Limit PCIe Speed
	*Auto
	Gen1 (2.5Gbps)
	Gen2 (5Gbps)
	Gen3 (8Gbps)
Slot 2 Bifurcation
	*Auto
	x8x8
	x4x4x4x4
Slot 2 Intel VROC NVMe Raid
	*Disable
	Enable
Slot 2 Hot Plug
	Disable
	*Enable
Slot 2 Hot Plug Buses
	0
	*8
	16
	32
	64
	128
Slot 2 Resizable Bars
	*Disable
	Enable

PCIe Training Reset is also enabled in the BIOS; it had previously been enabled when investigating failures of some other FPGA designs to enumerate as a PCIe endpoint.

The first micro USB cable tried didn't fit fully, and the JTAG port couldn't be detected. The Alveo Elongated USB Cable is described as:

U200/U250 Micro-USB connector location requires elongated USB for proper connection on Active cards due to blocking from heat spreader. Elongated USB cables can be used for both active and passive cards, but is a must have for active cards.

Found a different micro USB cable which, while it didn't appear to have a longer exposed connector, had a thinner plastic shell. That allowed the JTAG port to be detected as:

Bus 003 Device 008: ID 0403:6011 Future Technology Devices International, Ltd FT4232H Quad HS USB-UART/FIFO IC

The target part is xcu200-fsgd2104-2-e. From Alveo and Kria the corresponding standalone FPGA is an XCVU9P. Some documentation, such as Virtex UltraScale+ Devices Available GT Quads from the UltraScale+ Devices Integrated Block for PCI Express Product Guide (PG213), doesn't list the U200, so the corresponding standalone FPGA has to be searched for instead.

1. State as delivered

After initially fitting the card, booted into Windows 11 Pro for Workstations Version 24H2.

Device Manager shows a PCI Serial Port in slot 2, but doesn't find a compatible driver.

It is shown as a Gen 3 x16 device:

PS C:\Users\mr_halfword> C:\Users\mr_halfword\Git-projects\fpga_sio\multiple_boards\report_pcie_links.ps1

Name                                                  ExpressSpecVersion MaxLinkSpeed MaxLinkWidth CurrentLinkSpeed CurrentLinkWidth
----                                                  ------------------ ------------ ------------ ---------------- ----------------
Standard NVM Express Controller                                        2            3            4                3                4
Mellanox ConnectX-3 PRO VPI (MT04103) Network Adapter                  2            3            8                3                8
Intel(R) Ethernet Connection X722 for 10GBASE-T                        2            1            1                1                1
Intel(R) Ethernet Connection X722 for 10GBASE-T #2                     2            1            1                1                1
Intel(R) Ethernet Connection X722 for 1GbE                             2            1            1                1                1
High Definition Audio Controller                                       2            3           16                1                4
NVIDIA Quadro K620                                                     2            3           16                1                4
PCI Serial Port                                                        2            3           16                3               16
Mellanox ConnectX-4 Adapter                                            2            3           16                3               16

Booted into openSUSE Leap 15.5. dump_pci_info_pciutils shows the following for the serial port PCIe endpoint:

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> dump_info/dump_pci_info_pciutils 1c9d
domain=0000 bus=31 dev=00 func=00 rev=00
  vendor_id=1c9d (Vendor 1c9d) device_id=0101 (Device 0101) subvendor_id=1c9d subdevice_id=0007
  iommu_group=81
  physical_slot=2-2
  control: I/O- Mem+ BusMaster- ParErr+ SERR+ DisINTx-
  status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
  bar[0] base_addr=97400000 size=200000 is_IO=0 is_prefetchable=0 is_64=1
  Capabilities: [40] Power Management
  Capabilities: [48] Message Signaled Interrupts
  Capabilities: [70] PCI Express v2 Express Endpoint, MSI 0
    Link capabilities: Max speed 8 GT/s Max width x16
    Negotiated link status: Current speed 8 GT/s Width x16
    Link capabilities2: Supported link speeds 2.5 GT/s 5.0 GT/s 8.0 GT/s
    DevCap: MaxPayload 1024 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 Maximum of 1 μs
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
    DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
    DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
    LnkCap: Port # 0 ASPM not supported
            L0s Exit Latency More than 4 μs
            L1 Exit Latency More than 64 μs
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
    LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
    LnkSta: TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
  domain=0000 bus=30 dev=00 func=00 rev=04
    vendor_id=8086 (Intel Corporation) device_id=2030 (Sky Lake-E PCI Express Root Port A)
    iommu_group=51
    driver=pcieport
    physical_slot=2
    control: I/O+ Mem+ BusMaster+ ParErr+ SERR+ DisINTx+
    status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
    Capabilities: [40] Bridge subsystem vendor/device ID
    Capabilities: [60] Message Signaled Interrupts
    Capabilities: [90] PCI Express v2 Root Port, MSI 0
      Link capabilities: Max speed 8 GT/s Max width x16
      Negotiated link status: Current speed 8 GT/s Width x16
      Link capabilities2: Supported link speeds 2.5 GT/s 5.0 GT/s 8.0 GT/s
      DevCap: MaxPayload 256 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 Maximum of 1 μs
              ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
      DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
              RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
      DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
      LnkCap: Port # 5 ASPM not supported
              L0s Exit Latency 256 ns to less than 512 ns
              L1 Exit Latency 8 μs to less than 16 μs
              ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
      LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
              ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
      LnkSta: TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
      SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
              Slot #2 PowerLimit 0.000W Interlock- NoCompl-
    Capabilities: [e0] Power Management

There is no driver bound. The PCI vendor 1c9d isn't known to the PCI libraries. The PCI SIG Member Companies search reports this is for http://www.illumina.com/. Their website says they provide innovative sequencing and array technologies for medical research. Can't see any obvious products which make use of a U200.
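dump_pci_info_pciutils is part of the fpga_sio repository and isn't reproduced here. As an illustration only, the following is a minimal sketch, using the pciutils (libpci) API, of scanning for the 1c9d vendor ID seen above and reporting the identity and negotiated link; it is not the actual tool:

/* Minimal sketch using pciutils (libpci) to find devices with vendor ID 0x1c9d
 * and report their identity and negotiated PCIe link. Illustrative only. */
#include <stdio.h>
#include <pci/pci.h>

int main (void)
{
    struct pci_access *pacc = pci_alloc ();
    pci_init (pacc);
    pci_scan_bus (pacc);

    for (struct pci_dev *dev = pacc->devices; dev != NULL; dev = dev->next)
    {
        pci_fill_info (dev, PCI_FILL_IDENT | PCI_FILL_CAPS);
        if (dev->vendor_id != 0x1c9d)
        {
            continue;
        }

        printf ("Found %04x:%04x at %04x:%02x:%02x.%d\n",
                dev->vendor_id, dev->device_id,
                dev->domain, dev->bus, dev->dev, dev->func);

        /* Report the negotiated link from the PCI Express capability Link Status register */
        struct pci_cap *cap = pci_find_cap (dev, PCI_CAP_ID_EXP, PCI_CAP_NORMAL);
        if (cap != NULL)
        {
            const unsigned int lnksta = pci_read_word (dev, cap->addr + PCI_EXP_LNKSTA);
            printf ("  Current link speed code %u (1=2.5, 2=5, 3=8 GT/s) width x%u\n",
                    lnksta & PCI_EXP_LNKSTA_SPEED, (lnksta & PCI_EXP_LNKSTA_WIDTH) >> 4);
        }
    }

    pci_cleanup (pacc);
    return 0;
}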

2. Testing PCIe enumeration

2.1. U200_enum/gen3_x16_normal_order

This was the initial design created, to prove a bitstream could be created with the no-cost Vivado license, before buying the card.

Programmed the bitstream while openSUSE Leap 15.5 was booted, and bind_xilinx_devices_to_vfio.sh was able to bind to the PCIe endpoint of the loaded bitstream, i.e. hot plug worked.
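bind_xilinx_devices_to_vfio.sh isn't reproduced here. As an illustration only, the following sketch shows the generic sysfs driver_override mechanism such a script can use to bind a device to vfio-pci; the 0000:31:00.0 address is taken from the output below, and the vfio-pci module is assumed to be already loaded, with the program run as root:

/* Sketch of binding one PCIe device to vfio-pci via the standard sysfs
 * driver_override interface. Assumes vfio-pci is already loaded. */
#include <stdio.h>
#include <stdlib.h>

/* Write a string to a sysfs file, exiting on failure */
static void sysfs_write (const char *path, const char *value)
{
    FILE *f = fopen (path, "w");
    if ((f == NULL) || (fprintf (f, "%s", value) < 0) || (fclose (f) != 0))
    {
        fprintf (stderr, "Failed to write %s to %s\n", value, path);
        exit (EXIT_FAILURE);
    }
}

int main (void)
{
    const char *const device = "0000:31:00.0"; /* PCIe endpoint of the loaded bitstream */
    char override_path[128];

    snprintf (override_path, sizeof (override_path),
              "/sys/bus/pci/devices/%s/driver_override", device);

    /* Tell the PCI core that only vfio-pci may claim this device ... */
    sysfs_write (override_path, "vfio-pci");
    /* ... then request a probe of the device, so vfio-pci binds to it */
    sysfs_write ("/sys/bus/pci/drivers_probe", device);

    return 0;
}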

display_identified_pcie_fpga_designs reports zero for the User access build timestamp:

~/fpga_sio/software_tests/eclipse_project/bin/release> identify_pcie_fpga_design/display_identified_pcie_fpga_designs 
Opening device 0000:31:00.0 (10ee:903f) with IOMMU group 22
Enabled bus master for 0000:31:00.0

Design AS02MC04_enum:
  PCI device 0000:31:00.0 rev 00 IOMMU group 22  physical slot 2-2

  DMA bridge bar 1 memory size 0x1000
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       C2H 0               1                1                64
  User access build timestamp : 00000000 - 00/00/2000 00:00:00

Based upon the output of parse_bitstream_file, the design had failed to enable the user access build timestamp.
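The User access build timestamp is the 32-bit USR_ACCESS register value. A hedged sketch of decoding the Xilinx USR_ACCESS TIMESTAMP bit fields, which reproduces the 34B37034 - 06/09/2025 23:00:52 value reported further below:

/* Sketch of decoding the Xilinx USR_ACCESS TIMESTAMP format. The 32-bit value
 * packs the build date/time as day[31:27], month[26:23], year-2000[22:17],
 * hour[16:12], minute[11:6] and seconds[5:0]. */
#include <stdio.h>
#include <stdint.h>

static void decode_usr_access_timestamp (const uint32_t usr_access)
{
    const unsigned int day    = (usr_access >> 27) & 0x1f;
    const unsigned int month  = (usr_access >> 23) & 0x0f;
    const unsigned int year   = ((usr_access >> 17) & 0x3f) + 2000u;
    const unsigned int hour   = (usr_access >> 12) & 0x1f;
    const unsigned int minute = (usr_access >>  6) & 0x3f;
    const unsigned int second =  usr_access        & 0x3f;

    printf ("%08X - %02u/%02u/%04u %02u:%02u:%02u\n",
            usr_access, day, month, year, hour, minute, second);
}

int main (void)
{
    decode_usr_access_timestamp (0x34B37034); /* Prints 34B37034 - 06/09/2025 23:00:52 */
    return 0;
}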

After changing the design to insert the user access build timestamp and allocate a different identity:

identify_pcie_fpga_design/display_identified_pcie_fpga_designs 
Opening device 0000:31:00.0 (10ee:903f) with IOMMU group 22
Enabled bus master for 0000:31:00.0

Design U200_enum:
  PCI device 0000:31:00.0 rev 00 IOMMU group 22  physical slot 2-2

  DMA bridge bar 1 memory size 0x1000
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       C2H 0               1                1                64
  User access build timestamp : 34B37034 - 06/09/2025 23:00:52
dump_info/dump_pci_info_pciutils 
domain=0000 bus=31 dev=00 func=00 rev=00
  vendor_id=10ee (Xilinx Corporation) device_id=903f (Device 903f) subvendor_id=0002 subdevice_id=001c
  iommu_group=22
  driver=vfio-pci
  physical_slot=2-2
  control: I/O- Mem+ BusMaster- ParErr- SERR+ DisINTx-
  status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
  bar[0] base_addr=97410000 size=1000 is_IO=0 is_prefetchable=0 is_64=0
  bar[1] base_addr=97400000 size=10000 is_IO=0 is_prefetchable=0 is_64=0
  Capabilities: [40] Power Management
  Capabilities: [70] PCI Express v2 Express Endpoint, MSI 0
    Link capabilities: Max speed 8 GT/s Max width x16
    Negotiated link status: Current speed 8 GT/s Width x16
    Link capabilities2: Supported link speeds 2.5 GT/s 5.0 GT/s 8.0 GT/s
    DevCap: MaxPayload 1024 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 Maximum of 1 μs
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
    DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
    DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
    LnkCap: Port # 0 ASPM not supported
            L0s Exit Latency More than 4 μs
            L1 Exit Latency More than 64 μs
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
    LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
    LnkSta: TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
  domain=0000 bus=30 dev=00 func=00 rev=04
    vendor_id=8086 (Intel Corporation) device_id=2030 (Sky Lake-E PCI Express Root Port A)
    iommu_group=51
    driver=pcieport
    physical_slot=2
    control: I/O+ Mem+ BusMaster+ ParErr+ SERR+ DisINTx+
    status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
    Capabilities: [40] Bridge subsystem vendor/device ID
    Capabilities: [60] Message Signaled Interrupts
    Capabilities: [90] PCI Express v2 Root Port, MSI 0
      Link capabilities: Max speed 8 GT/s Max width x16
      Negotiated link status: Current speed 8 GT/s Width x16
      Link capabilities2: Supported link speeds 2.5 GT/s 5.0 GT/s 8.0 GT/s
      DevCap: MaxPayload 256 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 Maximum of 1 μs
              ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
      DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
              RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
      DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
      LnkCap: Port # 5 ASPM not supported
              L0s Exit Latency 256 ns to less than 512 ns
              L1 Exit Latency 8 μs to less than 16 μs
              ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
      LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
              ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
      LnkSta: TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
      SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
              Slot #2 PowerLimit 0.000W Interlock- NoCompl-
    Capabilities: [e0] Power Management

2.2. Unable to create a bifurcated x8x8 PCIe design

In a Block Design, tried to create an x8x8 design with two DMA/Bridge Subsystem for PCI Express IP instances which share the same Utility Buffer for the PCIe refclk:

[Block Design image]

The design synthesised, but implementation failed with errors about placement conflicts.

The first error was:

[DRC REQP-1963] connects_too_many_BUFG_GT_SYNC_loads: The IBUFDS_GTE4 U200_enum_i/util_ds_buf_0/U0/USE_IBUFDS_GTE4.GEN_IBUFDS_GTE4[0].IBUFDS_GTE4_I (ODIV2 pin) is driving more than one BUFG_GT_SYNC load, which is an unroutable situation. Optimization may not have been able to merge BUFG_GT_SYNC cells because of differing control pin connections.

Help with BUFDS_GTE4 in Ultrascale+ describes issues with device constraints.

For the placement options for the DMA/Bridge Subsystem for PCI Express IP:

  • PCIe Block Location of X1Y2 allows a max width of x16 with QUADs 224 to 227
  • PCIe Block Location of X1Y4 allows a max width of x8 with QUADs 230 or 231

The PCIe constraints for the U200 use:

  • QUADs 224 to 227 for the PCIe lanes
  • QUAD 226 for the PCIe refclk

TABLE: Virtex UltraScale+ Devices Available GT Quads (XCVU9P) shows the following for an XCVU9P in the FSGD2104 package:

PCIE Blocks  Quads with Max Link Width X16 Support  Quads with Max Link Width X8 Support  Quads with Max Link Width X4 Support
X1Y2         GTY_Quad_228, GTY_Quad_227             GTY_Quad_226, GTY_Quad_225            GTY_Quad_224
X1Y4         GTY_Quad_233, GTY_Quad_232             GTY_Quad_231, GTY_Quad_230            GTY_Quad_229

UltraScale+ Device Packaging and Pinouts Product Specification User Guide (UG575) Figure 1-122: XCVU9P Banks in FSGD2104 Package shows the "PCIE4 X1Y2 (tandem)" and "PCIE4 X1Y4" blocks are on different sides of an SLR crossing. GT Locations in the UltraScale+ Devices Integrated Block for PCI Express Product Guide (PG213) contains:

A GT Quad is comprised of four GT lanes. When selecting GT Quads for the PCIe IP, AMD recommends that you use the GT Quad most adjacent to the PCIe hard block. While this is not required, it improves place, route, and timing for the design.

  • Link widths of x1, x2, and x4 require one bonded GT Quad and should not split lanes between two GT Quads.
  • A link width of x8 requires two adjacent GT Quads that are bonded and are in the same SLR.
  • A link width of x16 requires four adjacent GT Quads that are bonded and are in the same SLR.

The SYSMON, Configuration, PCIe, Interlaken, and 100GE Integrated Blocks section in UG575 has:

Note: Do not connect the integrated block for PCIe to transceiver channels through an SLR crossing. For further details, refer to the Placement Rules section of the UltraScale Devices Gen3 Integrated Block for PCI Express Product Guide (PG156) and UltraScale+ Devices Integrated Block for PCI Express Product Guide (PG213). Blocks with an additional (Tandem) label support Tandem configuration.

The U200 has:

  • QUADs 224 to 227 connected to PCIe lanes
  • QUAD 230 connected to QSFP1
  • QUAD 231 connected to QSFP0

Perhaps that is why the DMA/Bridge Subsystem for PCI Express IP selection of the PCIe Block Location and QUADs shows fewer options for the number of blocks and QUADs compared to an XCVU9P in the FSGD2104 package.

In the Alveo Product Details in the Alveo U200 and U250 Data Center Accelerator Cards Data Sheet (DS962):

  1. Figure U200/U250 Block Diagram has the text:

PCIe x 16 or PCIe x8 (2)

  2. However, the Alveo U200/U250 Accelerator Card Product Details table has: PCIe Interface: Gen3 x16

Table Alveo U200/U250 Features in Alveo U200 and U250 Accelerator Cards User Guide (UG1289) has:

Gen1, 2, or 3 up to x16 and Dual Gen4 x8 compatible

The VU9P doesn't have a "PCIe Gen3 x16/Gen4 x8" block, so the mention of "Dual Gen4 x8 compatible" doesn't seem correct.
