@Chester-Gillon
Last active March 22, 2025 16:53
PCIe DMA bandwidth

0. Introduction

Notes on the PCIe DMA bandwidth achieved with FPGAs, using fpga_sio for the FPGA designs and the test software.

1. HP Z4 G4

An HP Z4 G4 with an Intel W-2123 CPU, running AlmaLinux 8.10.

Unless specified otherwise, the tests use 3 fitted FPGAs:

  • NiteFury : Artix-7 PCIe2 x4
  • TEF1001 : Kintex-7 PCIe2 x4
  • XCKU5P_DUAL_QSFP : Kintex UltraScale+ PCIe3 x8

1.1 dma_stream_loopback direct

Uses the following bitstreams, in which the loopback is connected directly on the DMA bridge, using fixed connections between adjacent channels:

$ bin/release/identify_pcie_fpga_design/display_identified_pcie_fpga_designs 
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:2d:00.0 (10ee:9038) with IOMMU group 85
Enabled bus master for 0000:2d:00.0
Opening device 0000:36:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:36:00.0

Design TEF1001_dma_stream_loopback:
  PCI device 0000:15:00.0 IOMMU group 41
  DMA bridge bar 2 AXI Stream
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       H2C 1               1                1                64
       C2H 0               1                1                64
       C2H 1               1                1                64
  User access build timestamp : C1310413 - 24/02/2024 16:16:19
  Quad SPI registers at bar 0 offset 0x2000
  XADC registers at bar 0 offset 0x3000
  IIC registers at bar 0 offset 0x0
  bit-banged I2C GPIO registers at bar 0 offset 0x1000
  All 2 master ports in AXI4-Stream Switch are disabled

Design XCKU5P_DUAL_QSFP_dma_stream_loopback:
  PCI device 0000:2d:00.0 IOMMU group 85
  DMA bridge bar 2 AXI Stream
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       H2C 1               1                1                64
       H2C 2               1                1                64
       H2C 3               1                1                64
       C2H 0               1                1                64
       C2H 1               1                1                64
       C2H 2               1                1                64
       C2H 3               1                1                64
  User access build timestamp : F5B0E950 - 30/11/2024 14:37:16
  Quad SPI registers at bar 0 offset 0x0
  SYSMON registers at bar 0 offset 0x1000
  All 4 master ports in AXI4-Stream Switch are disabled

Design NiteFury_dma_stream_loopback:
  PCI device 0000:36:00.0 IOMMU group 86
  DMA bridge bar 2 AXI Stream
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       H2C 1               1                1                64
       C2H 0               1                1                64
       C2H 1               1                1                64
  User access build timestamp : C1310A5B - 24/02/2024 16:41:27
  Quad SPI registers at bar 0 offset 0x0
  XADC registers at bar 0 offset 0x1000
  All 2 master ports in AXI4-Stream Switch are disabled

Although the FPGA designs don't contain an AXI4-Stream Switch, the values returned by reads of the undefined registers had the most significant bit set, which resulted in all the ports being reported as disabled.
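
The reporting behaviour described above can be illustrated with a minimal sketch. It assumes that each master port has a 32-bit mux register whose most significant bit is a disable flag, which is believed to match the Xilinx AXI4-Stream Switch control register layout. The `0xFFFFFFFF` read value and the function names are hypothetical, and this is not the fpga_sio implementation.

```python
# Assumption: the MSB of each master port mux register is the disable flag.
MI_DISABLE = 1 << 31


def port_is_disabled(mux_reg):
    """Return True when the disable flag is set in a 32-bit mux register value."""
    return (mux_reg & MI_DISABLE) != 0


def summarise_ports(mux_regs):
    """Build a report line in the style of the identification output."""
    num_disabled = sum(port_is_disabled(reg) for reg in mux_regs)
    if num_disabled == len(mux_regs):
        return f"All {len(mux_regs)} master ports in AXI4-Stream Switch are disabled"
    return f"{num_disabled} of {len(mux_regs)} master ports in AXI4-Stream Switch are disabled"


# Hypothetical values: reads of undefined registers returning all ones.
print(summarise_ports([0xFFFFFFFF] * 2))
print(summarise_ports([0xFFFFFFFF] * 4))
```

Any read value with bit 31 set gives the same result, so the report alone cannot distinguish a real switch with all ports disabled from a design with no switch at all.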

Result with all streams tested in parallel:

[mr_halfword@skylake-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_parallel_streams 
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:2d:00.0 (10ee:9038) with IOMMU group 85
Enabled bus master for 0000:2d:00.0
Opening device 0000:36:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:36:00.0
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Selecting test of TEF1001_dma_stream_loopback design PCI device 0000:15:00.0 IOMMU group 41 H2C channel 0 C2H channel 1
Selecting test of TEF1001_dma_stream_loopback design PCI device 0000:15:00.0 IOMMU group 41 H2C channel 1 C2H channel 0
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 0 C2H channel 1
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 1 C2H channel 0
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 2 C2H channel 3
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 3 C2H channel 2
Selecting test of NiteFury_dma_stream_loopback design PCI device 0000:36:00.0 IOMMU group 86 H2C channel 0 C2H channel 1
Selecting test of NiteFury_dma_stream_loopback design PCI device 0000:36:00.0 IOMMU group 86 H2C channel 1 C2H channel 0
Press Ctrl-C to stop test
  0000:15:00.0 0 -> 1 818.597 Mbytes/sec (8170504192 bytes in 9.981102 secs)
  0000:15:00.0 1 -> 0 818.597 Mbytes/sec (8170504192 bytes in 9.981107 secs)
  0000:2d:00.0 0 -> 1 1487.514 Mbytes/sec (14864613376 bytes in 9.992921 secs)
  0000:2d:00.0 1 -> 0 1487.513 Mbytes/sec (14864613376 bytes in 9.992929 secs)
  0000:2d:00.0 2 -> 3 1487.513 Mbytes/sec (14864613376 bytes in 9.992930 secs)
  0000:2d:00.0 3 -> 2 1487.513 Mbytes/sec (14864613376 bytes in 9.992931 secs)
  0000:36:00.0 0 -> 1 818.584 Mbytes/sec (8170504192 bytes in 9.981270 secs)
  0000:36:00.0 1 -> 0 818.583 Mbytes/sec (8170504192 bytes in 9.981276 secs)
<<snip>>
  0000:15:00.0 0 -> 1 818.598 Mbytes/sec (8187281408 bytes in 10.001590 secs)
  0000:15:00.0 1 -> 0 818.598 Mbytes/sec (8187281408 bytes in 10.001589 secs)
  0000:2d:00.0 0 -> 1 1487.519 Mbytes/sec (14864613376 bytes in 9.992892 secs)
  0000:2d:00.0 1 -> 0 1487.519 Mbytes/sec (14864613376 bytes in 9.992892 secs)
  0000:2d:00.0 2 -> 3 1487.519 Mbytes/sec (14864613376 bytes in 9.992893 secs)
  0000:2d:00.0 3 -> 2 1487.519 Mbytes/sec (14864613376 bytes in 9.992893 secs)
  0000:36:00.0 0 -> 1 818.594 Mbytes/sec (8187281408 bytes in 10.001636 secs)
  0000:36:00.0 1 -> 0 818.594 Mbytes/sec (8187281408 bytes in 10.001636 secs)

^C  0000:15:00.0 0 -> 1 818.596 Mbytes/sec (5167382528 bytes in 6.312490 secs)
  0000:15:00.0 1 -> 0 818.597 Mbytes/sec (5167382528 bytes in 6.312486 secs)
  0000:2d:00.0 0 -> 1 1487.491 Mbytes/sec (8506048512 bytes in 5.718388 secs)
  0000:2d:00.0 1 -> 0 1487.491 Mbytes/sec (8506048512 bytes in 5.718386 secs)
  0000:2d:00.0 2 -> 3 1487.491 Mbytes/sec (8506048512 bytes in 5.718385 secs)
  0000:2d:00.0 3 -> 2 1487.491 Mbytes/sec (8506048512 bytes in 5.718385 secs)
  0000:36:00.0 0 -> 1 818.590 Mbytes/sec (5167382528 bytes in 6.312539 secs)
  0000:36:00.0 1 -> 0 818.591 Mbytes/sec (5167382528 bytes in 6.312535 secs)

Overall test statistics:
  0000:15:00.0 0 -> 1 818.599 Mbytes/sec (782824898560 bytes in 956.298503 secs)
  0000:15:00.0 1 -> 0 818.599 Mbytes/sec (782824898560 bytes in 956.298504 secs)
  0000:2d:00.0 0 -> 1 1487.534 Mbytes/sec (1421650952192 bytes in 955.709649 secs)
  0000:2d:00.0 1 -> 0 1487.534 Mbytes/sec (1421650952192 bytes in 955.709654 secs)
  0000:2d:00.0 2 -> 3 1487.534 Mbytes/sec (1421650952192 bytes in 955.709654 secs)
  0000:2d:00.0 3 -> 2 1487.534 Mbytes/sec (1421650952192 bytes in 955.709654 secs)
  0000:36:00.0 0 -> 1 818.598 Mbytes/sec (782824898560 bytes in 956.299334 secs)
  0000:36:00.0 1 -> 0 818.598 Mbytes/sec (782824898560 bytes in 956.299336 secs)

0000:15:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:15:00.0 1 -> 0 Test pattern verified in 268435456 words
0000:2d:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:2d:00.0 1 -> 0 Test pattern verified in 268435456 words
0000:2d:00.0 2 -> 3 Test pattern verified in 268435456 words
0000:2d:00.0 3 -> 2 Test pattern verified in 268435456 words
0000:36:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:36:00.0 1 -> 0 Test pattern verified in 268435456 words

Overall PASS

Result with only one pair of streams tested on each device:

[mr_halfword@skylake-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_parallel_streams --stream_device 0000:15:00.0,0 --stream_device 0000:2d:00.0,0 --stream_device 0000:36:00.0,0
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:2d:00.0 (10ee:9038) with IOMMU group 85
Enabled bus master for 0000:2d:00.0
Opening device 0000:36:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:36:00.0
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Selecting test of TEF1001_dma_stream_loopback design PCI device 0000:15:00.0 IOMMU group 41 H2C channel 0 C2H channel 1
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 0 C2H channel 1
Selecting test of NiteFury_dma_stream_loopback design PCI device 0000:36:00.0 IOMMU group 86 H2C channel 0 C2H channel 1
Press Ctrl-C to stop test
  0000:15:00.0 0 -> 1 1637.195 Mbytes/sec (16357785600 bytes in 9.991346 secs)
  0000:2d:00.0 0 -> 1 5952.485 Mbytes/sec (59508785152 bytes in 9.997302 secs)
  0000:36:00.0 0 -> 1 1637.144 Mbytes/sec (16357785600 bytes in 9.991661 secs)
<<snip>>
  0000:15:00.0 0 -> 1 1637.195 Mbytes/sec (16357785600 bytes in 9.991348 secs)
  0000:2d:00.0 0 -> 1 5952.636 Mbytes/sec (59525562368 bytes in 9.999866 secs)
  0000:36:00.0 0 -> 1 1637.189 Mbytes/sec (16374562816 bytes in 10.001631 secs)

^C  0000:15:00.0 0 -> 1 1637.194 Mbytes/sec (16374562816 bytes in 10.001600 secs)
  0000:2d:00.0 0 -> 1 5952.540 Mbytes/sec (59525562368 bytes in 10.000027 secs)
  0000:36:00.0 0 -> 1 1637.179 Mbytes/sec (16374562816 bytes in 10.001693 secs)

  0000:15:00.0 0 -> 1 1304.633 Mbytes/sec (855638016 bytes in 0.655846 secs)
  0000:2d:00.0 0 -> 1 1301.989 Mbytes/sec (234881024 bytes in 0.180402 secs)
  0000:36:00.0 0 -> 1 1279.002 Mbytes/sec (838860800 bytes in 0.655871 secs)

Overall test statistics:
  0000:15:00.0 0 -> 1 1637.198 Mbytes/sec (1408833159168 bytes in 860.515055 secs)
  0000:2d:00.0 0 -> 1 5952.659 Mbytes/sec (5119517130752 bytes in 860.038640 secs)
  0000:36:00.0 0 -> 1 1637.195 Mbytes/sec (1408816381952 bytes in 860.506279 secs)

0000:15:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:2d:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:36:00.0 0 -> 1 Test pattern verified in 268435456 words

Overall PASS
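
The two runs above can be compared by summing the per-stream rates of the parallel run for each device. A minimal sketch, with the figures copied from the "Overall test statistics" of both runs; the variable names are illustrative only:

```python
# Per-stream rates (Mbytes/sec) with all streams tested in parallel.
all_streams = {
    "0000:15:00.0": [818.599, 818.599],
    "0000:2d:00.0": [1487.534, 1487.534, 1487.534, 1487.534],
    "0000:36:00.0": [818.598, 818.598],
}

# Rate (Mbytes/sec) with only one pair of streams tested on each device.
single_pair = {
    "0000:15:00.0": 1637.198,
    "0000:2d:00.0": 5952.659,
    "0000:36:00.0": 1637.195,
}

ratios = {}
for device, rates in all_streams.items():
    aggregate = sum(rates)
    ratios[device] = aggregate / single_pair[device]
    print(f"{device}: aggregate {aggregate:.3f} Mbytes/sec, "
          f"single pair {single_pair[device]:.3f} Mbytes/sec, "
          f"ratio {ratios[device]:.4f}")
```

For all three devices the aggregate of the parallel streams is within 0.1% of the single-pair rate, which suggests the total throughput per device is shared evenly between the active streams rather than increased by using more channels.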

2. Intel DH67BL motherboard

An Intel DH67BL motherboard with an i5-2310 CPU, running AlmaLinux 8.10.

Using a Kintex 7 160T FPGA with a PCIe x4 5 GT/s interface.

2.1. TOSING_160T_dma_stream_loopback

2.1.1. x4 2.5 GT/s

Run without the VFIO manager. With this design in this PC, the re-enumeration which follows the PCIe Hot Reset after a VFIO open results in the test running at a reduced link speed of 2.5 GT/s:

[mr_halfword@haswell-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_independent_streams --stream_mapping_size 020000000 
Opening device 0000:01:00.0 (10ee:7024) with IOMMU group 0
Enabled bus master for 0000:01:00.0
Warning: Device device 0000:01:00.0 (10ee:7024) has reduced bandwidth
         Max width x4 speed 5 GT/s. Negotiated width x4 speed 2.5 GT/s
Using num_descriptors=64 bytes_per_buffer=0x10000 data_mapping_size_words=0x100000
Device 0000:01:00.0 design TOSING_160T_dma_stream_loopback routes updated
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 1
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 1
Opened DMA MEM device : /dev/cmem (dev_desc = 0x00000007)
Debug: mmap param length 0x2000, Addr: 0x100000000 
Buff num 0: Phys addr : 0x100000000 User Addr: 0x7f94a7d8c000 
Debug: mmap param length 0x400000, Addr: 0x100002000 
Buff num 0: Phys addr : 0x100002000 User Addr: 0x7f94a6f67000 
Debug: mmap param length 0x2000, Addr: 0x100402000 
Buff num 0: Phys addr : 0x100402000 User Addr: 0x7f94a7d8a000 
Debug: mmap param length 0x400000, Addr: 0x100404000 
Buff num 0: Phys addr : 0x100404000 User Addr: 0x7f94a6b67000 
Debug: mmap param length 0x2000, Addr: 0x100804000 
Buff num 0: Phys addr : 0x100804000 User Addr: 0x7f94a7d88000 
Debug: mmap param length 0x400000, Addr: 0x100806000 
Buff num 0: Phys addr : 0x100806000 User Addr: 0x7f94a6767000 
Debug: mmap param length 0x2000, Addr: 0x100c06000 
Buff num 0: Phys addr : 0x100c06000 User Addr: 0x7f94a7d86000 
Debug: mmap param length 0x400000, Addr: 0x100c08000 
Buff num 0: Phys addr : 0x100c08000 User Addr: 0x7f94a6367000 
Press Ctrl-C to stop test
  0000:01:00.0 H2C channel 0 381.378 Mbytes/sec (3813736448 bytes in 9.999882 secs)
  0000:01:00.0 H2C channel 1 381.378 Mbytes/sec (3813736448 bytes in 9.999889 secs)
  0000:01:00.0 C2H channel 0 381.378 Mbytes/sec (3813736448 bytes in 9.999888 secs)
  0000:01:00.0 C2H channel 1 381.378 Mbytes/sec (3813736448 bytes in 9.999880 secs)

  0000:01:00.0 H2C channel 0 381.269 Mbytes/sec (3812687872 bytes in 9.999984 secs)
  0000:01:00.0 H2C channel 1 381.269 Mbytes/sec (3812687872 bytes in 9.999984 secs)
  0000:01:00.0 C2H channel 0 381.269 Mbytes/sec (3812687872 bytes in 9.999984 secs)
  0000:01:00.0 C2H channel 1 381.269 Mbytes/sec (3812687872 bytes in 9.999984 secs)

  0000:01:00.0 H2C channel 0 381.399 Mbytes/sec (3813998592 bytes in 10.000013 secs)
  0000:01:00.0 H2C channel 1 381.399 Mbytes/sec (3813998592 bytes in 10.000013 secs)
  0000:01:00.0 C2H channel 0 381.399 Mbytes/sec (3813998592 bytes in 10.000013 secs)
  0000:01:00.0 C2H channel 1 381.399 Mbytes/sec (3813998592 bytes in 10.000013 secs)

^C  0000:01:00.0 H2C channel 0 381.292 Mbytes/sec (3575513088 bytes in 9.377370 secs)
  0000:01:00.0 H2C channel 1 381.292 Mbytes/sec (3575513088 bytes in 9.377366 secs)
  0000:01:00.0 C2H channel 0 381.292 Mbytes/sec (3575513088 bytes in 9.377366 secs)
  0000:01:00.0 C2H channel 1 381.292 Mbytes/sec (3575513088 bytes in 9.377370 secs)

Overall test statistics:
  0000:01:00.0 H2C channel 0 381.335 Mbytes/sec (15015936000 bytes in 39.377249 secs)
  0000:01:00.0 H2C channel 1 381.335 Mbytes/sec (15015936000 bytes in 39.377252 secs)
  0000:01:00.0 C2H channel 0 381.335 Mbytes/sec (15015936000 bytes in 39.377251 secs)
  0000:01:00.0 C2H channel 1 381.335 Mbytes/sec (15015936000 bytes in 39.377247 secs)

Memory Driver closed 

Overall PASS

2.1.2. x4 5 GT/s

Start the VFIO manager, which is unable to check the device speed as it initially only opens the IOMMU group:

[mr_halfword@haswell-alma release]$ vfio_access/vfio_multi_process_manager 
Opening device 0000:01:00.0 (10ee:7024) with IOMMU group 0
Enabled bus master for 0000:01:00.0

Set the default AXI switch routing, and leave the device open. This triggers the VFIO manager to actually open the device and, now that the device has been opened, a reduced link speed is reported:

[mr_halfword@haswell-alma release]$ xilinx_axi_stream_switch/xilinx_axi_stream_switch_set_routing --default_routing --pause_before_vfio_close
Warning: Device device 0000:01:00.0 (10ee:7024) has reduced bandwidth
         Max width x4 speed 5 GT/s. Negotiated width x4 speed 2.5 GT/s
Device 0000:01:00.0 design TOSING_160T_dma_stream_loopback routes updated
Routes processed in 1 devices. Press return to close the VFIO devices.

Run a program to trigger a re-train for a 5 GT/s link speed, while the VFIO device remains open:

[mr_halfword@haswell-alma release]$ sudo dump_info/pcie_set_speed_libpciaccess 0000:01:00.0 2
[sudo] password for mr_halfword: 
Operating on device 0000:00:01.0 vendor_id=8086 (Intel Corporation) device_id=0101 (Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port) revision_id=09
Link capabilities: 02212D02 Max link speed 5 GT/s max link width x16
Link status: 5041
Current link speed: 2.5 GT/s
Original link control 2: 0002
Original target link speed: 2 (5 GT/s)
New target link speed: 2 (5 GT/s)
New link control 2: 0002
Triggering link retraining by changing link control 0040 -> 0060
Link status: 5042
Current link speed: 5 GT/s
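
The register values in this output can be decoded by hand. A minimal sketch, using the field layouts of the PCI Express Link Capabilities, Link Status and Link Control registers (speed code in bits 3:0, width in bits 9:4, Retrain Link as bit 5 of Link Control). The function name is illustrative and is not taken from pcie_set_speed_libpciaccess:

```python
LINK_SPEEDS = {1: "2.5 GT/s", 2: "5 GT/s", 3: "8 GT/s"}
RETRAIN_LINK = 1 << 5  # Link Control bit 5


def decode_speed_and_width(reg_value):
    """Decode the speed (bits 3:0) and width (bits 9:4) fields, which have the
    same positions in the Link Capabilities and Link Status registers."""
    speed_code = reg_value & 0xF
    width = (reg_value >> 4) & 0x3F
    return LINK_SPEEDS.get(speed_code, "unknown"), width


# Values copied from the output above.
print("Link capabilities:", decode_speed_and_width(0x02212D02))
print("Link status before retrain:", decode_speed_and_width(0x5041))
print("Link status after retrain:", decode_speed_and_width(0x5042))
print("Link control with Retrain Link set:", format(0x0040 | RETRAIN_LINK, "04X"))
```

The link width stays at x4 across the retrain; only the speed code in bits 3:0 of the Link Status changes, from 1 to 2.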

Run the DMA test again. This time there is no warning about a reduced link speed, and the test throughput is about 1.9 times that at the lower link speed:

[mr_halfword@haswell-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_independent_streams --stream_mapping_size 020000000 
Using num_descriptors=64 bytes_per_buffer=0x10000 data_mapping_size_words=0x100000
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 1
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 1
Opened DMA MEM device : /dev/cmem (dev_desc = 0x00000007)
Debug: mmap param length 0x2000, Addr: 0x100000000 
Buff num 0: Phys addr : 0x100000000 User Addr: 0x7fc1cd923000 
Debug: mmap param length 0x400000, Addr: 0x100002000 
Buff num 0: Phys addr : 0x100002000 User Addr: 0x7fc1ccafe000 
Debug: mmap param length 0x2000, Addr: 0x100402000 
Buff num 0: Phys addr : 0x100402000 User Addr: 0x7fc1cd921000 
Debug: mmap param length 0x400000, Addr: 0x100404000 
Buff num 0: Phys addr : 0x100404000 User Addr: 0x7fc1cc6fe000 
Debug: mmap param length 0x2000, Addr: 0x100804000 
Buff num 0: Phys addr : 0x100804000 User Addr: 0x7fc1cd91f000 
Debug: mmap param length 0x400000, Addr: 0x100806000 
Buff num 0: Phys addr : 0x100806000 User Addr: 0x7fc1cc2fe000 
Debug: mmap param length 0x2000, Addr: 0x100c06000 
Buff num 0: Phys addr : 0x100c06000 User Addr: 0x7fc1cd91d000 
Debug: mmap param length 0x400000, Addr: 0x100c08000 
Buff num 0: Phys addr : 0x100c08000 User Addr: 0x7fc1cbefe000 
Press Ctrl-C to stop test
  0000:01:00.0 H2C channel 0 739.325 Mbytes/sec (7393247232 bytes in 9.999999 secs)
  0000:01:00.0 H2C channel 1 739.324 Mbytes/sec (7393181696 bytes in 9.999916 secs)
  0000:01:00.0 C2H channel 0 739.324 Mbytes/sec (7393181696 bytes in 9.999926 secs)
  0000:01:00.0 C2H channel 1 739.324 Mbytes/sec (7393181696 bytes in 9.999920 secs)

  0000:01:00.0 H2C channel 0 739.574 Mbytes/sec (7395737600 bytes in 9.999992 secs)
  0000:01:00.0 H2C channel 1 739.574 Mbytes/sec (7395803136 bytes in 10.000080 secs)
  0000:01:00.0 C2H channel 0 739.574 Mbytes/sec (7395737600 bytes in 9.999992 secs)
  0000:01:00.0 C2H channel 1 739.574 Mbytes/sec (7395737600 bytes in 9.999992 secs)

  0000:01:00.0 H2C channel 0 739.406 Mbytes/sec (7394033664 bytes in 9.999970 secs)
  0000:01:00.0 H2C channel 1 739.406 Mbytes/sec (7394033664 bytes in 9.999970 secs)
  0000:01:00.0 C2H channel 0 739.406 Mbytes/sec (7394099200 bytes in 10.000058 secs)
  0000:01:00.0 C2H channel 1 739.406 Mbytes/sec (7394099200 bytes in 10.000058 secs)

^C  0000:01:00.0 H2C channel 0 738.887 Mbytes/sec (5948309504 bytes in 8.050365 secs)
  0000:01:00.0 H2C channel 1 738.887 Mbytes/sec (5948309504 bytes in 8.050364 secs)
  0000:01:00.0 C2H channel 0 738.887 Mbytes/sec (5948309504 bytes in 8.050361 secs)
  0000:01:00.0 C2H channel 1 738.887 Mbytes/sec (5948309504 bytes in 8.050364 secs)

Overall test statistics:
  0000:01:00.0 H2C channel 0 739.319 Mbytes/sec (28131328000 bytes in 38.050325 secs)
  0000:01:00.0 H2C channel 1 739.319 Mbytes/sec (28131328000 bytes in 38.050330 secs)
  0000:01:00.0 C2H channel 0 739.319 Mbytes/sec (28131328000 bytes in 38.050337 secs)
  0000:01:00.0 C2H channel 1 739.319 Mbytes/sec (28131328000 bytes in 38.050334 secs)

Memory Driver closed 

Overall PASS
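
The ratio between the two link speeds can be checked from the overall statistics of the 2.5 GT/s and 5 GT/s runs. A minimal sketch, which also compares the measured rates against the raw per-direction link rate. It assumes 8b/10b encoding (as used at 2.5 and 5 GT/s) and that the two channels of the same type share one direction of the x4 link; the raw rate ignores all packet and protocol overhead, and the names are illustrative:

```python
def raw_link_mbytes_per_sec(gt_per_sec, lanes):
    """Raw per-direction rate in Mbytes/sec for an 8b/10b encoded link."""
    data_bits_per_sec = gt_per_sec * 1e9 * lanes * 8 / 10
    return data_bits_per_sec / 8 / 1e6


# Overall per-channel rates (Mbytes/sec) copied from the two runs, keyed by GT/s.
measured_per_channel = {2.5: 381.335, 5.0: 739.319}
CHANNELS_PER_DIRECTION = 2  # assumption: 2 H2C channels one way, 2 C2H the other
LANES = 4

speedup = measured_per_channel[5.0] / measured_per_channel[2.5]
print(f"Speedup from 2.5 GT/s to 5 GT/s: {speedup:.2f}")

fractions = {}
for gt_per_sec, rate in measured_per_channel.items():
    per_direction = rate * CHANNELS_PER_DIRECTION
    raw = raw_link_mbytes_per_sec(gt_per_sec, LANES)
    fractions[gt_per_sec] = per_direction / raw
    print(f"{gt_per_sec} GT/s x{LANES}: {per_direction:.3f} of {raw:.0f} Mbytes/sec raw "
          f"({fractions[gt_per_sec]:.1%})")
```

Under these assumptions both runs achieve roughly three quarters of the raw link rate in each direction, with the higher link speed giving a slightly lower fraction.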

3. HP Z6 G4

An HP Z6 G4 with dual Intel(R) Xeon(R) Gold 6148 CPUs.

3.1. TOSING_160T_dma_stream_loopback

Booted openSUSE Leap 15.5 from an SD card. The same FPGA card and bitstream as used in the Intel DH67BL motherboard. In this PC the FPGA PCIe interface operates at the expected x4 5 GT/s:

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/test_dma_bridge_parallel_streams 
Opening device 0000:01:00.0 (10ee:7024) with IOMMU group 23
Enabled bus master for 0000:01:00.0
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Device 0000:01:00.0 design TOSING_160T_dma_stream_loopback routes updated
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 23 H2C channel 1 C2H channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 23 H2C channel 0 C2H channel 1
Press Ctrl-C to stop test
  0000:01:00.0 1 -> 0 742.314 Mbytes/sec (7415529472 bytes in 9.989745 secs)
  0000:01:00.0 0 -> 1 742.314 Mbytes/sec (7415529472 bytes in 9.989750 secs)

  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (7415529472 bytes in 9.989714 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (7415529472 bytes in 9.989714 secs)

  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (7432306688 bytes in 10.012315 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (7432306688 bytes in 10.012315 secs)

  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (7415529472 bytes in 9.989713 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (7415529472 bytes in 9.989714 secs)

^C  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (4898947072 bytes in 6.599539 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (4898947072 bytes in 6.599535 secs)

Overall test statistics:
  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (34577842176 bytes in 46.580982 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (34577842176 bytes in 46.580984 secs)

0000:01:00.0 1 -> 0 Test pattern verified in 268435456 words
0000:01:00.0 0 -> 1 Test pattern verified in 268435456 words

Overall PASS

A dump of the PCIe information while the test is running:

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> dump_info/dump_pci_info_pciutils 
domain=0000 bus=01 dev=00 func=00 rev=01
  vendor_id=10ee (Xilinx Corporation) device_id=7024 (Device 7024) subvendor_id=0002 subdevice_id=0009
  iommu_group=23
  driver=vfio-pci
  control: I/O- Mem+ BusMaster+ ParErr+ SERR+ DisINTx-
  status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
  bar[0] base_addr=94300000 size=100000 is_IO=0 is_prefetchable=0 is_64=0
  bar[2] base_addr=94400000 size=10000 is_IO=0 is_prefetchable=0 is_64=1
  Capabilities: [40] Power Management
  Capabilities: [48] Message Signaled Interrupts
  Capabilities: [60] PCI Express v2 Express Endpoint, MSI 0
    Link capabilities: Max speed 5 GT/s Max width x4
    Negotiated link status: Current speed 5 GT/s Width x4
    Link capabilities2: Not implemented
    DevCap: MaxPayload 512 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 No limit
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
    DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
    DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
    LnkCap: Port # 0 ASPM L0s
            L0s Exit Latency More than 4 μs
            L1 Exit Latency More than 64 μs
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
    LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
    LnkSta: TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
  domain=0000 bus=00 dev=1b func=00 rev=f9
    vendor_id=8086 (Intel Corporation) device_id=a1e7 (C620 Series Chipset Family PCI Express Root Port #17)
    iommu_group=18
    driver=pcieport
    physical_slot=3
    control: I/O+ Mem+ BusMaster+ ParErr+ SERR+ DisINTx+
    status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
    Capabilities: [40] PCI Express v2 Root Port, MSI 0
      Link capabilities: Max speed 8 GT/s Max width x4
      Negotiated link status: Current speed 5 GT/s Width x4
      Link capabilities2: Supported link speeds 2.5 GT/s 5.0 GT/s 8.0 GT/s
      DevCap: MaxPayload 128 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 Maximum of 1 μs
              ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
      DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
              RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
      DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
      LnkCap: Port # 17 ASPM not supported
              L0s Exit Latency 512 ns to less than 1 μs
              L1 Exit Latency 8 μs to less than 16 μs
              ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
      LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
              ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
      LnkSta: TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
      SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
              Slot #3 PowerLimit 25.000W Interlock- NoCompl+
    Capabilities: [80] Message Signaled Interrupts
    Capabilities: [90] Bridge subsystem vendor/device ID
    Capabilities: [a0] Power Management

3.2. TOSING_160T_dma_ddr3

Ran without error:

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/test_dma_bridge
Opening device 0000:01:00.0 (10ee:7024) with IOMMU group 23
Enabled bus master for 0000:01:00.0
Testing TOSING_160T_dma_ddr3 design with memory size 0x40000000
PCI device 0000:01:00.0 IOMMU group 23

Testing using 8 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 0
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3636.431944 (Mbytes/sec)
  Mean = 3645.796235 (Mbytes/sec)
   Max = 3653.293751 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 945.373122 (Mbytes/sec)
  Mean = 945.417685 (Mbytes/sec)
   Max = 945.449435 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3603.803932 (Mbytes/sec)
  Mean = 3624.287538 (Mbytes/sec)
   Max = 3644.648570 (Mbytes/sec)
TEST PASS

Testing using 8 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 1
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3641.184135 (Mbytes/sec)
  Mean = 3647.979870 (Mbytes/sec)
   Max = 3653.243074 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 945.375525 (Mbytes/sec)
  Mean = 945.424984 (Mbytes/sec)
   Max = 945.452615 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3624.245370 (Mbytes/sec)
  Mean = 3637.930887 (Mbytes/sec)
   Max = 3649.101458 (Mbytes/sec)
TEST PASS

Testing using 8 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 0
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3643.943919 (Mbytes/sec)
  Mean = 3649.734438 (Mbytes/sec)
   Max = 3659.238510 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 945.392312 (Mbytes/sec)
  Mean = 945.426304 (Mbytes/sec)
   Max = 945.450170 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3633.495611 (Mbytes/sec)
  Mean = 3640.852926 (Mbytes/sec)
   Max = 3647.119112 (Mbytes/sec)
TEST PASS

Testing using 8 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 1
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3635.526317 (Mbytes/sec)
  Mean = 3648.811077 (Mbytes/sec)
   Max = 3658.089982 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 945.398097 (Mbytes/sec)
  Mean = 945.429439 (Mbytes/sec)
   Max = 945.455510 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3632.779141 (Mbytes/sec)
  Mean = 3642.108668 (Mbytes/sec)
   Max = 3652.501426 (Mbytes/sec)
TEST PASS
Testing TOSING_160T_dma_ddr3 design with memory size 0x40000000
PCI device 0000:01:00.0 IOMMU group 23

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 8 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 8 descriptors
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3643.670914 (Mbytes/sec)
  Mean = 3650.617295 (Mbytes/sec)
   Max = 3658.552602 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 906.966307 (Mbytes/sec)
  Mean = 907.009465 (Mbytes/sec)
   Max = 907.036223 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3594.302758 (Mbytes/sec)
  Mean = 3602.093865 (Mbytes/sec)
   Max = 3604.951140 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 8 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 8 descriptors
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3644.902976 (Mbytes/sec)
  Mean = 3650.843228 (Mbytes/sec)
   Max = 3657.502315 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 906.974082 (Mbytes/sec)
  Mean = 907.010500 (Mbytes/sec)
   Max = 907.028802 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3594.450166 (Mbytes/sec)
  Mean = 3602.873753 (Mbytes/sec)
   Max = 3605.441143 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 8 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 8 descriptors
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3645.870374 (Mbytes/sec)
  Mean = 3652.490617 (Mbytes/sec)
   Max = 3658.809515 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 906.962397 (Mbytes/sec)
  Mean = 907.013354 (Mbytes/sec)
   Max = 907.033871 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3593.929644 (Mbytes/sec)
  Mean = 3603.571980 (Mbytes/sec)
   Max = 3605.700360 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 8 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 8 descriptors
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3645.345015 (Mbytes/sec)
  Mean = 3651.430892 (Mbytes/sec)
   Max = 3657.803938 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 906.984469 (Mbytes/sec)
  Mean = 907.016453 (Mbytes/sec)
   Max = 907.030472 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3593.201371 (Mbytes/sec)
  Mean = 3602.478031 (Mbytes/sec)
   Max = 3605.229510 (Mbytes/sec)
TEST PASS

Overall PASS