@Chester-Gillon
Last active December 22, 2025 19:20
PCIe DMA bandwidth

0. Introduction

Notes on the PCIe DMA bandwidth achieved with FPGAs, using fpga_sio for the FPGA designs and the test software.

1. HP Z4 G4

An HP Z4 G4 with an Intel W-2123 CPU, running AlmaLinux 8.10.

Unless specified otherwise, the following 3 FPGAs are fitted:

  • NiteFury : Artix-7 PCIe2 x4
  • TEF1001 : Kintex-7 PCIe2 x4
  • XCKU5P_DUAL_QSFP : Kintex UltraScale+ PCIe3 x8

1.1 dma_stream_loopback direct

Uses the following bitstreams, in which the loopback is connected directly on the DMA bridge, using fixed connections between adjacent channels:

$ bin/release/identify_pcie_fpga_design/display_identified_pcie_fpga_designs 
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:2d:00.0 (10ee:9038) with IOMMU group 85
Enabled bus master for 0000:2d:00.0
Opening device 0000:36:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:36:00.0

Design TEF1001_dma_stream_loopback:
  PCI device 0000:15:00.0 IOMMU group 41
  DMA bridge bar 2 AXI Stream
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       H2C 1               1                1                64
       C2H 0               1                1                64
       C2H 1               1                1                64
  User access build timestamp : C1310413 - 24/02/2024 16:16:19
  Quad SPI registers at bar 0 offset 0x2000
  XADC registers at bar 0 offset 0x3000
  IIC registers at bar 0 offset 0x0
  bit-banged I2C GPIO registers at bar 0 offset 0x1000
  All 2 master ports in AXI4-Stream Switch are disabled

Design XCKU5P_DUAL_QSFP_dma_stream_loopback:
  PCI device 0000:2d:00.0 IOMMU group 85
  DMA bridge bar 2 AXI Stream
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       H2C 1               1                1                64
       H2C 2               1                1                64
       H2C 3               1                1                64
       C2H 0               1                1                64
       C2H 1               1                1                64
       C2H 2               1                1                64
       C2H 3               1                1                64
  User access build timestamp : F5B0E950 - 30/11/2024 14:37:16
  Quad SPI registers at bar 0 offset 0x0
  SYSMON registers at bar 0 offset 0x1000
  All 4 master ports in AXI4-Stream Switch are disabled

Design NiteFury_dma_stream_loopback:
  PCI device 0000:36:00.0 IOMMU group 86
  DMA bridge bar 2 AXI Stream
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       H2C 1               1                1                64
       C2H 0               1                1                64
       C2H 1               1                1                64
  User access build timestamp : C1310A5B - 24/02/2024 16:41:27
  Quad SPI registers at bar 0 offset 0x0
  XADC registers at bar 0 offset 0x1000
  All 2 master ports in AXI4-Stream Switch are disabled

The FPGA designs don't contain the AXI4-Stream Switch. The undefined reads of the switch registers returned values with the most significant bit set, which resulted in all the ports being reported as disabled.
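
As a rough illustration of why an undefined read ends up reported as a disabled port, the following sketch tests the most significant bit of a 32-bit register value. The helper names and the all-ones read value are assumptions for illustration only; they are not taken from fpga_sio, and the actual values read were not recorded here.

```python
# Hypothetical sketch: bit 31 of a 32-bit register value is treated as "port disabled".
DISABLE_BIT = 1 << 31


def port_is_disabled(reg_value: int) -> bool:
    """Return True when the most significant bit of a 32-bit value is set."""
    return (reg_value & DISABLE_BIT) != 0


def count_disabled(reg_values) -> int:
    """Count how many register values have the disable bit set."""
    return sum(1 for value in reg_values if port_is_disabled(value))


# Assumed example values: undefined reads that happen to return all ones.
undefined_reads = [0xFFFFFFFF, 0xFFFFFFFF]
print(f"{count_disabled(undefined_reads)} of {len(undefined_reads)} "
      "master ports reported as disabled")
```

Any value with bit 31 set gives the same result, so the message does not by itself show that a switch is present in the design.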

Result with all streams tested in parallel:

[mr_halfword@skylake-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_parallel_streams 
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:2d:00.0 (10ee:9038) with IOMMU group 85
Enabled bus master for 0000:2d:00.0
Opening device 0000:36:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:36:00.0
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Selecting test of TEF1001_dma_stream_loopback design PCI device 0000:15:00.0 IOMMU group 41 H2C channel 0 C2H channel 1
Selecting test of TEF1001_dma_stream_loopback design PCI device 0000:15:00.0 IOMMU group 41 H2C channel 1 C2H channel 0
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 0 C2H channel 1
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 1 C2H channel 0
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 2 C2H channel 3
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 3 C2H channel 2
Selecting test of NiteFury_dma_stream_loopback design PCI device 0000:36:00.0 IOMMU group 86 H2C channel 0 C2H channel 1
Selecting test of NiteFury_dma_stream_loopback design PCI device 0000:36:00.0 IOMMU group 86 H2C channel 1 C2H channel 0
Press Ctrl-C to stop test
  0000:15:00.0 0 -> 1 818.597 Mbytes/sec (8170504192 bytes in 9.981102 secs)
  0000:15:00.0 1 -> 0 818.597 Mbytes/sec (8170504192 bytes in 9.981107 secs)
  0000:2d:00.0 0 -> 1 1487.514 Mbytes/sec (14864613376 bytes in 9.992921 secs)
  0000:2d:00.0 1 -> 0 1487.513 Mbytes/sec (14864613376 bytes in 9.992929 secs)
  0000:2d:00.0 2 -> 3 1487.513 Mbytes/sec (14864613376 bytes in 9.992930 secs)
  0000:2d:00.0 3 -> 2 1487.513 Mbytes/sec (14864613376 bytes in 9.992931 secs)
  0000:36:00.0 0 -> 1 818.584 Mbytes/sec (8170504192 bytes in 9.981270 secs)
  0000:36:00.0 1 -> 0 818.583 Mbytes/sec (8170504192 bytes in 9.981276 secs)
<<snip>>
  0000:15:00.0 0 -> 1 818.598 Mbytes/sec (8187281408 bytes in 10.001590 secs)
  0000:15:00.0 1 -> 0 818.598 Mbytes/sec (8187281408 bytes in 10.001589 secs)
  0000:2d:00.0 0 -> 1 1487.519 Mbytes/sec (14864613376 bytes in 9.992892 secs)
  0000:2d:00.0 1 -> 0 1487.519 Mbytes/sec (14864613376 bytes in 9.992892 secs)
  0000:2d:00.0 2 -> 3 1487.519 Mbytes/sec (14864613376 bytes in 9.992893 secs)
  0000:2d:00.0 3 -> 2 1487.519 Mbytes/sec (14864613376 bytes in 9.992893 secs)
  0000:36:00.0 0 -> 1 818.594 Mbytes/sec (8187281408 bytes in 10.001636 secs)
  0000:36:00.0 1 -> 0 818.594 Mbytes/sec (8187281408 bytes in 10.001636 secs)

^C  0000:15:00.0 0 -> 1 818.596 Mbytes/sec (5167382528 bytes in 6.312490 secs)
  0000:15:00.0 1 -> 0 818.597 Mbytes/sec (5167382528 bytes in 6.312486 secs)
  0000:2d:00.0 0 -> 1 1487.491 Mbytes/sec (8506048512 bytes in 5.718388 secs)
  0000:2d:00.0 1 -> 0 1487.491 Mbytes/sec (8506048512 bytes in 5.718386 secs)
  0000:2d:00.0 2 -> 3 1487.491 Mbytes/sec (8506048512 bytes in 5.718385 secs)
  0000:2d:00.0 3 -> 2 1487.491 Mbytes/sec (8506048512 bytes in 5.718385 secs)
  0000:36:00.0 0 -> 1 818.590 Mbytes/sec (5167382528 bytes in 6.312539 secs)
  0000:36:00.0 1 -> 0 818.591 Mbytes/sec (5167382528 bytes in 6.312535 secs)

Overall test statistics:
  0000:15:00.0 0 -> 1 818.599 Mbytes/sec (782824898560 bytes in 956.298503 secs)
  0000:15:00.0 1 -> 0 818.599 Mbytes/sec (782824898560 bytes in 956.298504 secs)
  0000:2d:00.0 0 -> 1 1487.534 Mbytes/sec (1421650952192 bytes in 955.709649 secs)
  0000:2d:00.0 1 -> 0 1487.534 Mbytes/sec (1421650952192 bytes in 955.709654 secs)
  0000:2d:00.0 2 -> 3 1487.534 Mbytes/sec (1421650952192 bytes in 955.709654 secs)
  0000:2d:00.0 3 -> 2 1487.534 Mbytes/sec (1421650952192 bytes in 955.709654 secs)
  0000:36:00.0 0 -> 1 818.598 Mbytes/sec (782824898560 bytes in 956.299334 secs)
  0000:36:00.0 1 -> 0 818.598 Mbytes/sec (782824898560 bytes in 956.299336 secs)

0000:15:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:15:00.0 1 -> 0 Test pattern verified in 268435456 words
0000:2d:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:2d:00.0 1 -> 0 Test pattern verified in 268435456 words
0000:2d:00.0 2 -> 3 Test pattern verified in 268435456 words
0000:2d:00.0 3 -> 2 Test pattern verified in 268435456 words
0000:36:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:36:00.0 1 -> 0 Test pattern verified in 268435456 words

Overall PASS
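
The reported rates can be cross-checked from the byte counts and durations on each line. The sketch below parses one statistics line and recomputes the rate, which suggests that "Mbytes" in the output means 10^6 bytes. The function names are made up for this example and the regular expression is only assumed to match the line layout shown above.

```python
import re

# Assumed layout of one statistics line, based on the output shown above.
STATS_RE = re.compile(
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+"
    r"(?P<h2c>\d+) -> (?P<c2h>\d+)\s+"
    r"(?P<rate>[\d.]+) Mbytes/sec "
    r"\((?P<bytes>\d+) bytes in (?P<secs>[\d.]+) secs\)"
)


def parse_stats_line(line):
    """Return the fields of a statistics line, or None if it doesn't match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "h2c": int(match.group("h2c")),
        "c2h": int(match.group("c2h")),
        "reported_rate": float(match.group("rate")),
        "num_bytes": int(match.group("bytes")),
        "secs": float(match.group("secs")),
    }


def mbytes_per_sec(num_bytes, secs):
    """Transfer rate in units of 10^6 bytes per second."""
    return num_bytes / secs / 1e6


line = "  0000:15:00.0 0 -> 1 818.599 Mbytes/sec (782824898560 bytes in 956.298503 secs)"
stats = parse_stats_line(line)
recomputed = mbytes_per_sec(stats["num_bytes"], stats["secs"])
print(f"{stats['device']} reported {stats['reported_rate']:.3f} "
      f"recomputed {recomputed:.3f} Mbytes/sec")
```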

Result with only one pair of streams tested on each device:

[mr_halfword@skylake-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_parallel_streams --stream_device 0000:15:00.0,0 --stream_device 0000:2d:00.0,0 --stream_device 0000:36:00.0,0
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:2d:00.0 (10ee:9038) with IOMMU group 85
Enabled bus master for 0000:2d:00.0
Opening device 0000:36:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:36:00.0
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Selecting test of TEF1001_dma_stream_loopback design PCI device 0000:15:00.0 IOMMU group 41 H2C channel 0 C2H channel 1
Selecting test of XCKU5P_DUAL_QSFP_dma_stream_loopback design PCI device 0000:2d:00.0 IOMMU group 85 H2C channel 0 C2H channel 1
Selecting test of NiteFury_dma_stream_loopback design PCI device 0000:36:00.0 IOMMU group 86 H2C channel 0 C2H channel 1
Press Ctrl-C to stop test
  0000:15:00.0 0 -> 1 1637.195 Mbytes/sec (16357785600 bytes in 9.991346 secs)
  0000:2d:00.0 0 -> 1 5952.485 Mbytes/sec (59508785152 bytes in 9.997302 secs)
  0000:36:00.0 0 -> 1 1637.144 Mbytes/sec (16357785600 bytes in 9.991661 secs)
<<snip>>
  0000:15:00.0 0 -> 1 1637.195 Mbytes/sec (16357785600 bytes in 9.991348 secs)
  0000:2d:00.0 0 -> 1 5952.636 Mbytes/sec (59525562368 bytes in 9.999866 secs)
  0000:36:00.0 0 -> 1 1637.189 Mbytes/sec (16374562816 bytes in 10.001631 secs)

^C  0000:15:00.0 0 -> 1 1637.194 Mbytes/sec (16374562816 bytes in 10.001600 secs)
  0000:2d:00.0 0 -> 1 5952.540 Mbytes/sec (59525562368 bytes in 10.000027 secs)
  0000:36:00.0 0 -> 1 1637.179 Mbytes/sec (16374562816 bytes in 10.001693 secs)

  0000:15:00.0 0 -> 1 1304.633 Mbytes/sec (855638016 bytes in 0.655846 secs)
  0000:2d:00.0 0 -> 1 1301.989 Mbytes/sec (234881024 bytes in 0.180402 secs)
  0000:36:00.0 0 -> 1 1279.002 Mbytes/sec (838860800 bytes in 0.655871 secs)

Overall test statistics:
  0000:15:00.0 0 -> 1 1637.198 Mbytes/sec (1408833159168 bytes in 860.515055 secs)
  0000:2d:00.0 0 -> 1 5952.659 Mbytes/sec (5119517130752 bytes in 860.038640 secs)
  0000:36:00.0 0 -> 1 1637.195 Mbytes/sec (1408816381952 bytes in 860.506279 secs)

0000:15:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:2d:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:36:00.0 0 -> 1 Test pattern verified in 268435456 words

Overall PASS
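
Comparing the two runs, the per-stream rate drops when more streams are active, but the total per device appears to stay about the same. The sketch below sums the overall rates from the parallel run and divides by the overall rate of the single-pair run; the variable and function names are illustrative only, and the figures are copied from the overall test statistics above.

```python
# Overall rates in Mbytes/sec, copied from the two "Overall test statistics" blocks.
parallel_rates = {
    "0000:15:00.0": [818.599, 818.599],
    "0000:2d:00.0": [1487.534, 1487.534, 1487.534, 1487.534],
    "0000:36:00.0": [818.598, 818.598],
}
single_pair_rate = {
    "0000:15:00.0": 1637.198,
    "0000:2d:00.0": 5952.659,
    "0000:36:00.0": 1637.195,
}


def aggregate_ratio(rates, single_rate):
    """Ratio of the summed parallel-stream rates to the single-pair rate."""
    return sum(rates) / single_rate


for device, rates in parallel_rates.items():
    ratio = aggregate_ratio(rates, single_pair_rate[device])
    print(f"{device}: {len(rates)} streams total {sum(rates):.3f} Mbytes/sec, "
          f"{ratio:.4f} of the single-pair rate")
```

A ratio close to 1 for every device is consistent with the streams sharing a fixed per-device limit evenly, although these two runs alone don't show where that limit sits.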

1.3. TEF1001_dma_stream_crc64 and NiteFury_dma_stream_crc64

When compiled for coverage:

[mr_halfword@skylake-alma coverage]$ xilinx_dma_bridge_for_pcie/crc64_stream_latency -t
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:36:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:36:00.0
Testing design TEF1001_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  40.2C  min  38.8C  max  42.1C
     32 len bytes latencies (us):   3.767 (50')   3.956 (75')   4.051 (99')  20.965 (99.999')
     64 len bytes latencies (us):   3.734 (50')   3.941 (75')   3.993 (99')  19.363 (99.999')
    128 len bytes latencies (us):   3.820 (50')   3.839 (75')   3.913 (99')  19.409 (99.999')
    256 len bytes latencies (us):   4.015 (50')   4.031 (75')   4.267 (99')  20.781 (99.999')
    512 len bytes latencies (us):   4.289 (50')   4.303 (75')   4.621 (99')  19.995 (99.999')
   1024 len bytes latencies (us):   4.575 (50')   4.595 (75')   4.829 (99')  22.938 (99.999')
   2048 len bytes latencies (us):   5.135 (50')   5.148 (75')   5.455 (99')  20.722 (99.999')
   4096 len bytes latencies (us):   6.266 (50')   6.284 (75')   6.583 (99')  23.757 (99.999')
   8192 len bytes latencies (us):   8.535 (50')   8.555 (75')   8.894 (99')  24.549 (99.999')
  16384 len bytes latencies (us):  13.137 (50')  13.160 (75')  13.518 (99')  29.723 (99.999')
  32768 len bytes latencies (us):  22.364 (50')  22.395 (75')  22.680 (99')  37.886 (99.999')
  65536 len bytes latencies (us):  40.807 (50')  40.840 (75')  41.182 (99')  54.720 (99.999')
 131072 len bytes latencies (us):  77.745 (50')  77.783 (75')  78.137 (99')  93.751 (99.999')
 262144 len bytes latencies (us): 151.543 (50') 151.601 (75') 151.905 (99') 166.448 (99.999')
 524288 len bytes latencies (us): 299.217 (50') 299.609 (75') 299.994 (99') 321.015 (99.999')
1048576 len bytes latencies (us): 594.232 (50') 594.354 (75') 594.723 (99') 612.605 (99.999')
Current temperature  40.2C  min  38.6C  max  42.3C
Testing design TEF1001_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   3.679 (50')   3.976 (75')   4.026 (99')  22.006 (99.999')
     64 len bytes latencies (us):   3.649 (50')   3.661 (75')   4.029 (99')  19.129 (99.999')
    128 len bytes latencies (us):   3.763 (50')   3.906 (75')   3.945 (99')  18.678 (99.999')
    256 len bytes latencies (us):   4.009 (50')   4.025 (75')   4.266 (99')  20.383 (99.999')
    512 len bytes latencies (us):   4.290 (50')   4.303 (75')   4.616 (99')  20.778 (99.999')
   1024 len bytes latencies (us):   4.578 (50')   4.595 (75')   4.804 (99')  20.538 (99.999')
   2048 len bytes latencies (us):   5.133 (50')   5.145 (75')   5.355 (99')  20.879 (99.999')
   4096 len bytes latencies (us):   6.269 (50')   6.287 (75')   6.591 (99')  21.714 (99.999')
   8192 len bytes latencies (us):   8.537 (50')   8.557 (75')   8.888 (99')  24.256 (99.999')
  16384 len bytes latencies (us):  13.149 (50')  13.177 (75')  13.585 (99')  32.782 (99.999')
  32768 len bytes latencies (us):  22.374 (50')  22.408 (75')  22.680 (99')  40.568 (99.999')
  65536 len bytes latencies (us):  40.824 (50')  40.850 (75')  41.179 (99')  56.469 (99.999')
 131072 len bytes latencies (us):  77.743 (50')  77.780 (75')  78.133 (99')  93.989 (99.999')
 262144 len bytes latencies (us): 151.568 (50') 151.613 (75') 151.892 (99') 167.246 (99.999')
 524288 len bytes latencies (us): 299.528 (50') 299.612 (75') 300.110 (99') 316.262 (99.999')
1048576 len bytes latencies (us): 594.239 (50') 594.362 (75') 594.705 (99') 609.971 (99.999')
Current temperature  40.6C  min  38.6C  max  42.4C
Testing design NiteFury_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  44.2C  min  42.4C  max  46.0C
     32 len bytes latencies (us):   3.701 (50')   3.975 (75')   4.014 (99')  19.392 (99.999')
     64 len bytes latencies (us):   3.757 (50')   3.929 (75')   3.971 (99')  19.570 (99.999')
    128 len bytes latencies (us):   3.822 (50')   3.833 (75')   3.903 (99')  19.629 (99.999')
    256 len bytes latencies (us):   4.020 (50')   4.032 (75')   4.251 (99')  19.593 (99.999')
    512 len bytes latencies (us):   4.300 (50')   4.317 (75')   4.612 (99')  20.360 (99.999')
   1024 len bytes latencies (us):   4.587 (50')   4.608 (75')   4.815 (99')  20.888 (99.999')
   2048 len bytes latencies (us):   5.141 (50')   5.171 (75')   5.468 (99')  24.910 (99.999')
   4096 len bytes latencies (us):   6.280 (50')   6.297 (75')   6.634 (99')  21.972 (99.999')
   8192 len bytes latencies (us):   8.536 (50')   8.553 (75')   8.900 (99')  26.733 (99.999')
  16384 len bytes latencies (us):  13.152 (50')  13.176 (75')  13.529 (99')  29.121 (99.999')
  32768 len bytes latencies (us):  22.367 (50')  22.391 (75')  22.589 (99')  39.588 (99.999')
  65536 len bytes latencies (us):  40.793 (50')  40.810 (75')  41.161 (99')  56.392 (99.999')
 131072 len bytes latencies (us):  77.651 (50')  77.668 (75')  78.054 (99')  95.755 (99.999')
 262144 len bytes latencies (us): 151.370 (50') 151.386 (75') 151.762 (99') 167.076 (99.999')
 524288 len bytes latencies (us): 299.158 (50') 299.343 (75') 299.529 (99') 314.365 (99.999')
1048576 len bytes latencies (us): 593.455 (50') 593.481 (75') 593.774 (99') 609.363 (99.999')
Current temperature  45.0C  min  42.4C  max  47.0C
Testing design NiteFury_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   3.672 (50')   3.984 (75')   4.035 (99')  19.746 (99.999')
     64 len bytes latencies (us):   3.736 (50')   3.940 (75')   3.981 (99')  19.874 (99.999')
    128 len bytes latencies (us):   3.818 (50')   3.841 (75')   3.929 (99')  22.312 (99.999')
    256 len bytes latencies (us):   4.007 (50')   4.021 (75')   4.323 (99')  19.757 (99.999')
    512 len bytes latencies (us):   4.287 (50')   4.302 (75')   4.620 (99')  18.904 (99.999')
   1024 len bytes latencies (us):   4.571 (50')   4.592 (75')   4.820 (99')  21.832 (99.999')
   2048 len bytes latencies (us):   5.132 (50')   5.147 (75')   5.440 (99')  20.736 (99.999')
   4096 len bytes latencies (us):   6.261 (50')   6.278 (75')   6.488 (99')  22.158 (99.999')
   8192 len bytes latencies (us):   8.530 (50')   8.546 (75')   8.897 (99')  25.225 (99.999')
  16384 len bytes latencies (us):  13.146 (50')  13.169 (75')  13.546 (99')  29.030 (99.999')
  32768 len bytes latencies (us):  22.358 (50')  22.393 (75')  22.618 (99')  38.220 (99.999')
  65536 len bytes latencies (us):  40.779 (50')  40.797 (75')  41.154 (99')  58.882 (99.999')
 131072 len bytes latencies (us):  77.640 (50')  77.655 (75')  78.025 (99') 100.043 (99.999')
 262144 len bytes latencies (us): 151.361 (50') 151.378 (75') 151.629 (99') 167.397 (99.999')
 524288 len bytes latencies (us): 299.128 (50') 299.352 (75') 299.481 (99') 314.659 (99.999')
1048576 len bytes latencies (us): 593.440 (50') 593.477 (75') 593.809 (99') 609.361 (99.999')
Current temperature  45.4C  min  42.4C  max  47.2C

When compiled for release:

[mr_halfword@skylake-alma release]$ xilinx_dma_bridge_for_pcie/crc64_stream_latency -t
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:36:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:36:00.0
Testing design TEF1001_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  40.9C  min  38.6C  max  42.0C
     32 len bytes latencies (us):   3.314 (50')   3.498 (75')   3.784 (99')  22.475 (99.999')
     64 len bytes latencies (us):   3.325 (50')   3.333 (75')   3.678 (99')  18.916 (99.999')
    128 len bytes latencies (us):   3.423 (50')   3.434 (75')   3.655 (99')  19.459 (99.999')
    256 len bytes latencies (us):   3.634 (50')   3.645 (75')   3.896 (99')  18.495 (99.999')
    512 len bytes latencies (us):   3.905 (50')   3.917 (75')   4.254 (99')  19.783 (99.999')
   1024 len bytes latencies (us):   4.182 (50')   4.198 (75')   4.552 (99')  19.816 (99.999')
   2048 len bytes latencies (us):   4.739 (50')   4.751 (75')   5.083 (99')  19.453 (99.999')
   4096 len bytes latencies (us):   5.880 (50')   5.893 (75')   6.243 (99')  21.690 (99.999')
   8192 len bytes latencies (us):   8.149 (50')   8.165 (75')   8.493 (99')  23.902 (99.999')
  16384 len bytes latencies (us):  12.764 (50')  12.795 (75')  13.140 (99')  28.406 (99.999')
  32768 len bytes latencies (us):  21.978 (50')  22.006 (75')  22.364 (99')  37.095 (99.999')
  65536 len bytes latencies (us):  40.410 (50')  40.443 (75')  40.783 (99')  56.135 (99.999')
 131072 len bytes latencies (us):  77.290 (50')  77.340 (75')  77.643 (99')  92.961 (99.999')
 262144 len bytes latencies (us): 151.068 (50') 151.130 (75') 151.486 (99') 166.409 (99.999')
 524288 len bytes latencies (us): 298.878 (50') 299.048 (75') 299.442 (99') 314.094 (99.999')
1048576 len bytes latencies (us): 592.885 (50') 593.736 (75') 594.144 (99') 609.116 (99.999')
Current temperature  40.7C  min  38.5C  max  42.3C
Testing design TEF1001_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   3.291 (50')   3.303 (75')   3.535 (99')  19.087 (99.999')
     64 len bytes latencies (us):   3.334 (50')   3.344 (75')   3.684 (99')  18.968 (99.999')
    128 len bytes latencies (us):   3.391 (50')   3.412 (75')   3.539 (99')  19.091 (99.999')
    256 len bytes latencies (us):   3.645 (50')   3.656 (75')   3.768 (99')  19.559 (99.999')
    512 len bytes latencies (us):   3.911 (50')   3.921 (75')   4.254 (99')  19.819 (99.999')
   1024 len bytes latencies (us):   4.185 (50')   4.206 (75')   4.553 (99')  19.672 (99.999')
   2048 len bytes latencies (us):   4.757 (50')   4.769 (75')   5.099 (99')  20.588 (99.999')
   4096 len bytes latencies (us):   5.902 (50')   5.917 (75')   6.263 (99')  22.893 (99.999')
   8192 len bytes latencies (us):   8.170 (50')   8.187 (75')   8.534 (99')  23.710 (99.999')
  16384 len bytes latencies (us):  12.773 (50')  12.801 (75')  13.127 (99')  26.457 (99.999')
  32768 len bytes latencies (us):  21.986 (50')  22.011 (75')  22.358 (99')  36.813 (99.999')
  65536 len bytes latencies (us):  40.412 (50')  40.455 (75')  40.790 (99')  56.501 (99.999')
 131072 len bytes latencies (us):  77.307 (50')  77.355 (75')  77.627 (99')  92.538 (99.999')
 262144 len bytes latencies (us): 151.081 (50') 151.142 (75') 151.498 (99') 166.494 (99.999')
 524288 len bytes latencies (us): 298.843 (50') 299.061 (75') 299.426 (99') 317.173 (99.999')
1048576 len bytes latencies (us): 593.539 (50') 593.789 (75') 594.182 (99') 608.156 (99.999')
Current temperature  40.4C  min  38.4C  max  42.6C
Testing design NiteFury_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  44.4C  min  42.4C  max  46.3C
     32 len bytes latencies (us):   3.284 (50')   3.302 (75')   3.552 (99')  19.164 (99.999')
     64 len bytes latencies (us):   3.328 (50')   3.338 (75')   3.662 (99')  18.993 (99.999')
    128 len bytes latencies (us):   3.429 (50')   3.439 (75')   3.602 (99')  18.648 (99.999')
    256 len bytes latencies (us):   3.635 (50')   3.645 (75')   3.920 (99')  19.241 (99.999')
    512 len bytes latencies (us):   3.898 (50')   3.909 (75')   4.241 (99')  19.219 (99.999')
   1024 len bytes latencies (us):   4.180 (50')   4.197 (75')   4.542 (99')  20.133 (99.999')
   2048 len bytes latencies (us):   4.764 (50')   4.777 (75')   5.101 (99')  20.918 (99.999')
   4096 len bytes latencies (us):   5.877 (50')   5.889 (75')   6.234 (99')  21.050 (99.999')
   8192 len bytes latencies (us):   8.151 (50')   8.170 (75')   8.450 (99')  23.773 (99.999')
  16384 len bytes latencies (us):  12.773 (50')  12.798 (75')  13.137 (99')  28.778 (99.999')
  32768 len bytes latencies (us):  21.972 (50')  21.992 (75')  22.334 (99')  37.091 (99.999')
  65536 len bytes latencies (us):  40.420 (50')  40.442 (75')  40.772 (99')  56.416 (99.999')
 131072 len bytes latencies (us):  77.282 (50')  77.302 (75')  77.655 (99')  93.012 (99.999')
 262144 len bytes latencies (us): 151.017 (50') 151.292 (75') 151.377 (99') 166.259 (99.999')
 524288 len bytes latencies (us): 298.754 (50') 298.783 (75') 299.135 (99') 314.161 (99.999')
1048576 len bytes latencies (us): 593.073 (50') 593.098 (75') 593.446 (99') 610.305 (99.999')
Current temperature  45.3C  min  42.4C  max  47.1C
Testing design NiteFury_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   3.272 (50')   3.327 (75')   3.563 (99')  18.647 (99.999')
     64 len bytes latencies (us):   3.314 (50')   3.325 (75')   3.409 (99')  19.586 (99.999')
    128 len bytes latencies (us):   3.410 (50')   3.419 (75')   3.762 (99')  18.690 (99.999')
    256 len bytes latencies (us):   3.611 (50')   3.618 (75')   3.732 (99')  19.219 (99.999')
    512 len bytes latencies (us):   3.881 (50')   3.889 (75')   3.913 (99')  21.692 (99.999')
   1024 len bytes latencies (us):   4.165 (50')   4.175 (75')   4.529 (99')  19.615 (99.999')
   2048 len bytes latencies (us):   4.733 (50')   4.744 (75')   5.016 (99')  20.441 (99.999')
   4096 len bytes latencies (us):   5.857 (50')   5.868 (75')   6.214 (99')  21.592 (99.999')
   8192 len bytes latencies (us):   8.127 (50')   8.141 (75')   8.408 (99')  23.525 (99.999')
  16384 len bytes latencies (us):  12.748 (50')  12.774 (75')  13.126 (99')  28.906 (99.999')
  32768 len bytes latencies (us):  21.957 (50')  21.971 (75')  22.332 (99')  39.794 (99.999')
  65536 len bytes latencies (us):  40.390 (50')  40.403 (75')  40.750 (99')  56.156 (99.999')
 131072 len bytes latencies (us):  77.255 (50')  77.277 (75')  77.497 (99')  97.185 (99.999')
 262144 len bytes latencies (us): 151.020 (50') 151.309 (75') 151.394 (99') 166.724 (99.999')
 524288 len bytes latencies (us): 298.733 (50') 298.758 (75') 299.129 (99') 314.513 (99.999')
1048576 len bytes latencies (us): 593.059 (50') 593.082 (75') 593.431 (99') 608.753 (99.999')
Current temperature  46.1C  min  42.4C  max  47.4C
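
One way to read the latency tables is to split each figure into a fixed overhead plus a per-byte cost. The sketch below does this with a simple two-point estimate from the two largest lengths, using the release build median (50') latencies for TEF1001 C2H 0 -> H2C 0. This is an assumed model for illustration, not how crc64_stream_latency analyses its results, and the smaller lengths don't follow a straight line exactly.

```python
# Median (50') latencies in microseconds, release build, TEF1001 C2H 0 -> H2C 0.
median_latency_us = {
    524288: 298.878,
    1048576: 592.885,
}


def incremental_rate(len_a, us_a, len_b, us_b):
    """Bytes per microsecond between two points, numerically equal to
    Mbytes/sec when Mbytes means 10^6 bytes."""
    return (len_b - len_a) / (us_b - us_a)


def fixed_overhead_us(len_a, us_a, rate):
    """Latency left over once the per-byte cost at the given rate is removed."""
    return us_a - len_a / rate


(len_a, us_a), (len_b, us_b) = sorted(median_latency_us.items())
rate = incremental_rate(len_a, us_a, len_b, us_b)
overhead = fixed_overhead_us(len_a, us_a, rate)
print(f"incremental rate {rate:.1f} Mbytes/sec, fixed overhead {overhead:.3f} us")
```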

1.2. TEF1001_dma_stream_crc64, AS02MC04_dma_stream_crc64 and NiteFury_dma_stream_crc64

While AS02MC04_dma_stream_crc64 is configured for gen3 x8, it only enumerates at x4 when fitted in slot 5.
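
To put the reduced width in context, the sketch below computes the raw link bandwidth after line encoding for the maximum and negotiated widths. The function name is made up for this example. The figures are upper bounds that ignore packet header and flow-control overhead, so achievable DMA rates will be lower.

```python
def pcie_link_mbytes_per_sec(gt_per_sec, lanes):
    """Raw PCIe link bandwidth in 10^6 bytes/sec after line encoding.

    Links at 2.5 and 5 GT/s use 8b/10b encoding; 8 GT/s and above use 128b/130b.
    Packet header and flow-control overhead is not included.
    """
    encoding = 8 / 10 if gt_per_sec <= 5.0 else 128 / 130
    bits_per_sec = gt_per_sec * 1e9 * encoding * lanes
    return bits_per_sec / 8 / 1e6


max_width = pcie_link_mbytes_per_sec(8.0, 8)
negotiated = pcie_link_mbytes_per_sec(8.0, 4)
print(f"x8 at 8 GT/s: {max_width:.1f} Mbytes/sec")
print(f"x4 at 8 GT/s: {negotiated:.1f} Mbytes/sec")
```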

[mr_halfword@skylake-alma release]$ identify_pcie_fpga_design/display_identified_pcie_fpga_designs 
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:2d:00.0 (10ee:9038) with IOMMU group 25
Enabled bus master for 0000:2d:00.0
Warning: Device device 0000:2d:00.0 (10ee:9038) has reduced bandwidth
         Max width x8 speed 8 GT/s. Negotiated width x4 speed 8 GT/s
Opening device 0000:37:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:37:00.0

Design TEF1001_dma_stream_crc64:
  PCI device 0000:15:00.0 rev 00 IOMMU group 41  physical slot 3-1

  DMA bridge bar 2 AXI Stream
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       H2C 1               1                1                64
       C2H 0               1                1                64
       C2H 1               1                1                64
  User access build timestamp : 9B3309E3 - 19/06/2025 16:39:35
  Quad SPI registers at bar 0 offset 0x2000
  XADC registers at bar 0 offset 0x3000
  IIC registers at bar 0 offset 0x0
  bit-banged I2C GPIO registers at bar 0 offset 0x1000

Design AS02MC04_dma_stream_crc64:
  PCI device 0000:2d:00.0 rev 00 IOMMU group 25  physical slot 5-2

  DMA bridge bar 2 AXI Stream
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       H2C 1               1                1                64
       H2C 2               1                1                64
       H2C 3               1                1                64
       C2H 0               1                1                64
       C2H 1               1                1                64
       C2H 2               1                1                64
       C2H 3               1                1                64
  User access build timestamp : ACB30CD7 - 21/09/2025 16:51:23
  Quad SPI registers at bar 0 offset 0x0
  SYSMON registers at bar 0 offset 0x1000

Design NiteFury_dma_stream_crc64:
  PCI device 0000:37:00.0 rev 01 IOMMU group 86  physical slot 6-1

  DMA bridge bar 2 AXI Stream
  Channel ID  addr_alignment  len_granularity  num_address_bits
       H2C 0               1                1                64
       H2C 1               1                1                64
       C2H 0               1                1                64
       C2H 1               1                1                64
  User access build timestamp : 2BB3035E - 05/07/2025 16:13:30
  Quad SPI registers at bar 0 offset 0x0
  XADC registers at bar 0 offset 0x1000
[mr_halfword@skylake-alma release]$ xilinx_dma_bridge_for_pcie/crc64_stream_latency -t
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:2d:00.0 (10ee:9038) with IOMMU group 25
Enabled bus master for 0000:2d:00.0
Warning: Device device 0000:2d:00.0 (10ee:9038) has reduced bandwidth
         Max width x8 speed 8 GT/s. Negotiated width x4 speed 8 GT/s
Opening device 0000:37:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:37:00.0
Testing design TEF1001_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  39.0C  min  37.7C  max  41.0C
     16 len bytes latencies (us):   3.141 (50')   3.270 (75')   3.760 (99')  19.767 (99.999')
     32 len bytes latencies (us):   3.146 (50')   3.154 (75')   3.173 (99')  18.666 (99.999')
     64 len bytes latencies (us):   3.194 (50')   3.202 (75')   3.223 (99')  20.112 (99.999')
    128 len bytes latencies (us):   3.294 (50')   3.303 (75')   3.323 (99')  17.947 (99.999')
    256 len bytes latencies (us):   3.508 (50')   3.517 (75')   3.540 (99')  18.602 (99.999')
    512 len bytes latencies (us):   3.775 (50')   3.785 (75')   3.808 (99')  17.544 (99.999')
   1024 len bytes latencies (us):   4.053 (50')   4.063 (75')   4.088 (99')  18.312 (99.999')
   2048 len bytes latencies (us):   4.618 (50')   4.629 (75')   4.656 (99')  23.337 (99.999')
   4096 len bytes latencies (us):   5.750 (50')   5.761 (75')   5.788 (99')  18.750 (99.999')
   8192 len bytes latencies (us):   8.019 (50')   8.031 (75')   8.060 (99')  23.726 (99.999')
  16384 len bytes latencies (us):  12.629 (50')  12.645 (75')  12.685 (99')  25.379 (99.999')
  32768 len bytes latencies (us):  21.852 (50')  21.876 (75')  21.959 (99')  36.649 (99.999')
  65536 len bytes latencies (us):  40.272 (50')  40.343 (75')  40.706 (99')  59.346 (99.999')
 131072 len bytes latencies (us):  77.159 (50')  77.227 (75')  77.541 (99')  92.550 (99.999')
 262144 len bytes latencies (us): 150.949 (50') 151.027 (75') 151.368 (99') 165.672 (99.999')
 524288 len bytes latencies (us): 298.755 (50') 298.855 (75') 299.173 (99') 312.577 (99.999')
1048576 len bytes latencies (us): 591.703 (50') 593.270 (75') 593.620 (99') 611.019 (99.999')
Current temperature  40.0C  min  37.7C  max  41.6C
Testing design TEF1001_dma_stream_crc64 using C2H 1 -> H2C 1
     16 len bytes latencies (us):   3.125 (50')   3.133 (75')   3.156 (99')  13.441 (99.999')
     32 len bytes latencies (us):   3.149 (50')   3.157 (75')   3.177 (99')  16.308 (99.999')
     64 len bytes latencies (us):   3.194 (50')   3.203 (75')   3.224 (99')  17.745 (99.999')
    128 len bytes latencies (us):   3.292 (50')   3.302 (75')   3.323 (99')  17.468 (99.999')
    256 len bytes latencies (us):   3.502 (50')   3.511 (75')   3.533 (99')  16.430 (99.999')
    512 len bytes latencies (us):   3.796 (50')   3.808 (75')   3.834 (99')  18.489 (99.999')
   1024 len bytes latencies (us):   4.054 (50')   4.064 (75')   4.090 (99')  13.226 (99.999')
   2048 len bytes latencies (us):   4.619 (50')   4.630 (75')   4.661 (99')  20.146 (99.999')
   4096 len bytes latencies (us):   5.750 (50')   5.761 (75')   5.787 (99')  17.972 (99.999')
   8192 len bytes latencies (us):   8.019 (50')   8.031 (75')   8.062 (99')  20.908 (99.999')
  16384 len bytes latencies (us):  12.651 (50')  12.678 (75')  12.894 (99')  31.913 (99.999')
  32768 len bytes latencies (us):  21.854 (50')  21.878 (75')  21.925 (99')  35.265 (99.999')
  65536 len bytes latencies (us):  40.275 (50')  40.296 (75')  40.341 (99')  52.191 (99.999')
 131072 len bytes latencies (us):  77.185 (50')  77.278 (75')  77.549 (99')  93.102 (99.999')
 262144 len bytes latencies (us): 150.997 (50') 151.036 (75') 151.374 (99') 166.585 (99.999')
 524288 len bytes latencies (us): 298.039 (50') 298.864 (75') 299.199 (99') 326.074 (99.999')
1048576 len bytes latencies (us): 593.198 (50') 593.317 (75') 593.658 (99') 666.260 (99.999')
Current temperature  39.7C  min  37.7C  max  42.1C
Testing design AS02MC04_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  45.3C  min  40.8C  max  47.7C
     32 len bytes latencies (us):   2.597 (50')   2.605 (75')   2.889 (99')  17.658 (99.999')
     64 len bytes latencies (us):   2.567 (50')   2.574 (75')   2.594 (99')  16.976 (99.999')
    128 len bytes latencies (us):   2.604 (50')   2.612 (75')   2.637 (99')  20.491 (99.999')
    256 len bytes latencies (us):   2.677 (50')   2.685 (75')   2.708 (99')  17.268 (99.999')
    512 len bytes latencies (us):   2.790 (50')   2.800 (75')   2.827 (99')   6.585 (99.999')
   1024 len bytes latencies (us):   2.934 (50')   2.943 (75')   2.976 (99')  13.501 (99.999')
   2048 len bytes latencies (us):   3.216 (50')   3.226 (75')   3.258 (99')  17.995 (99.999')
   4096 len bytes latencies (us):   3.789 (50')   3.799 (75')   3.833 (99')  13.802 (99.999')
   8192 len bytes latencies (us):   4.940 (50')   4.954 (75')   4.987 (99')  19.279 (99.999')
  16384 len bytes latencies (us):   7.268 (50')   7.279 (75')   7.310 (99')  19.513 (99.999')
  32768 len bytes latencies (us):  11.978 (50')  11.995 (75')  12.320 (99')  27.073 (99.999')
  65536 len bytes latencies (us):  21.424 (50')  21.440 (75')  21.503 (99')  34.371 (99.999')
 131072 len bytes latencies (us):  40.300 (50')  40.322 (75')  40.587 (99')  56.319 (99.999')
 262144 len bytes latencies (us):  78.150 (50')  78.169 (75')  78.222 (99')  95.150 (99.999')
 524288 len bytes latencies (us): 153.996 (50') 154.136 (75') 154.380 (99') 169.679 (99.999')
1048576 len bytes latencies (us): 304.875 (50') 304.892 (75') 304.948 (99') 320.678 (99.999')
Current temperature  46.2C  min  40.8C  max  49.2C
Testing design AS02MC04_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   2.557 (50')   2.571 (75')   2.605 (99')  12.862 (99.999')
     64 len bytes latencies (us):   2.568 (50')   2.575 (75')   2.596 (99')  17.079 (99.999')
    128 len bytes latencies (us):   2.610 (50')   2.618 (75')   2.638 (99')  11.323 (99.999')
    256 len bytes latencies (us):   2.681 (50')   2.690 (75')   2.711 (99')   4.510 (99.999')
    512 len bytes latencies (us):   2.788 (50')   2.797 (75')   2.822 (99')  18.476 (99.999')
   1024 len bytes latencies (us):   2.927 (50')   2.935 (75')   2.966 (99')   4.730 (99.999')
   2048 len bytes latencies (us):   3.218 (50')   3.229 (75')   3.262 (99')  11.347 (99.999')
   4096 len bytes latencies (us):   3.791 (50')   3.802 (75')   3.837 (99')  16.591 (99.999')
   8192 len bytes latencies (us):   4.942 (50')   4.956 (75')   4.988 (99')  17.654 (99.999')
  16384 len bytes latencies (us):   7.268 (50')   7.280 (75')   7.316 (99')  21.715 (99.999')
  32768 len bytes latencies (us):  11.987 (50')  12.005 (75')  12.065 (99')  26.567 (99.999')
  65536 len bytes latencies (us):  21.539 (50')  21.569 (75')  21.900 (99')  32.232 (99.999')
 131072 len bytes latencies (us):  40.292 (50')  40.311 (75')  40.691 (99')  59.507 (99.999')
 262144 len bytes latencies (us):  78.150 (50')  78.168 (75')  78.217 (99')  92.502 (99.999')
 524288 len bytes latencies (us): 154.016 (50') 154.170 (75') 154.379 (99') 169.440 (99.999')
1048576 len bytes latencies (us): 304.873 (50') 304.894 (75') 304.959 (99') 320.776 (99.999')
Current temperature  46.2C  min  40.8C  max  49.2C
Testing design AS02MC04_dma_stream_crc64 using C2H 2 -> H2C 2
     32 len bytes latencies (us):   2.552 (50')   2.559 (75')   2.585 (99')  16.485 (99.999')
     64 len bytes latencies (us):   2.567 (50')   2.574 (75')   2.596 (99')  17.571 (99.999')
    128 len bytes latencies (us):   2.606 (50')   2.668 (75')   2.966 (99')  15.823 (99.999')
    256 len bytes latencies (us):   2.712 (50')   2.723 (75')   2.928 (99')  17.477 (99.999')
    512 len bytes latencies (us):   2.827 (50')   2.836 (75')   3.143 (99')  16.870 (99.999')
   1024 len bytes latencies (us):   2.926 (50')   2.935 (75')   2.965 (99')  16.279 (99.999')
   2048 len bytes latencies (us):   3.238 (50')   3.249 (75')   3.283 (99')  18.941 (99.999')
   4096 len bytes latencies (us):   3.789 (50')   3.798 (75')   3.832 (99')  11.412 (99.999')
   8192 len bytes latencies (us):   4.940 (50')   4.954 (75')   4.987 (99')  14.684 (99.999')
  16384 len bytes latencies (us):   7.268 (50')   7.280 (75')   7.311 (99')  12.778 (99.999')
  32768 len bytes latencies (us):  11.983 (50')  12.001 (75')  12.047 (99')  27.790 (99.999')
  65536 len bytes latencies (us):  21.489 (50')  21.552 (75')  21.754 (99')  32.064 (99.999')
 131072 len bytes latencies (us):  40.290 (50')  40.308 (75')  40.717 (99')  55.386 (99.999')
 262144 len bytes latencies (us):  78.150 (50')  78.170 (75')  78.225 (99')  88.970 (99.999')
 524288 len bytes latencies (us): 153.944 (50') 153.995 (75') 154.367 (99') 169.892 (99.999')
1048576 len bytes latencies (us): 304.876 (50') 304.894 (75') 304.950 (99') 320.527 (99.999')
Current temperature  46.2C  min  40.8C  max  49.7C
Testing design AS02MC04_dma_stream_crc64 using C2H 3 -> H2C 3
     32 len bytes latencies (us):   2.551 (50')   2.558 (75')   2.582 (99')   4.281 (99.999')
     64 len bytes latencies (us):   2.608 (50')   2.624 (75')   2.895 (99')  12.422 (99.999')
    128 len bytes latencies (us):   2.604 (50')   2.612 (75')   2.639 (99')  11.966 (99.999')
    256 len bytes latencies (us):   2.677 (50')   2.685 (75')   2.708 (99')   8.249 (99.999')
    512 len bytes latencies (us):   2.897 (50')   2.907 (75')   2.970 (99')  17.595 (99.999')
   1024 len bytes latencies (us):   2.927 (50')   2.936 (75')   3.074 (99')   5.112 (99.999')
   2048 len bytes latencies (us):   3.215 (50')   3.226 (75')   3.258 (99')  18.511 (99.999')
   4096 len bytes latencies (us):   3.789 (50')   3.799 (75')   3.833 (99')  13.303 (99.999')
   8192 len bytes latencies (us):   4.941 (50')   4.955 (75')   4.999 (99')  19.848 (99.999')
  16384 len bytes latencies (us):   7.268 (50')   7.280 (75')   7.311 (99')  20.013 (99.999')
  32768 len bytes latencies (us):  12.012 (50')  12.043 (75')  12.374 (99')  26.369 (99.999')
  65536 len bytes latencies (us):  21.444 (50')  21.468 (75')  21.763 (99')  34.495 (99.999')
 131072 len bytes latencies (us):  40.307 (50')  40.327 (75')  40.740 (99')  53.932 (99.999')
 262144 len bytes latencies (us):  78.150 (50')  78.171 (75')  78.231 (99')  93.851 (99.999')
 524288 len bytes latencies (us): 153.862 (50') 153.881 (75') 154.251 (99') 169.577 (99.999')
1048576 len bytes latencies (us): 304.875 (50') 304.897 (75') 304.963 (99') 320.679 (99.999')
Current temperature  47.2C  min  40.8C  max  49.7C
Testing design NiteFury_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  42.6C  min  41.0C  max  44.7C
     16 len bytes latencies (us):   3.129 (50')   3.137 (75')   3.166 (99')   4.692 (99.999')
     32 len bytes latencies (us):   3.151 (50')   3.157 (75')   3.174 (99')  17.870 (99.999')
     64 len bytes latencies (us):   3.202 (50')   3.209 (75')   3.231 (99')  18.940 (99.999')
    128 len bytes latencies (us):   3.300 (50')   3.307 (75')   3.328 (99')  16.147 (99.999')
    256 len bytes latencies (us):   3.508 (50')   3.514 (75')   3.530 (99')  18.350 (99.999')
    512 len bytes latencies (us):   3.774 (50')   3.781 (75')   3.802 (99')  17.822 (99.999')
   1024 len bytes latencies (us):   4.054 (50')   4.061 (75')   4.084 (99')  18.680 (99.999')
   2048 len bytes latencies (us):   4.617 (50')   4.627 (75')   4.648 (99')  17.219 (99.999')
   4096 len bytes latencies (us):   5.749 (50')   5.757 (75')   5.778 (99')  20.948 (99.999')
   8192 len bytes latencies (us):   8.022 (50')   8.032 (75')   8.057 (99')  26.992 (99.999')
  16384 len bytes latencies (us):  12.633 (50')  12.646 (75')  12.676 (99')  26.702 (99.999')
  32768 len bytes latencies (us):  21.972 (50')  21.995 (75')  22.262 (99')  37.361 (99.999')
  65536 len bytes latencies (us):  40.317 (50')  40.349 (75')  40.600 (99')  52.288 (99.999')
 131072 len bytes latencies (us):  77.239 (50')  77.260 (75')  77.527 (99')  92.158 (99.999')
 262144 len bytes latencies (us): 151.120 (50') 151.157 (75') 151.413 (99') 165.870 (99.999')
 524288 len bytes latencies (us): 299.016 (50') 299.075 (75') 299.409 (99') 316.381 (99.999')
1048576 len bytes latencies (us): 593.468 (50') 593.512 (75') 593.833 (99') 607.714 (99.999')
Current temperature  44.2C  min  41.0C  max  45.4C
Testing design NiteFury_dma_stream_crc64 using C2H 1 -> H2C 1
     16 len bytes latencies (us):   3.126 (50')   3.131 (75')   3.145 (99')  15.992 (99.999')
     32 len bytes latencies (us):   3.150 (50')   3.157 (75')   3.171 (99')  17.866 (99.999')
     64 len bytes latencies (us):   3.200 (50')   3.207 (75')   3.221 (99')  18.149 (99.999')
    128 len bytes latencies (us):   3.301 (50')   3.308 (75')   3.323 (99')  17.308 (99.999')
    256 len bytes latencies (us):   3.505 (50')   3.511 (75')   3.526 (99')  18.931 (99.999')
    512 len bytes latencies (us):   3.794 (50')   3.883 (75')   3.927 (99')  19.502 (99.999')
   1024 len bytes latencies (us):   4.068 (50')   4.075 (75')   4.093 (99')  19.294 (99.999')
   2048 len bytes latencies (us):   4.629 (50')   4.637 (75')   4.661 (99')  20.255 (99.999')
   4096 len bytes latencies (us):   5.760 (50')   5.768 (75')   5.787 (99')  15.070 (99.999')
   8192 len bytes latencies (us):   8.030 (50')   8.040 (75')   8.070 (99')  20.547 (99.999')
  16384 len bytes latencies (us):  12.638 (50')  12.650 (75')  12.674 (99')  28.135 (99.999')
  32768 len bytes latencies (us):  21.875 (50')  21.887 (75')  21.917 (99')  32.044 (99.999')
  65536 len bytes latencies (us):  40.310 (50')  40.322 (75')  40.349 (99')  56.807 (99.999')
 131072 len bytes latencies (us):  77.167 (50')  77.183 (75')  77.310 (99')  91.936 (99.999')
 262144 len bytes latencies (us): 150.877 (50') 150.891 (75') 151.167 (99') 167.394 (99.999')
 524288 len bytes latencies (us): 298.754 (50') 298.780 (75') 299.116 (99') 313.384 (99.999')
1048576 len bytes latencies (us): 593.084 (50') 593.120 (75') 593.476 (99') 609.497 (99.999')
Current temperature  44.7C  min  41.0C  max  45.9C
[mr_halfword@skylake-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_independent_streams 
Opening device 0000:15:00.0 (10ee:7024) with IOMMU group 41
Enabled bus master for 0000:15:00.0
Opening device 0000:2d:00.0 (10ee:9038) with IOMMU group 25
Enabled bus master for 0000:2d:00.0
Warning: Device device 0000:2d:00.0 (10ee:9038) has reduced bandwidth
         Max width x8 speed 8 GT/s. Negotiated width x4 speed 8 GT/s
Opening device 0000:37:00.0 (10ee:7024) with IOMMU group 86
Enabled bus master for 0000:37:00.0
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Selecting test of TEF1001_dma_stream_crc64 design PCI device 0000:15:00.0 IOMMU group 41 H2C channel 0
Selecting test of TEF1001_dma_stream_crc64 design PCI device 0000:15:00.0 IOMMU group 41 H2C channel 1
Selecting test of TEF1001_dma_stream_crc64 design PCI device 0000:15:00.0 IOMMU group 41 C2H channel 0
Selecting test of TEF1001_dma_stream_crc64 design PCI device 0000:15:00.0 IOMMU group 41 C2H channel 1
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:2d:00.0 IOMMU group 25 H2C channel 0
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:2d:00.0 IOMMU group 25 H2C channel 1
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:2d:00.0 IOMMU group 25 H2C channel 2
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:2d:00.0 IOMMU group 25 H2C channel 3
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:2d:00.0 IOMMU group 25 C2H channel 0
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:2d:00.0 IOMMU group 25 C2H channel 1
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:2d:00.0 IOMMU group 25 C2H channel 2
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:2d:00.0 IOMMU group 25 C2H channel 3
Selecting test of NiteFury_dma_stream_crc64 design PCI device 0000:37:00.0 IOMMU group 86 H2C channel 0
Selecting test of NiteFury_dma_stream_crc64 design PCI device 0000:37:00.0 IOMMU group 86 H2C channel 1
Selecting test of NiteFury_dma_stream_crc64 design PCI device 0000:37:00.0 IOMMU group 86 C2H channel 0
Selecting test of NiteFury_dma_stream_crc64 design PCI device 0000:37:00.0 IOMMU group 86 C2H channel 1
Press Ctrl-C to stop test
  0000:15:00.0 H2C channel 0 888.997 Mbytes/sec (8875147264 bytes in 529 transfers over 9.983324 secs)
  0000:15:00.0 H2C channel 1 888.997 Mbytes/sec (8875147264 bytes in 529 transfers over 9.983325 secs)
  0000:2d:00.0 H2C channel 0 872.108 Mbytes/sec (8707375104 bytes in 519 transfers over 9.984282 secs)
  0000:2d:00.0 H2C channel 1 872.108 Mbytes/sec (8707375104 bytes in 519 transfers over 9.984282 secs)
  0000:2d:00.0 H2C channel 2 872.109 Mbytes/sec (8707375104 bytes in 519 transfers over 9.984274 secs)
  0000:2d:00.0 H2C channel 3 872.109 Mbytes/sec (8707375104 bytes in 519 transfers over 9.984275 secs)
  0000:37:00.0 H2C channel 0 894.488 Mbytes/sec (8942256128 bytes in 533 transfers over 9.997064 secs)
  0000:37:00.0 H2C channel 1 894.488 Mbytes/sec (8942256128 bytes in 533 transfers over 9.997065 secs)
  0000:15:00.0 C2H channel 0 0.000 Mbytes/sec (4232 bytes in 529 transfers over 9.983305 secs)
  0000:15:00.0 C2H channel 1 0.000 Mbytes/sec (4232 bytes in 529 transfers over 9.983305 secs)
  0000:2d:00.0 C2H channel 0 0.000 Mbytes/sec (4152 bytes in 519 transfers over 9.984264 secs)
  0000:2d:00.0 C2H channel 1 0.000 Mbytes/sec (4152 bytes in 519 transfers over 9.984264 secs)
  0000:2d:00.0 C2H channel 2 0.000 Mbytes/sec (4152 bytes in 519 transfers over 9.984271 secs)
  0000:2d:00.0 C2H channel 3 0.000 Mbytes/sec (4152 bytes in 519 transfers over 9.984272 secs)
  0000:37:00.0 C2H channel 0 0.000 Mbytes/sec (4264 bytes in 533 transfers over 9.997061 secs)
  0000:37:00.0 C2H channel 1 0.000 Mbytes/sec (4264 bytes in 533 transfers over 9.997062 secs)

  0000:15:00.0 H2C channel 0 888.999 Mbytes/sec (8891924480 bytes in 530 transfers over 10.002174 secs)
  0000:15:00.0 H2C channel 1 888.999 Mbytes/sec (8891924480 bytes in 530 transfers over 10.002173 secs)
  0000:2d:00.0 H2C channel 0 872.105 Mbytes/sec (8724152320 bytes in 520 transfers over 10.003553 secs)
  0000:2d:00.0 H2C channel 1 872.105 Mbytes/sec (8724152320 bytes in 520 transfers over 10.003553 secs)
  0000:2d:00.0 H2C channel 2 872.105 Mbytes/sec (8724152320 bytes in 520 transfers over 10.003552 secs)
  0000:2d:00.0 H2C channel 3 872.105 Mbytes/sec (8724152320 bytes in 520 transfers over 10.003552 secs)
  0000:37:00.0 H2C channel 0 894.496 Mbytes/sec (8942256128 bytes in 533 transfers over 9.996982 secs)
  0000:37:00.0 H2C channel 1 894.496 Mbytes/sec (8942256128 bytes in 533 transfers over 9.996982 secs)
  0000:15:00.0 C2H channel 0 0.000 Mbytes/sec (4240 bytes in 530 transfers over 10.002174 secs)
  0000:15:00.0 C2H channel 1 0.000 Mbytes/sec (4240 bytes in 530 transfers over 10.002174 secs)
  0000:2d:00.0 C2H channel 0 0.000 Mbytes/sec (4160 bytes in 520 transfers over 10.003553 secs)
  0000:2d:00.0 C2H channel 1 0.000 Mbytes/sec (4160 bytes in 520 transfers over 10.003553 secs)
  0000:2d:00.0 C2H channel 2 0.000 Mbytes/sec (4160 bytes in 520 transfers over 10.003553 secs)
  0000:2d:00.0 C2H channel 3 0.000 Mbytes/sec (4160 bytes in 520 transfers over 10.003552 secs)
  0000:37:00.0 C2H channel 0 0.000 Mbytes/sec (4264 bytes in 533 transfers over 9.996982 secs)
  0000:37:00.0 C2H channel 1 0.000 Mbytes/sec (4264 bytes in 533 transfers over 9.996982 secs)

  0000:15:00.0 H2C channel 0 889.000 Mbytes/sec (8891924480 bytes in 530 transfers over 10.002161 secs)
  0000:15:00.0 H2C channel 1 889.000 Mbytes/sec (8891924480 bytes in 530 transfers over 10.002161 secs)
  0000:2d:00.0 H2C channel 0 872.106 Mbytes/sec (8724152320 bytes in 520 transfers over 10.003545 secs)
  0000:2d:00.0 H2C channel 1 872.106 Mbytes/sec (8724152320 bytes in 520 transfers over 10.003545 secs)
  0000:2d:00.0 H2C channel 2 872.106 Mbytes/sec (8724152320 bytes in 520 transfers over 10.003545 secs)
  0000:2d:00.0 H2C channel 3 872.106 Mbytes/sec (8724152320 bytes in 520 transfers over 10.003545 secs)
  0000:37:00.0 H2C channel 0 894.496 Mbytes/sec (8942256128 bytes in 533 transfers over 9.996982 secs)
  0000:37:00.0 H2C channel 1 894.496 Mbytes/sec (8942256128 bytes in 533 transfers over 9.996982 secs)
  0000:15:00.0 C2H channel 0 0.000 Mbytes/sec (4240 bytes in 530 transfers over 10.002160 secs)
  0000:15:00.0 C2H channel 1 0.000 Mbytes/sec (4240 bytes in 530 transfers over 10.002160 secs)
  0000:2d:00.0 C2H channel 0 0.000 Mbytes/sec (4160 bytes in 520 transfers over 10.003545 secs)
  0000:2d:00.0 C2H channel 1 0.000 Mbytes/sec (4160 bytes in 520 transfers over 10.003545 secs)
  0000:2d:00.0 C2H channel 2 0.000 Mbytes/sec (4160 bytes in 520 transfers over 10.003544 secs)
  0000:2d:00.0 C2H channel 3 0.000 Mbytes/sec (4160 bytes in 520 transfers over 10.003545 secs)
  0000:37:00.0 C2H channel 0 0.000 Mbytes/sec (4264 bytes in 533 transfers over 9.996983 secs)
  0000:37:00.0 C2H channel 1 0.000 Mbytes/sec (4264 bytes in 533 transfers over 9.996982 secs)

^C  0000:15:00.0 H2C channel 0 889.002 Mbytes/sec (6744440832 bytes in 402 transfers over 7.586532 secs)
  0000:15:00.0 H2C channel 1 889.002 Mbytes/sec (6744440832 bytes in 402 transfers over 7.586532 secs)
  0000:2d:00.0 H2C channel 0 872.092 Mbytes/sec (6643777536 bytes in 396 transfers over 7.618211 secs)
  0000:2d:00.0 H2C channel 1 872.092 Mbytes/sec (6643777536 bytes in 396 transfers over 7.618211 secs)
  0000:2d:00.0 H2C channel 2 872.092 Mbytes/sec (6643777536 bytes in 396 transfers over 7.618206 secs)
  0000:2d:00.0 H2C channel 3 872.092 Mbytes/sec (6643777536 bytes in 396 transfers over 7.618205 secs)
  0000:37:00.0 H2C channel 0 894.494 Mbytes/sec (6777995264 bytes in 404 transfers over 7.577462 secs)
  0000:37:00.0 H2C channel 1 894.494 Mbytes/sec (6777995264 bytes in 404 transfers over 7.577462 secs)
  0000:15:00.0 C2H channel 0 0.000 Mbytes/sec (3216 bytes in 402 transfers over 7.586532 secs)
  0000:15:00.0 C2H channel 1 0.000 Mbytes/sec (3216 bytes in 402 transfers over 7.586532 secs)
  0000:2d:00.0 C2H channel 0 0.000 Mbytes/sec (3168 bytes in 396 transfers over 7.618211 secs)
  0000:2d:00.0 C2H channel 1 0.000 Mbytes/sec (3168 bytes in 396 transfers over 7.618211 secs)
  0000:2d:00.0 C2H channel 2 0.000 Mbytes/sec (3168 bytes in 396 transfers over 7.618207 secs)
  0000:2d:00.0 C2H channel 3 0.000 Mbytes/sec (3168 bytes in 396 transfers over 7.618206 secs)
  0000:37:00.0 C2H channel 0 0.000 Mbytes/sec (3232 bytes in 404 transfers over 7.577462 secs)
  0000:37:00.0 C2H channel 1 0.000 Mbytes/sec (3232 bytes in 404 transfers over 7.577461 secs)

Overall test statistics:
  0000:15:00.0 H2C channel 0 889.000 Mbytes/sec (33403437056 bytes in 1991 transfers over 37.574190 secs)
  0000:15:00.0 H2C channel 1 889.000 Mbytes/sec (33403437056 bytes in 1991 transfers over 37.574190 secs)
  0000:2d:00.0 H2C channel 0 872.104 Mbytes/sec (32799457280 bytes in 1955 transfers over 37.609590 secs)
  0000:2d:00.0 H2C channel 1 872.104 Mbytes/sec (32799457280 bytes in 1955 transfers over 37.609590 secs)
  0000:2d:00.0 H2C channel 2 872.104 Mbytes/sec (32799457280 bytes in 1955 transfers over 37.609578 secs)
  0000:2d:00.0 H2C channel 3 872.104 Mbytes/sec (32799457280 bytes in 1955 transfers over 37.609578 secs)
  0000:37:00.0 H2C channel 0 894.493 Mbytes/sec (33604763648 bytes in 2003 transfers over 37.568491 secs)
  0000:37:00.0 H2C channel 1 894.493 Mbytes/sec (33604763648 bytes in 2003 transfers over 37.568491 secs)
  0000:15:00.0 C2H channel 0 0.000 Mbytes/sec (15928 bytes in 1991 transfers over 37.574171 secs)
  0000:15:00.0 C2H channel 1 0.000 Mbytes/sec (15928 bytes in 1991 transfers over 37.574171 secs)
  0000:2d:00.0 C2H channel 0 0.000 Mbytes/sec (15640 bytes in 1955 transfers over 37.609572 secs)
  0000:2d:00.0 C2H channel 1 0.000 Mbytes/sec (15640 bytes in 1955 transfers over 37.609571 secs)
  0000:2d:00.0 C2H channel 2 0.000 Mbytes/sec (15640 bytes in 1955 transfers over 37.609575 secs)
  0000:2d:00.0 C2H channel 3 0.000 Mbytes/sec (15640 bytes in 1955 transfers over 37.609575 secs)
  0000:37:00.0 C2H channel 0 0.000 Mbytes/sec (16024 bytes in 2003 transfers over 37.568488 secs)
  0000:37:00.0 C2H channel 1 0.000 Mbytes/sec (16024 bytes in 2003 transfers over 37.568488 secs)


Overall PASS
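
As a rough cross-check of the figures above, the per-device H2C totals can be compared against the raw data rate of each negotiated link. This is only a sketch under stated assumptions: the names `raw_link_mbytes_per_sec` and `h2c_efficiency` are hypothetical, the link speeds and widths are assumed from the device list at the start of these notes and from the reduced-bandwidth warning for 0000:2d:00.0, and only line-encoding overhead is removed (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s). TLP headers, DLLPs and flow control are not modelled, so 100% is not reachable in practice.

```python
# Fraction of raw bits that carry data for each link speed (GT/s).
# 8b/10b applies at 2.5 and 5 GT/s, 128b/130b at 8 GT/s.
ENCODING_EFFICIENCY = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def raw_link_mbytes_per_sec(gt_per_sec, lanes):
    """One-direction data rate in 10^6 bytes/sec, line encoding removed only."""
    bits_per_sec = gt_per_sec * 1e9 * lanes * ENCODING_EFFICIENCY[gt_per_sec]
    return bits_per_sec / 8 / 1e6


# (device, overall Mbytes/sec per H2C channel, H2C channels, GT/s, lanes)
# Rates are copied from "Overall test statistics"; the link parameters are
# assumptions taken from the device list and the reduced-bandwidth warning.
RESULTS = [
    ("0000:15:00.0 TEF1001", 889.000, 2, 5.0, 4),
    ("0000:2d:00.0 AS02MC04", 872.104, 4, 8.0, 4),
    ("0000:37:00.0 NiteFury", 894.493, 2, 5.0, 4),
]


def h2c_efficiency(rate_per_channel, channels, gt_per_sec, lanes):
    """Aggregate H2C throughput as a fraction of the raw link data rate."""
    return rate_per_channel * channels / raw_link_mbytes_per_sec(gt_per_sec, lanes)


for name, rate, channels, speed, lanes in RESULTS:
    print(f"{name}: {rate * channels:.1f} Mbytes/sec, "
          f"{100 * h2c_efficiency(rate, channels, speed, lanes):.1f}% of raw link rate")
```

Under these assumptions all three devices land between 88% and 90% of the raw link rate in the H2C direction. The C2H channels carry only 8 bytes per transfer in this test, so they are left out of the comparison.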

2. Intel DH67BL motherboard

An Intel DH67BL motherboard with an i5-2310 CPU. Running AlmaLinux 8.10.

2.1. TOSING_160T_dma_stream_loopback

Using a Kintex-7 160T FPGA with a PCIe x4 5 GT/s interface.

2.1.1. x4 2.5 GT/s

Run without the VFIO manager. With this design in the PC, the re-enumeration which follows the PCIe Hot Reset after a VFIO open leaves the link at a reduced speed of 2.5 GT/s, so the test runs at that speed:

[mr_halfword@haswell-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_independent_streams --stream_mapping_size 020000000 
Opening device 0000:01:00.0 (10ee:7024) with IOMMU group 0
Enabled bus master for 0000:01:00.0
Warning: Device device 0000:01:00.0 (10ee:7024) has reduced bandwidth
         Max width x4 speed 5 GT/s. Negotiated width x4 speed 2.5 GT/s
Using num_descriptors=64 bytes_per_buffer=0x10000 data_mapping_size_words=0x100000
Device 0000:01:00.0 design TOSING_160T_dma_stream_loopback routes updated
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 1
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 1
Opened DMA MEM device : /dev/cmem (dev_desc = 0x00000007)
Debug: mmap param length 0x2000, Addr: 0x100000000 
Buff num 0: Phys addr : 0x100000000 User Addr: 0x7f94a7d8c000 
Debug: mmap param length 0x400000, Addr: 0x100002000 
Buff num 0: Phys addr : 0x100002000 User Addr: 0x7f94a6f67000 
Debug: mmap param length 0x2000, Addr: 0x100402000 
Buff num 0: Phys addr : 0x100402000 User Addr: 0x7f94a7d8a000 
Debug: mmap param length 0x400000, Addr: 0x100404000 
Buff num 0: Phys addr : 0x100404000 User Addr: 0x7f94a6b67000 
Debug: mmap param length 0x2000, Addr: 0x100804000 
Buff num 0: Phys addr : 0x100804000 User Addr: 0x7f94a7d88000 
Debug: mmap param length 0x400000, Addr: 0x100806000 
Buff num 0: Phys addr : 0x100806000 User Addr: 0x7f94a6767000 
Debug: mmap param length 0x2000, Addr: 0x100c06000 
Buff num 0: Phys addr : 0x100c06000 User Addr: 0x7f94a7d86000 
Debug: mmap param length 0x400000, Addr: 0x100c08000 
Buff num 0: Phys addr : 0x100c08000 User Addr: 0x7f94a6367000 
Press Ctrl-C to stop test
  0000:01:00.0 H2C channel 0 381.378 Mbytes/sec (3813736448 bytes in 9.999882 secs)
  0000:01:00.0 H2C channel 1 381.378 Mbytes/sec (3813736448 bytes in 9.999889 secs)
  0000:01:00.0 C2H channel 0 381.378 Mbytes/sec (3813736448 bytes in 9.999888 secs)
  0000:01:00.0 C2H channel 1 381.378 Mbytes/sec (3813736448 bytes in 9.999880 secs)

  0000:01:00.0 H2C channel 0 381.269 Mbytes/sec (3812687872 bytes in 9.999984 secs)
  0000:01:00.0 H2C channel 1 381.269 Mbytes/sec (3812687872 bytes in 9.999984 secs)
  0000:01:00.0 C2H channel 0 381.269 Mbytes/sec (3812687872 bytes in 9.999984 secs)
  0000:01:00.0 C2H channel 1 381.269 Mbytes/sec (3812687872 bytes in 9.999984 secs)

  0000:01:00.0 H2C channel 0 381.399 Mbytes/sec (3813998592 bytes in 10.000013 secs)
  0000:01:00.0 H2C channel 1 381.399 Mbytes/sec (3813998592 bytes in 10.000013 secs)
  0000:01:00.0 C2H channel 0 381.399 Mbytes/sec (3813998592 bytes in 10.000013 secs)
  0000:01:00.0 C2H channel 1 381.399 Mbytes/sec (3813998592 bytes in 10.000013 secs)

^C  0000:01:00.0 H2C channel 0 381.292 Mbytes/sec (3575513088 bytes in 9.377370 secs)
  0000:01:00.0 H2C channel 1 381.292 Mbytes/sec (3575513088 bytes in 9.377366 secs)
  0000:01:00.0 C2H channel 0 381.292 Mbytes/sec (3575513088 bytes in 9.377366 secs)
  0000:01:00.0 C2H channel 1 381.292 Mbytes/sec (3575513088 bytes in 9.377370 secs)

Overall test statistics:
  0000:01:00.0 H2C channel 0 381.335 Mbytes/sec (15015936000 bytes in 39.377249 secs)
  0000:01:00.0 H2C channel 1 381.335 Mbytes/sec (15015936000 bytes in 39.377252 secs)
  0000:01:00.0 C2H channel 0 381.335 Mbytes/sec (15015936000 bytes in 39.377251 secs)
  0000:01:00.0 C2H channel 1 381.335 Mbytes/sec (15015936000 bytes in 39.377247 secs)

Memory Driver closed 

Overall PASS

2.1.2. x4 5 GT/s

Start the VFIO manager, which is unable to check the device speed as it initially only opens the IOMMU group:

[mr_halfword@haswell-alma release]$ vfio_access/vfio_multi_process_manager 
Opening device 0000:01:00.0 (10ee:7024) with IOMMU group 0
Enabled bus master for 0000:01:00.0

Set the default AXI switch routing, and leave the device open. This triggers the VFIO manager to actually open the device and, since the device has now been opened, the reduced link speed is reported:

[mr_halfword@haswell-alma release]$ xilinx_axi_stream_switch/xilinx_axi_stream_switch_set_routing --default_routing --pause_before_vfio_close
Warning: Device device 0000:01:00.0 (10ee:7024) has reduced bandwidth
         Max width x4 speed 5 GT/s. Negotiated width x4 speed 2.5 GT/s
Device 0000:01:00.0 design TOSING_160T_dma_stream_loopback routes updated
Routes processed in 1 devices. Press return to close the VFIO devices.

Run a program to trigger a re-train for a 5 GT/s link speed, while the VFIO device remains open:

[mr_halfword@haswell-alma release]$ sudo dump_info/pcie_set_speed_libpciaccess 0000:01:00.0 2
[sudo] password for mr_halfword: 
Operating on device 0000:00:01.0 vendor_id=8086 (Intel Corporation) device_id=0101 (Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port) revision_id=09
Link capabilities: 02212D02 Max link speed 5 GT/s max link width x16
Link status: 5041
Current link speed: 2.5 GT/s
Original link control 2: 0002
Original target link speed: 2 (5 GT/s)
New target link speed: 2 (5 GT/s)
New link control 2: 0002
Triggering link retraining by changing link control 0040 -> 0060
Link status: 5042
Current link speed: 5 GT/s
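
The register values in the output above can be decoded by hand using the field layouts of the PCI Express Capability structure. The sketch below is illustrative only and is not code from `pcie_set_speed_libpciaccess`. The function and constant names are hypothetical, and only the speed codes up to 8 GT/s are listed.

```python
# Speed encodings used by the Link Capabilities, Link Status and
# Link Control 2 registers (bits 3:0). Only codes 1 to 3 are listed here.
LINK_SPEEDS = {1: "2.5 GT/s", 2: "5 GT/s", 3: "8 GT/s"}

# Link Control register bit 5: writing 1 requests link retraining.
RETRAIN_LINK = 1 << 5


def decode_link_status(value):
    """Return (current link speed, negotiated link width) from Link Status."""
    speed = LINK_SPEEDS.get(value & 0xF, "unknown")  # bits 3:0
    width = (value >> 4) & 0x3F                      # bits 9:4
    return speed, width


def decode_link_capabilities(value):
    """Return (max link speed, max link width) from Link Capabilities."""
    speed = LINK_SPEEDS.get(value & 0xF, "unknown")  # bits 3:0
    width = (value >> 4) & 0x3F                      # bits 9:4
    return speed, width


# Values copied from the output above.
print(decode_link_capabilities(0x02212D02))  # root port maximum
print(decode_link_status(0x5041))            # before retraining
print(decode_link_status(0x5042))            # after retraining
print(hex(0x0040 ^ 0x0060))                  # bit changed in Link Control
```

The only bit which changes in the Link Control write `0040 -> 0060` is bit 5, the Retrain Link bit. The Link Status speed field goes from 1 to 2 while the negotiated width field stays at 4.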

Run the DMA test again. This time there is no warning about a reduced link speed, and the test throughput is about 1.9 times that at the lower link speed:

[mr_halfword@haswell-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_independent_streams --stream_mapping_size 020000000 
Using num_descriptors=64 bytes_per_buffer=0x10000 data_mapping_size_words=0x100000
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 1
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 1
Opened DMA MEM device : /dev/cmem (dev_desc = 0x00000007)
Debug: mmap param length 0x2000, Addr: 0x100000000 
Buff num 0: Phys addr : 0x100000000 User Addr: 0x7fc1cd923000 
Debug: mmap param length 0x400000, Addr: 0x100002000 
Buff num 0: Phys addr : 0x100002000 User Addr: 0x7fc1ccafe000 
Debug: mmap param length 0x2000, Addr: 0x100402000 
Buff num 0: Phys addr : 0x100402000 User Addr: 0x7fc1cd921000 
Debug: mmap param length 0x400000, Addr: 0x100404000 
Buff num 0: Phys addr : 0x100404000 User Addr: 0x7fc1cc6fe000 
Debug: mmap param length 0x2000, Addr: 0x100804000 
Buff num 0: Phys addr : 0x100804000 User Addr: 0x7fc1cd91f000 
Debug: mmap param length 0x400000, Addr: 0x100806000 
Buff num 0: Phys addr : 0x100806000 User Addr: 0x7fc1cc2fe000 
Debug: mmap param length 0x2000, Addr: 0x100c06000 
Buff num 0: Phys addr : 0x100c06000 User Addr: 0x7fc1cd91d000 
Debug: mmap param length 0x400000, Addr: 0x100c08000 
Buff num 0: Phys addr : 0x100c08000 User Addr: 0x7fc1cbefe000 
Press Ctrl-C to stop test
  0000:01:00.0 H2C channel 0 739.325 Mbytes/sec (7393247232 bytes in 9.999999 secs)
  0000:01:00.0 H2C channel 1 739.324 Mbytes/sec (7393181696 bytes in 9.999916 secs)
  0000:01:00.0 C2H channel 0 739.324 Mbytes/sec (7393181696 bytes in 9.999926 secs)
  0000:01:00.0 C2H channel 1 739.324 Mbytes/sec (7393181696 bytes in 9.999920 secs)

  0000:01:00.0 H2C channel 0 739.574 Mbytes/sec (7395737600 bytes in 9.999992 secs)
  0000:01:00.0 H2C channel 1 739.574 Mbytes/sec (7395803136 bytes in 10.000080 secs)
  0000:01:00.0 C2H channel 0 739.574 Mbytes/sec (7395737600 bytes in 9.999992 secs)
  0000:01:00.0 C2H channel 1 739.574 Mbytes/sec (7395737600 bytes in 9.999992 secs)

  0000:01:00.0 H2C channel 0 739.406 Mbytes/sec (7394033664 bytes in 9.999970 secs)
  0000:01:00.0 H2C channel 1 739.406 Mbytes/sec (7394033664 bytes in 9.999970 secs)
  0000:01:00.0 C2H channel 0 739.406 Mbytes/sec (7394099200 bytes in 10.000058 secs)
  0000:01:00.0 C2H channel 1 739.406 Mbytes/sec (7394099200 bytes in 10.000058 secs)

^C  0000:01:00.0 H2C channel 0 738.887 Mbytes/sec (5948309504 bytes in 8.050365 secs)
  0000:01:00.0 H2C channel 1 738.887 Mbytes/sec (5948309504 bytes in 8.050364 secs)
  0000:01:00.0 C2H channel 0 738.887 Mbytes/sec (5948309504 bytes in 8.050361 secs)
  0000:01:00.0 C2H channel 1 738.887 Mbytes/sec (5948309504 bytes in 8.050364 secs)

Overall test statistics:
  0000:01:00.0 H2C channel 0 739.319 Mbytes/sec (28131328000 bytes in 38.050325 secs)
  0000:01:00.0 H2C channel 1 739.319 Mbytes/sec (28131328000 bytes in 38.050330 secs)
  0000:01:00.0 C2H channel 0 739.319 Mbytes/sec (28131328000 bytes in 38.050337 secs)
  0000:01:00.0 C2H channel 1 739.319 Mbytes/sec (28131328000 bytes in 38.050334 secs)

Memory Driver closed 

Overall PASS
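
To compare the two runs without reading the figures off by eye, the "Overall test statistics" lines can be parsed and the rates recomputed from the byte counts and durations. This is a minimal sketch, not part of fpga_sio: `STAT_RE` and `direction_totals` are hypothetical names, and the pattern only matches the line format shown in section 2.1 (the section 1 output, which also reports a transfer count, would need a different pattern).

```python
import re

# Matches lines such as:
#   0000:01:00.0 H2C channel 0 381.335 Mbytes/sec (15015936000 bytes in 39.377249 secs)
STAT_RE = re.compile(
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<direction>H2C|C2H) channel (?P<channel>\d+) "
    r"[\d.]+ Mbytes/sec \((?P<bytes>\d+) bytes in (?P<secs>[\d.]+) secs\)"
)


def direction_totals(text):
    """Sum the recomputed rates (10^6 bytes/sec) per direction over all channels."""
    totals = {}
    for match in STAT_RE.finditer(text):
        rate = int(match["bytes"]) / float(match["secs"]) / 1e6
        totals[match["direction"]] = totals.get(match["direction"], 0.0) + rate
    return totals


# Overall statistics copied from the 2.5 GT/s and 5 GT/s runs.
OVERALL_2_5_GT = """
  0000:01:00.0 H2C channel 0 381.335 Mbytes/sec (15015936000 bytes in 39.377249 secs)
  0000:01:00.0 H2C channel 1 381.335 Mbytes/sec (15015936000 bytes in 39.377252 secs)
  0000:01:00.0 C2H channel 0 381.335 Mbytes/sec (15015936000 bytes in 39.377251 secs)
  0000:01:00.0 C2H channel 1 381.335 Mbytes/sec (15015936000 bytes in 39.377247 secs)
"""
OVERALL_5_GT = """
  0000:01:00.0 H2C channel 0 739.319 Mbytes/sec (28131328000 bytes in 38.050325 secs)
  0000:01:00.0 H2C channel 1 739.319 Mbytes/sec (28131328000 bytes in 38.050330 secs)
  0000:01:00.0 C2H channel 0 739.319 Mbytes/sec (28131328000 bytes in 38.050337 secs)
  0000:01:00.0 C2H channel 1 739.319 Mbytes/sec (28131328000 bytes in 38.050334 secs)
"""

slow = direction_totals(OVERALL_2_5_GT)
fast = direction_totals(OVERALL_5_GT)
for direction in ("H2C", "C2H"):
    print(f"{direction}: {slow[direction]:.1f} -> {fast[direction]:.1f} Mbytes/sec, "
          f"speedup {fast[direction] / slow[direction]:.2f}")
```

Summing both channels per direction gives the load on each direction of the link, which is the more useful figure when comparing against the link rate.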

2.2. AS02MC04_dma_stream_crc64

Using a KU3P FPGA with a PCIe x8 8 GT/s interface.

The PC has an x16 5 GT/s slot but, as with other FPGA cards in the same PC, the link only negotiates at 2.5 GT/s:

$ xilinx_dma_bridge_for_pcie/crc64_stream_latency 
Opening device 0000:01:00.0 (10ee:9038) with IOMMU group 0
Enabled bus master for 0000:01:00.0
Warning: Device device 0000:01:00.0 (10ee:9038) has reduced bandwidth
         Max width x8 speed 8 GT/s. Negotiated width x8 speed 2.5 GT/s
Testing design AS02MC04_dma_stream_crc64 using C2H 0 -> H2C 0
     32 len bytes latencies (us):   3.046 (50')   3.076 (75')   3.420 (99')  14.948 (99.999')
     64 len bytes latencies (us):   3.062 (50')   3.104 (75')   3.407 (99')  12.090 (99.999')
    128 len bytes latencies (us):   3.118 (50')   3.135 (75')   3.471 (99')  13.799 (99.999')
    256 len bytes latencies (us):   3.201 (50')   3.233 (75')   3.583 (99')  11.748 (99.999')
    512 len bytes latencies (us):   3.381 (50')   3.414 (75')   3.744 (99')  11.914 (99.999')
   1024 len bytes latencies (us):   3.684 (50')   3.713 (75')   4.064 (99')  13.981 (99.999')
   2048 len bytes latencies (us):   4.276 (50')   4.306 (75')   4.650 (99')  15.304 (99.999')
   4096 len bytes latencies (us):   5.483 (50')   5.520 (75')   5.868 (99')  13.547 (99.999')
   8192 len bytes latencies (us):   7.900 (50')   7.931 (75')   8.290 (99')  17.351 (99.999')
  16384 len bytes latencies (us):  12.760 (50')  12.779 (75')  13.185 (99')  22.093 (99.999')
  32768 len bytes latencies (us):  22.544 (50')  22.566 (75')  27.205 (99')  34.690 (99.999')
  65536 len bytes latencies (us):  42.114 (50')  42.142 (75')  46.866 (99')  57.835 (99.999')
 131072 len bytes latencies (us):  81.266 (50')  81.289 (75')  86.295 (99') 100.758 (99.999')
 262144 len bytes latencies (us): 159.558 (50') 159.591 (75') 169.072 (99') 196.323 (99.999')
 524288 len bytes latencies (us): 316.139 (50') 316.184 (75') 325.829 (99') 355.096 (99.999')
1048576 len bytes latencies (us): 629.395 (50') 629.427 (75') 639.369 (99') 713.138 (99.999')
Testing design AS02MC04_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   3.037 (50')   3.057 (75')   3.252 (99')  10.568 (99.999')
     64 len bytes latencies (us):   3.057 (50')   3.076 (75')   3.255 (99')  10.401 (99.999')
    128 len bytes latencies (us):   3.113 (50')   3.125 (75')   3.300 (99')  12.866 (99.999')
    256 len bytes latencies (us):   3.199 (50')   3.217 (75')   3.396 (99')  10.475 (99.999')
    512 len bytes latencies (us):   3.379 (50')   3.409 (75')   3.607 (99')  11.553 (99.999')
   1024 len bytes latencies (us):   3.681 (50')   3.698 (75')   3.879 (99')  11.072 (99.999')
   2048 len bytes latencies (us):   4.273 (50')   4.294 (75')   4.485 (99')  17.169 (99.999')
   4096 len bytes latencies (us):   5.478 (50')   5.508 (75')   5.704 (99')  12.759 (99.999')
   8192 len bytes latencies (us):   7.889 (50')   7.916 (75')   8.085 (99')  16.294 (99.999')
  16384 len bytes latencies (us):  12.758 (50')  12.778 (75')  12.971 (99')  20.761 (99.999')
  32768 len bytes latencies (us):  22.544 (50')  22.561 (75')  23.540 (99')  31.890 (99.999')
  65536 len bytes latencies (us):  42.120 (50')  42.141 (75')  46.830 (99')  57.179 (99.999')
 131072 len bytes latencies (us):  81.269 (50')  81.289 (75')  86.039 (99')  97.373 (99.999')
 262144 len bytes latencies (us): 159.574 (50') 159.593 (75') 168.995 (99') 184.031 (99.999')
 524288 len bytes latencies (us): 316.153 (50') 316.204 (75') 327.365 (99') 370.028 (99.999')
1048576 len bytes latencies (us): 629.300 (50') 629.359 (75') 643.388 (99') 697.403 (99.999')
Testing design AS02MC04_dma_stream_crc64 using C2H 2 -> H2C 2
     32 len bytes latencies (us):   3.044 (50')   3.078 (75')   3.416 (99')  12.266 (99.999')
     64 len bytes latencies (us):   3.061 (50')   3.105 (75')   3.407 (99')  13.007 (99.999')
    128 len bytes latencies (us):   3.117 (50')   3.133 (75')   3.459 (99')  12.433 (99.999')
    256 len bytes latencies (us):   3.201 (50')   3.233 (75')   3.587 (99')  14.036 (99.999')
    512 len bytes latencies (us):   3.391 (50')   3.420 (75')   3.752 (99')  12.340 (99.999')
   1024 len bytes latencies (us):   3.685 (50')   3.714 (75')   4.065 (99')  12.693 (99.999')
   2048 len bytes latencies (us):   4.276 (50')   4.307 (75')   4.658 (99')  13.682 (99.999')
   4096 len bytes latencies (us):   5.485 (50')   5.522 (75')   5.878 (99')  18.405 (99.999')
   8192 len bytes latencies (us):   7.900 (50')   7.931 (75')   8.296 (99')  17.732 (99.999')
  16384 len bytes latencies (us):  12.759 (50')  12.779 (75')  13.186 (99')  27.734 (99.999')
  32768 len bytes latencies (us):  22.545 (50')  22.566 (75')  27.200 (99')  33.109 (99.999')
  65536 len bytes latencies (us):  42.114 (50')  42.142 (75')  46.864 (99')  56.402 (99.999')
 131072 len bytes latencies (us):  81.266 (50')  81.288 (75')  86.216 (99') 100.277 (99.999')
 262144 len bytes latencies (us): 159.558 (50') 159.581 (75') 169.051 (99') 188.466 (99.999')
 524288 len bytes latencies (us): 316.139 (50') 316.174 (75') 325.782 (99') 350.367 (99.999')
1048576 len bytes latencies (us): 629.300 (50') 629.363 (75') 640.719 (99') 683.584 (99.999')
Testing design AS02MC04_dma_stream_crc64 using C2H 3 -> H2C 3
     32 len bytes latencies (us):   3.045 (50')   3.077 (75')   3.420 (99')  11.921 (99.999')
     64 len bytes latencies (us):   3.060 (50')   3.104 (75')   3.408 (99')  14.886 (99.999')
    128 len bytes latencies (us):   3.117 (50')   3.134 (75')   3.468 (99')  11.736 (99.999')
    256 len bytes latencies (us):   3.201 (50')   3.233 (75')   3.586 (99')  12.580 (99.999')
    512 len bytes latencies (us):   3.392 (50')   3.421 (75')   3.748 (99')  13.028 (99.999')
   1024 len bytes latencies (us):   3.685 (50')   3.714 (75')   4.069 (99')  14.150 (99.999')
   2048 len bytes latencies (us):   4.276 (50')   4.307 (75')   4.660 (99')  15.386 (99.999')
   4096 len bytes latencies (us):   5.485 (50')   5.522 (75')   5.882 (99')  13.689 (99.999')
   8192 len bytes latencies (us):   7.900 (50')   7.931 (75')   8.301 (99')  16.661 (99.999')
  16384 len bytes latencies (us):  12.760 (50')  12.779 (75')  13.199 (99')  22.206 (99.999')
  32768 len bytes latencies (us):  22.545 (50')  22.566 (75')  27.220 (99')  34.849 (99.999')
  65536 len bytes latencies (us):  42.114 (50')  42.142 (75')  46.874 (99')  56.059 (99.999')
 131072 len bytes latencies (us):  81.266 (50')  81.288 (75')  86.301 (99') 101.221 (99.999')
 262144 len bytes latencies (us): 159.557 (50') 159.581 (75') 169.041 (99') 183.634 (99.999')
 524288 len bytes latencies (us): 316.139 (50') 316.174 (75') 325.768 (99') 352.888 (99.999')
1048576 len bytes latencies (us): 629.331 (50') 629.417 (75') 639.227 (99') 683.019 (99.999')

Start test_dma_bridge_independent_streams, with the link initially operating at 2.5 GT/s:

$ xilinx_dma_bridge_for_pcie/test_dma_bridge_independent_streams --stream_mapping_size 020000000 
Opening device 0000:01:00.0 (10ee:9038) with IOMMU group 0
Enabled bus master for 0000:01:00.0
Warning: Device device 0000:01:00.0 (10ee:9038) has reduced bandwidth
         Max width x8 speed 8 GT/s. Negotiated width x8 speed 2.5 GT/s
Using num_descriptors=64 bytes_per_buffer=0x10000 data_mapping_size_words=0x100000
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 0
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 1
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 2
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 0 H2C channel 3
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 0
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 1
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 2
Selecting test of AS02MC04_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 0 C2H channel 3
Press Ctrl-C to stop test
  0000:01:00.0 H2C channel 0 416.270 Mbytes/sec (4162650112 bytes in 63517 transfers over 9.999874 secs)
  0000:01:00.0 H2C channel 1 416.269 Mbytes/sec (4162650112 bytes in 63517 transfers over 9.999893 secs)
  0000:01:00.0 H2C channel 2 416.269 Mbytes/sec (4162650112 bytes in 63517 transfers over 9.999901 secs)
  0000:01:00.0 H2C channel 3 416.269 Mbytes/sec (4162650112 bytes in 63517 transfers over 9.999901 secs)
  0000:01:00.0 C2H channel 0 0.051 Mbytes/sec (508136 bytes in 63517 transfers over 9.999873 secs)
  0000:01:00.0 C2H channel 1 0.051 Mbytes/sec (508136 bytes in 63517 transfers over 9.999892 secs)
  0000:01:00.0 C2H channel 2 0.051 Mbytes/sec (508136 bytes in 63517 transfers over 9.999900 secs)
  0000:01:00.0 C2H channel 3 0.051 Mbytes/sec (508136 bytes in 63517 transfers over 9.999900 secs)

  0000:01:00.0 H2C channel 0 416.472 Mbytes/sec (4164747264 bytes in 63549 transfers over 10.000071 secs)
  0000:01:00.0 H2C channel 1 416.472 Mbytes/sec (4164747264 bytes in 63549 transfers over 10.000071 secs)
  0000:01:00.0 H2C channel 2 416.472 Mbytes/sec (4164747264 bytes in 63549 transfers over 10.000071 secs)
  0000:01:00.0 H2C channel 3 416.472 Mbytes/sec (4164747264 bytes in 63549 transfers over 10.000071 secs)
  0000:01:00.0 C2H channel 0 0.051 Mbytes/sec (508392 bytes in 63549 transfers over 10.000071 secs)
  0000:01:00.0 C2H channel 1 0.051 Mbytes/sec (508392 bytes in 63549 transfers over 10.000071 secs)
  0000:01:00.0 C2H channel 2 0.051 Mbytes/sec (508392 bytes in 63549 transfers over 10.000071 secs)
  0000:01:00.0 C2H channel 3 0.051 Mbytes/sec (508392 bytes in 63549 transfers over 10.000071 secs)

Retrain the link, which increases the speed from 2.5 GT/s to 5 GT/s:

$ sudo dump_info/pcie_set_speed_libpciaccess 0000:01:00.0 2
Operating on device 0000:00:01.0 vendor_id=8086 (Intel Corporation) device_id=0101 (Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port) revision_id=09
Link capabilities: 02212D02 Max link speed 5 GT/s max link width x16
Link status: 5081
Current link speed: 2.5 GT/s
Original link control 2: 0002
Original target link speed: 2 (5 GT/s)
New target link speed: 2 (5 GT/s)
New link control 2: 0002
Triggering link retraining by changing link control 0040 -> 0060
Link status: 5082
Current link speed: 5 GT/s
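
The raw register values in the output above can be decoded by hand. The following is a minimal sketch, assuming the standard PCI Express Link Status layout (current speed in bits 3:0, negotiated width in bits 9:4, link training in bit 11) and Retrain Link as bit 5 of Link Control. The function and constant names are illustrative and are not taken from pcie_set_speed_libpciaccess:

```python
# Speed codes used in the Current Link Speed field of Link Status.
LINK_SPEEDS_GT_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}

# Link Control bit 5 (Retrain Link), written as 1 to start retraining.
RETRAIN_LINK = 1 << 5


def decode_link_status(value):
    """Split a raw 16-bit Link Status value into (speed GT/s, width, training)."""
    speed_code = value & 0xF            # bits 3:0  Current Link Speed
    width = (value >> 4) & 0x3F         # bits 9:4  Negotiated Link Width
    training = bool(value & (1 << 11))  # bit 11    Link Training in progress
    return LINK_SPEEDS_GT_S.get(speed_code), width, training


# The two Link Status values reported before and after retraining.
for raw in (0x5081, 0x5082):
    speed, width, training = decode_link_status(raw)
    print(f"{raw:04X}: {speed} GT/s x{width} training={training}")

# The bit that differs between link control 0040 and 0060.
print(f"Changed link control bits: {0x0040 ^ 0x0060:04X}")
```

Both status values decode to a x8 link with training complete; only the speed code changes. The link control write differs from the original value only in the Retrain Link bit.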

The throughput of the running test then roughly doubles:

  0000:01:00.0 H2C channel 0 832.530 Mbytes/sec (8325300224 bytes in 127034 transfers over 10.000002 secs)
  0000:01:00.0 H2C channel 1 832.518 Mbytes/sec (8325169152 bytes in 127032 transfers over 9.999990 secs)
  0000:01:00.0 H2C channel 2 832.518 Mbytes/sec (8325169152 bytes in 127032 transfers over 9.999990 secs)
  0000:01:00.0 H2C channel 3 832.519 Mbytes/sec (8325234688 bytes in 127033 transfers over 10.000053 secs)
  0000:01:00.0 C2H channel 0 0.102 Mbytes/sec (1016272 bytes in 127034 transfers over 10.000002 secs)
  0000:01:00.0 C2H channel 1 0.102 Mbytes/sec (1016256 bytes in 127032 transfers over 9.999990 secs)
  0000:01:00.0 C2H channel 2 0.102 Mbytes/sec (1016256 bytes in 127032 transfers over 9.999990 secs)
  0000:01:00.0 C2H channel 3 0.102 Mbytes/sec (1016264 bytes in 127033 transfers over 10.000052 secs)

  0000:01:00.0 H2C channel 0 837.211 Mbytes/sec (8372158464 bytes in 127749 transfers over 10.000056 secs)
  0000:01:00.0 H2C channel 1 837.211 Mbytes/sec (8372158464 bytes in 127749 transfers over 10.000057 secs)
  0000:01:00.0 H2C channel 2 837.211 Mbytes/sec (8372158464 bytes in 127749 transfers over 10.000057 secs)
  0000:01:00.0 H2C channel 3 837.211 Mbytes/sec (8372092928 bytes in 127748 transfers over 9.999979 secs)
  0000:01:00.0 C2H channel 0 0.102 Mbytes/sec (1021992 bytes in 127749 transfers over 10.000056 secs)
  0000:01:00.0 C2H channel 1 0.102 Mbytes/sec (1021992 bytes in 127749 transfers over 10.000057 secs)
  0000:01:00.0 C2H channel 2 0.102 Mbytes/sec (1021992 bytes in 127749 transfers over 10.000057 secs)
  0000:01:00.0 C2H channel 3 0.102 Mbytes/sec (1021984 bytes in 127748 transfers over 9.999979 secs)
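
To put these figures in context, the sketch below compares the summed H2C throughput against the encoded data rate of a x8 link at each speed. It assumes 8b/10b line encoding at 2.5 and 5 GT/s, and that Mbytes means 10^6 bytes, which matches the byte counts and durations printed above. The helper name is illustrative, and the model ignores all packet and protocol overhead:

```python
def link_bandwidth_mbytes_per_sec(speed_gt_s, width):
    """One-direction data rate after line encoding, in Mbytes/sec (10**6 bytes)."""
    # 2.5 and 5 GT/s links use 8b/10b encoding, 8 GT/s and above use 128b/130b.
    encoding = 8 / 10 if speed_gt_s <= 5.0 else 128 / 130
    return speed_gt_s * 1000 * width * encoding / 8


# Per-channel H2C rates copied from the last report at each link speed above.
measured = {
    2.5: [416.472, 416.472, 416.472, 416.472],
    5.0: [837.211, 837.211, 837.211, 837.211],
}

for speed, rates in measured.items():
    total = sum(rates)
    limit = link_bandwidth_mbytes_per_sec(speed, 8)
    print(f"{speed} GT/s x8: {total:.1f} of {limit:.0f} Mbytes/sec "
          f"({100 * total / limit:.1f}%)")
```

Under these assumptions the four H2C channels together reach between 83% and 84% of the encoded link rate at both speeds, so the measured throughput scales with the link speed.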

3. HP Z6 G4

An HP Z6 G4 with dual Intel(R) Xeon(R) Gold 6148 CPUs.

3.1. TOSING_160T_dma_stream_loopback

Booted openSUSE Leap 15.5 from an SD card. Uses the same FPGA card and bitstream as in the Intel DH67BL motherboard. In this PC the FPGA PCIe interface operates at the expected x4 5 GT/s:

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/test_dma_bridge_parallel_streams 
Opening device 0000:01:00.0 (10ee:7024) with IOMMU group 23
Enabled bus master for 0000:01:00.0
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Device 0000:01:00.0 design TOSING_160T_dma_stream_loopback routes updated
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 23 H2C channel 1 C2H channel 0
Selecting test of TOSING_160T_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 23 H2C channel 0 C2H channel 1
Press Ctrl-C to stop test
  0000:01:00.0 1 -> 0 742.314 Mbytes/sec (7415529472 bytes in 9.989745 secs)
  0000:01:00.0 0 -> 1 742.314 Mbytes/sec (7415529472 bytes in 9.989750 secs)

  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (7415529472 bytes in 9.989714 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (7415529472 bytes in 9.989714 secs)

  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (7432306688 bytes in 10.012315 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (7432306688 bytes in 10.012315 secs)

  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (7415529472 bytes in 9.989713 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (7415529472 bytes in 9.989714 secs)

^C  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (4898947072 bytes in 6.599539 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (4898947072 bytes in 6.599535 secs)

Overall test statistics:
  0000:01:00.0 1 -> 0 742.317 Mbytes/sec (34577842176 bytes in 46.580982 secs)
  0000:01:00.0 0 -> 1 742.317 Mbytes/sec (34577842176 bytes in 46.580984 secs)

0000:01:00.0 1 -> 0 Test pattern verified in 268435456 words
0000:01:00.0 0 -> 1 Test pattern verified in 268435456 words

Overall PASS

A dump of the PCIe information while the test is running:

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> dump_info/dump_pci_info_pciutils 
domain=0000 bus=01 dev=00 func=00 rev=01
  vendor_id=10ee (Xilinx Corporation) device_id=7024 (Device 7024) subvendor_id=0002 subdevice_id=0009
  iommu_group=23
  driver=vfio-pci
  control: I/O- Mem+ BusMaster+ ParErr+ SERR+ DisINTx-
  status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
  bar[0] base_addr=94300000 size=100000 is_IO=0 is_prefetchable=0 is_64=0
  bar[2] base_addr=94400000 size=10000 is_IO=0 is_prefetchable=0 is_64=1
  Capabilities: [40] Power Management
  Capabilities: [48] Message Signaled Interrupts
  Capabilities: [60] PCI Express v2 Express Endpoint, MSI 0
    Link capabilities: Max speed 5 GT/s Max width x4
    Negotiated link status: Current speed 5 GT/s Width x4
    Link capabilities2: Not implemented
    DevCap: MaxPayload 512 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 No limit
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
    DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
    DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
    LnkCap: Port # 0 ASPM L0s
            L0s Exit Latency More than 4 μs
            L1 Exit Latency More than 64 μs
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
    LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
    LnkSta: TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
  domain=0000 bus=00 dev=1b func=00 rev=f9
    vendor_id=8086 (Intel Corporation) device_id=a1e7 (C620 Series Chipset Family PCI Express Root Port #17)
    iommu_group=18
    driver=pcieport
    physical_slot=3
    control: I/O+ Mem+ BusMaster+ ParErr+ SERR+ DisINTx+
    status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
    Capabilities: [40] PCI Express v2 Root Port, MSI 0
      Link capabilities: Max speed 8 GT/s Max width x4
      Negotiated link status: Current speed 5 GT/s Width x4
      Link capabilities2: Supported link speeds 2.5 GT/s 5.0 GT/s 8.0 GT/s
      DevCap: MaxPayload 128 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 Maximum of 1 μs
              ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
      DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
              RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
      DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
      LnkCap: Port # 17 ASPM not supported
              L0s Exit Latency 512 ns to less than 1 μs
              L1 Exit Latency 8 μs to less than 16 μs
              ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
      LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
              ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
      LnkSta: TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
      SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
              Slot #3 PowerLimit 25.000W Interlock- NoCompl+
    Capabilities: [80] Message Signaled Interrupts
    Capabilities: [90] Bridge subsystem vendor/device ID
    Capabilities: [a0] Power Management
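
The root port reports MaxPayload 128 bytes in DevCap, so TLP payloads on this link should not exceed 128 bytes even though the FPGA endpoint supports 512. The sketch below is a rough estimate of the resulting per-packet ceiling, not a measurement. It assumes every TLP carries a full 128-byte payload, counts only framing, sequence number, header and LCRC bytes, and ignores DLLPs, read requests and descriptor fetches. It also assumes both loopback streams share each link direction. All names are illustrative:

```python
def tlp_efficiency(payload_bytes, header_dwords):
    """Fraction of link bytes that are payload for one TLP at 2.5 or 5 GT/s."""
    framing = 2    # STP and END symbols
    sequence = 2   # data link layer sequence number
    lcrc = 4       # data link layer CRC
    overhead = framing + sequence + 4 * header_dwords + lcrc
    return payload_bytes / (payload_bytes + overhead)


RAW_X4_5GT = 5.0 * 1000 * 4 * (8 / 10) / 8  # Mbytes/sec after 8b/10b encoding
MAX_PAYLOAD = 128                           # smallest MaxPayload in the dump above
MEASURED_PER_DIRECTION = 2 * 742.317        # assumption: two streams per direction

# Completions use a 3 DW header; memory writes use 3 DW or 4 DW depending on
# whether the address needs 64 bits.
for header_dwords in (3, 4):
    ceiling = RAW_X4_5GT * tlp_efficiency(MAX_PAYLOAD, header_dwords)
    print(f"{header_dwords} DW header: ceiling {ceiling:.0f} Mbytes/sec, "
          f"measured {MEASURED_PER_DIRECTION:.1f} Mbytes/sec")
```

The measured rate sits below either ceiling, which is consistent with the overheads this simple model leaves out.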

3.2. TOSING_160T_dma_ddr3

Runs without error:

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/test_dma_bridge
Opening device 0000:01:00.0 (10ee:7024) with IOMMU group 23
Enabled bus master for 0000:01:00.0
Testing TOSING_160T_dma_ddr3 design with memory size 0x40000000
PCI device 0000:01:00.0 IOMMU group 23

Testing using 8 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 0
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3636.431944 (Mbytes/sec)
  Mean = 3645.796235 (Mbytes/sec)
   Max = 3653.293751 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 945.373122 (Mbytes/sec)
  Mean = 945.417685 (Mbytes/sec)
   Max = 945.449435 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3603.803932 (Mbytes/sec)
  Mean = 3624.287538 (Mbytes/sec)
   Max = 3644.648570 (Mbytes/sec)
TEST PASS

Testing using 8 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 1
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3641.184135 (Mbytes/sec)
  Mean = 3647.979870 (Mbytes/sec)
   Max = 3653.243074 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 945.375525 (Mbytes/sec)
  Mean = 945.424984 (Mbytes/sec)
   Max = 945.452615 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3624.245370 (Mbytes/sec)
  Mean = 3637.930887 (Mbytes/sec)
   Max = 3649.101458 (Mbytes/sec)
TEST PASS

Testing using 8 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 0
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3643.943919 (Mbytes/sec)
  Mean = 3649.734438 (Mbytes/sec)
   Max = 3659.238510 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 945.392312 (Mbytes/sec)
  Mean = 945.426304 (Mbytes/sec)
   Max = 945.450170 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3633.495611 (Mbytes/sec)
  Mean = 3640.852926 (Mbytes/sec)
   Max = 3647.119112 (Mbytes/sec)
TEST PASS

Testing using 8 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 1
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3635.526317 (Mbytes/sec)
  Mean = 3648.811077 (Mbytes/sec)
   Max = 3658.089982 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 945.398097 (Mbytes/sec)
  Mean = 945.429439 (Mbytes/sec)
   Max = 945.455510 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3632.779141 (Mbytes/sec)
  Mean = 3642.108668 (Mbytes/sec)
   Max = 3652.501426 (Mbytes/sec)
TEST PASS
Testing TOSING_160T_dma_ddr3 design with memory size 0x40000000
PCI device 0000:01:00.0 IOMMU group 23

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 8 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 8 descriptors
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3643.670914 (Mbytes/sec)
  Mean = 3650.617295 (Mbytes/sec)
   Max = 3658.552602 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 906.966307 (Mbytes/sec)
  Mean = 907.009465 (Mbytes/sec)
   Max = 907.036223 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3594.302758 (Mbytes/sec)
  Mean = 3602.093865 (Mbytes/sec)
   Max = 3604.951140 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 8 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 8 descriptors
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3644.902976 (Mbytes/sec)
  Mean = 3650.843228 (Mbytes/sec)
   Max = 3657.502315 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 906.974082 (Mbytes/sec)
  Mean = 907.010500 (Mbytes/sec)
   Max = 907.028802 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3594.450166 (Mbytes/sec)
  Mean = 3602.873753 (Mbytes/sec)
   Max = 3605.441143 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 8 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 8 descriptors
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3645.870374 (Mbytes/sec)
  Mean = 3652.490617 (Mbytes/sec)
   Max = 3658.809515 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 906.962397 (Mbytes/sec)
  Mean = 907.013354 (Mbytes/sec)
   Max = 907.033871 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3593.929644 (Mbytes/sec)
  Mean = 3603.571980 (Mbytes/sec)
   Max = 3605.700360 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 8 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 8 descriptors
populate test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3645.345015 (Mbytes/sec)
  Mean = 3651.430892 (Mbytes/sec)
   Max = 3657.803938 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 16 transfers of 1073741824 bytes:
   Min = 906.984469 (Mbytes/sec)
  Mean = 907.016453 (Mbytes/sec)
   Max = 907.030472 (Mbytes/sec)
verify test pattern timing for 16 transfers of 1073741824 bytes:
   Min = 3593.201371 (Mbytes/sec)
  Mean = 3602.478031 (Mbytes/sec)
   Max = 3605.229510 (Mbytes/sec)
TEST PASS

Overall PASS

3.4. TOSING_160T_dma_stream_crc64

Compiled for release:

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/crc64_stream_latency 
Opening device 0000:01:00.0 (10ee:7024) with IOMMU group 23
Enabled bus master for 0000:01:00.0
Testing design TOSING_160T_dma_stream_crc64 using C2H 0 -> H2C 0
     32 len bytes latencies (us):   5.059 (50')   5.114 (75')   5.885 (99')  20.386 (99.999')
     64 len bytes latencies (us):   5.157 (50')   5.167 (75')   5.245 (99')  10.153 (99.999')
    128 len bytes latencies (us):   5.294 (50')   5.305 (75')   5.407 (99')  10.392 (99.999')
    256 len bytes latencies (us):   5.426 (50')   5.436 (75')   5.733 (99')  19.909 (99.999')
    512 len bytes latencies (us):   5.711 (50')   5.721 (75')   6.021 (99')  10.375 (99.999')
   1024 len bytes latencies (us):   6.005 (50')   6.015 (75')   6.343 (99')  10.865 (99.999')
   2048 len bytes latencies (us):   6.602 (50')   6.617 (75')   6.883 (99')  13.840 (99.999')
   4096 len bytes latencies (us):   7.794 (50')   7.804 (75')   8.112 (99')  12.016 (99.999')
   8192 len bytes latencies (us):  10.201 (50')  10.232 (75')  10.441 (99')  14.134 (99.999')
  16384 len bytes latencies (us):  15.082 (50')  15.104 (75')  15.570 (99')  19.386 (99.999')
  32768 len bytes latencies (us):  24.870 (50')  24.884 (75')  25.204 (99')  34.752 (99.999')
  65536 len bytes latencies (us):  44.452 (50')  44.470 (75')  44.761 (99')  58.418 (99.999')
 131072 len bytes latencies (us):  83.608 (50')  83.621 (75')  83.858 (99')  94.681 (99.999')
 262144 len bytes latencies (us): 161.932 (50') 162.108 (75') 162.293 (99') 172.659 (99.999')
 524288 len bytes latencies (us): 319.061 (50') 319.221 (75') 319.558 (99') 339.994 (99.999')
1048576 len bytes latencies (us): 632.294 (50') 632.437 (75') 632.759 (99') 647.254 (99.999')
Testing design TOSING_160T_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   5.048 (50')   5.056 (75')   5.137 (99')  19.488 (99.999')
     64 len bytes latencies (us):   5.139 (50')   5.149 (75')   5.475 (99')   9.008 (99.999')
    128 len bytes latencies (us):   5.284 (50')   5.299 (75')   5.376 (99')  14.180 (99.999')
    256 len bytes latencies (us):   5.422 (50')   5.432 (75')   5.739 (99')   9.918 (99.999')
    512 len bytes latencies (us):   5.707 (50')   5.716 (75')   5.887 (99')  19.630 (99.999')
   1024 len bytes latencies (us):   5.991 (50')   6.003 (75')   6.308 (99')  10.111 (99.999')
   2048 len bytes latencies (us):   6.607 (50')   6.621 (75')   6.828 (99')  12.205 (99.999')
   4096 len bytes latencies (us):   7.785 (50')   7.796 (75')   8.115 (99')  12.368 (99.999')
   8192 len bytes latencies (us):  10.187 (50')  10.322 (75')  10.375 (99')  14.641 (99.999')
  16384 len bytes latencies (us):  15.084 (50')  15.099 (75')  15.440 (99')  27.632 (99.999')
  32768 len bytes latencies (us):  24.879 (50')  24.892 (75')  25.141 (99')  31.092 (99.999')
  65536 len bytes latencies (us):  44.440 (50')  44.451 (75')  44.772 (99')  50.885 (99.999')
 131072 len bytes latencies (us):  83.605 (50')  83.620 (75')  83.819 (99')  94.164 (99.999')
 262144 len bytes latencies (us): 161.893 (50') 161.918 (75') 162.235 (99') 172.646 (99.999')
 524288 len bytes latencies (us): 318.978 (50') 318.995 (75') 319.287 (99') 329.869 (99.999')
1048576 len bytes latencies (us): 632.264 (50') 632.290 (75') 632.614 (99') 647.431 (99.999')
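
One way to read the table is to fit a straight line to the median latency against the transfer length, separating a fixed per-transfer cost from an incremental per-byte cost. The sketch below does this with an ordinary least-squares fit over the C2H 0 -> H2C 0 medians. Treating the latency as linear in the length is an assumption, as is the interpretation that the slope mostly reflects the H2C transfer (the 8-byte C2H transfers in the independent streams test suggest only the CRC result is returned):

```python
# Median (50') latencies from the "C2H 0 -> H2C 0" table above:
# (length in bytes, latency in microseconds).
samples = [
    (32, 5.059), (64, 5.157), (128, 5.294), (256, 5.426),
    (512, 5.711), (1024, 6.005), (2048, 6.602), (4096, 7.794),
    (8192, 10.201), (16384, 15.082), (32768, 24.870), (65536, 44.452),
    (131072, 83.608), (262144, 161.932), (524288, 319.061),
    (1048576, 632.294),
]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept_us, us_per_byte = fit_line(samples)
print(f"Fixed cost per transfer: {intercept_us:.2f} us")
# Bytes per microsecond is numerically the same as Mbytes/sec (10**6 bytes).
print(f"Incremental rate: {1 / us_per_byte:.0f} Mbytes/sec")
```

The fit gives a fixed cost of a little over 5 us and an incremental rate close to 1670 Mbytes/sec, against the 2000 Mbytes/sec encoded rate of a x4 5 GT/s link.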

3.5. TOSING_160T_dma_stream_crc64 and AS02MC04_dma_stream_crc64

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/crc64_stream_latency -t
Opening device 0000:19:00.0 (10ee:9038) with IOMMU group 22
Enabled bus master for 0000:19:00.0
Warning: Device device 0000:19:00.0 (10ee:9038) has reduced bandwidth
         Max width x8 speed 8 GT/s. Negotiated width x4 speed 8 GT/s
Opening device 0000:31:00.0 (10ee:7024) with IOMMU group 81
Enabled bus master for 0000:31:00.0
Testing design AS02MC04_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  56.7C  min  54.7C  max  59.1C
     32 len bytes latencies (us):   3.230 (50')   3.383 (75')   4.056 (99')  18.057 (99.999')
     64 len bytes latencies (us):   3.226 (50')   3.236 (75')   3.332 (99')  11.769 (99.999')
    128 len bytes latencies (us):   3.264 (50')   3.274 (75')   3.357 (99')   9.758 (99.999')
    256 len bytes latencies (us):   3.363 (50')   3.375 (75')   3.678 (99')  11.329 (99.999')
    512 len bytes latencies (us):   3.444 (50')   3.458 (75')   3.659 (99')   6.599 (99.999')
   1024 len bytes latencies (us):   3.603 (50')   3.618 (75')   3.842 (99')   8.034 (99.999')
   2048 len bytes latencies (us):   3.908 (50')   3.919 (75')   4.184 (99')   8.135 (99.999')
   4096 len bytes latencies (us):   4.479 (50')   4.524 (75')   4.677 (99')  11.463 (99.999')
   8192 len bytes latencies (us):   5.612 (50')   5.628 (75')   5.901 (99')   8.794 (99.999')
  16384 len bytes latencies (us):   7.953 (50')   7.967 (75')   8.270 (99')  21.496 (99.999')
  32768 len bytes latencies (us):  12.670 (50')  12.689 (75')  13.030 (99')  24.444 (99.999')
  65536 len bytes latencies (us):  22.258 (50')  22.276 (75')  22.433 (99')  33.158 (99.999')
 131072 len bytes latencies (us):  41.116 (50')  41.136 (75')  41.624 (99')  51.975 (99.999')
 262144 len bytes latencies (us):  78.795 (50')  78.836 (75')  79.154 (99')  89.810 (99.999')
 524288 len bytes latencies (us): 154.959 (50') 154.996 (75') 155.324 (99') 165.787 (99.999')
1048576 len bytes latencies (us): 305.702 (50') 305.737 (75') 306.107 (99') 321.112 (99.999')
Current temperature  57.6C  min  54.7C  max  61.1C
Testing design AS02MC04_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   3.227 (50')   3.239 (75')   3.331 (99')   6.793 (99.999')
     64 len bytes latencies (us):   3.257 (50')   3.266 (75')   3.347 (99')   7.949 (99.999')
    128 len bytes latencies (us):   3.271 (50')   3.280 (75')   3.423 (99')  13.690 (99.999')
    256 len bytes latencies (us):   3.356 (50')   3.364 (75')   3.390 (99')   6.530 (99.999')
    512 len bytes latencies (us):   3.473 (50')   3.486 (75')   3.684 (99')   7.457 (99.999')
   1024 len bytes latencies (us):   3.578 (50')   3.589 (75')   3.854 (99')  19.509 (99.999')
   2048 len bytes latencies (us):   3.898 (50')   3.911 (75')   4.185 (99')   8.381 (99.999')
   4096 len bytes latencies (us):   4.471 (50')   4.482 (75')   4.545 (99')   9.174 (99.999')
   8192 len bytes latencies (us):   5.608 (50')   5.626 (75')   5.721 (99')   9.836 (99.999')
  16384 len bytes latencies (us):   7.947 (50')   7.965 (75')   8.274 (99')  12.931 (99.999')
  32768 len bytes latencies (us):  12.668 (50')  12.686 (75')  13.005 (99')  16.671 (99.999')
  65536 len bytes latencies (us):  22.259 (50')  22.278 (75')  22.545 (99')  35.239 (99.999')
 131072 len bytes latencies (us):  41.119 (50')  41.140 (75')  41.434 (99')  55.031 (99.999')
 262144 len bytes latencies (us):  78.790 (50')  78.815 (75')  79.143 (99')  91.243 (99.999')
 524288 len bytes latencies (us): 154.912 (50') 154.935 (75') 155.244 (99') 166.452 (99.999')
1048576 len bytes latencies (us): 305.740 (50') 305.776 (75') 306.365 (99') 324.149 (99.999')
Current temperature  59.1C  min  54.7C  max  62.1C
Testing design AS02MC04_dma_stream_crc64 using C2H 2 -> H2C 2
     32 len bytes latencies (us):   3.221 (50')   3.232 (75')   3.323 (99')   8.385 (99.999')
     64 len bytes latencies (us):   3.252 (50')   3.264 (75')   3.483 (99')   8.017 (99.999')
    128 len bytes latencies (us):   3.262 (50')   3.272 (75')   3.438 (99')   8.359 (99.999')
    256 len bytes latencies (us):   3.354 (50')   3.363 (75')   3.473 (99')   8.169 (99.999')
    512 len bytes latencies (us):   3.458 (50')   3.469 (75')   3.580 (99')   7.264 (99.999')
   1024 len bytes latencies (us):   3.611 (50')   3.880 (75')   3.909 (99')   8.476 (99.999')
   2048 len bytes latencies (us):   3.870 (50')   3.884 (75')   4.197 (99')   8.384 (99.999')
   4096 len bytes latencies (us):   4.474 (50')   4.502 (75')   4.646 (99')   9.141 (99.999')
   8192 len bytes latencies (us):   5.609 (50')   5.626 (75')   5.882 (99')  10.021 (99.999')
  16384 len bytes latencies (us):   7.948 (50')   7.962 (75')   8.203 (99')  15.118 (99.999')
  32768 len bytes latencies (us):  12.651 (50')  12.674 (75')  13.091 (99')  17.602 (99.999')
  65536 len bytes latencies (us):  22.072 (50')  22.091 (75')  22.276 (99')  41.478 (99.999')
 131072 len bytes latencies (us):  40.942 (50')  40.973 (75')  41.187 (99')  51.543 (99.999')
 262144 len bytes latencies (us):  78.654 (50')  78.674 (75')  79.091 (99')  89.457 (99.999')
 524288 len bytes latencies (us): 154.634 (50') 154.722 (75') 154.994 (99') 165.519 (99.999')
1048576 len bytes latencies (us): 305.601 (50') 305.741 (75') 306.168 (99') 325.180 (99.999')
Current temperature  60.6C  min  54.7C  max  62.1C
Testing design AS02MC04_dma_stream_crc64 using C2H 3 -> H2C 3
     32 len bytes latencies (us):   3.222 (50')   3.234 (75')   3.338 (99')   8.145 (99.999')
     64 len bytes latencies (us):   3.235 (50')   3.246 (75')   3.342 (99')   8.368 (99.999')
    128 len bytes latencies (us):   3.268 (50')   3.276 (75')   3.348 (99')  13.395 (99.999')
    256 len bytes latencies (us):   3.345 (50')   3.355 (75')   3.460 (99')  18.914 (99.999')
    512 len bytes latencies (us):   3.458 (50')   3.470 (75')   3.591 (99')   7.201 (99.999')
   1024 len bytes latencies (us):   3.600 (50')   3.620 (75')   3.882 (99')   7.927 (99.999')
   2048 len bytes latencies (us):   3.883 (50')   3.895 (75')   4.211 (99')   8.415 (99.999')
   4096 len bytes latencies (us):   4.466 (50')   4.515 (75')   4.691 (99')   9.115 (99.999')
   8192 len bytes latencies (us):   5.608 (50')   5.623 (75')   5.935 (99')   9.690 (99.999')
  16384 len bytes latencies (us):   7.951 (50')   7.964 (75')   8.244 (99')  15.129 (99.999')
  32768 len bytes latencies (us):  12.658 (50')  12.678 (75')  13.006 (99')  27.851 (99.999')
  65536 len bytes latencies (us):  22.135 (50')  22.277 (75')  22.518 (99')  32.890 (99.999')
 131072 len bytes latencies (us):  40.987 (50')  41.126 (75')  41.465 (99')  55.978 (99.999')
 262144 len bytes latencies (us):  78.687 (50')  78.812 (75')  79.072 (99')  89.733 (99.999')
 524288 len bytes latencies (us): 154.939 (50') 154.969 (75') 155.310 (99') 171.417 (99.999')
1048576 len bytes latencies (us): 305.642 (50') 305.764 (75') 306.113 (99') 330.874 (99.999')
Current temperature  60.1C  min  54.7C  max  62.6C
Testing design TOSING_160T_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  33.7C  min  32.1C  max  35.6C
     32 len bytes latencies (us):   3.862 (50')   3.879 (75')   4.143 (99')  10.559 (99.999')
     64 len bytes latencies (us):   3.904 (50')   3.914 (75')   4.105 (99')   9.736 (99.999')
    128 len bytes latencies (us):   3.999 (50')   4.011 (75')   4.320 (99')   8.736 (99.999')
    256 len bytes latencies (us):   4.207 (50')   4.223 (75')   4.463 (99')  17.638 (99.999')
    512 len bytes latencies (us):   4.480 (50')   4.503 (75')   4.649 (99')  12.634 (99.999')
   1024 len bytes latencies (us):   4.762 (50')   4.780 (75')   4.931 (99')  11.878 (99.999')
   2048 len bytes latencies (us):   5.323 (50')   5.336 (75')   5.553 (99')  10.857 (99.999')
   4096 len bytes latencies (us):   6.453 (50')   6.466 (75')   6.634 (99')  12.906 (99.999')
   8192 len bytes latencies (us):   8.753 (50')   8.768 (75')   8.880 (99')  12.980 (99.999')
  16384 len bytes latencies (us):  13.379 (50')  13.541 (75')  13.652 (99')  26.379 (99.999')
  32768 len bytes latencies (us):  22.570 (50')  22.691 (75')  22.989 (99')  37.643 (99.999')
  65536 len bytes latencies (us):  41.065 (50')  41.137 (75')  41.451 (99')  53.200 (99.999')
 131072 len bytes latencies (us):  77.872 (50')  77.973 (75')  78.147 (99')  87.528 (99.999')
 262144 len bytes latencies (us): 151.579 (50') 151.645 (75') 151.966 (99') 164.633 (99.999')
 524288 len bytes latencies (us): 299.602 (50') 299.715 (75') 300.212 (99') 313.089 (99.999')
1048576 len bytes latencies (us): 593.631 (50') 593.876 (75') 594.197 (99') 617.836 (99.999')
Current temperature  34.9C  min  32.1C  max  36.9C
Testing design TOSING_160T_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   3.853 (50')   3.868 (75')   4.141 (99')  11.316 (99.999')
     64 len bytes latencies (us):   3.918 (50')   3.929 (75')   4.134 (99')   7.988 (99.999')
    128 len bytes latencies (us):   4.004 (50')   4.016 (75')   4.334 (99')   8.294 (99.999')
    256 len bytes latencies (us):   4.236 (50')   4.245 (75')   4.558 (99')  20.139 (99.999')
    512 len bytes latencies (us):   4.491 (50')   4.512 (75')   4.638 (99')  11.483 (99.999')
   1024 len bytes latencies (us):   4.762 (50')   4.775 (75')   4.907 (99')  14.466 (99.999')
   2048 len bytes latencies (us):   5.333 (50')   5.355 (75')   5.608 (99')  12.250 (99.999')
   4096 len bytes latencies (us):   6.487 (50')   6.503 (75')   6.650 (99')  20.028 (99.999')
   8192 len bytes latencies (us):   8.743 (50')   8.762 (75')   8.978 (99')  13.734 (99.999')
  16384 len bytes latencies (us):  13.361 (50')  13.375 (75')  13.676 (99')  27.731 (99.999')
  32768 len bytes latencies (us):  22.587 (50')  22.610 (75')  22.929 (99')  40.622 (99.999')
  65536 len bytes latencies (us):  40.972 (50')  41.063 (75')  41.395 (99')  58.848 (99.999')
 131072 len bytes latencies (us):  77.871 (50')  77.959 (75')  78.188 (99')  90.358 (99.999')
 262144 len bytes latencies (us): 151.541 (50') 151.652 (75') 152.033 (99') 171.533 (99.999')
 524288 len bytes latencies (us): 299.599 (50') 299.824 (75') 300.223 (99') 314.377 (99.999')
1048576 len bytes latencies (us): 593.715 (50') 593.893 (75') 594.167 (99') 609.446 (99.999')
Current temperature  34.9C  min  32.1C  max  37.0C

The AS02MC04 is in slot 4. The VFIO debug output shows the AS02MC04 is only operating at x4 width. The dump_pci_info_pciutils output shows the PCIe root port maximum width is x4:

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> dump_info/dump_pci_info_pciutils 
domain=0000 bus=0d dev=00 func=00 rev=00
  vendor_id=10ee (Xilinx Corporation) device_id=9038 (Device 9038) subvendor_id=0002 subdevice_id=001a
  iommu_group=22
  driver=vfio-pci
  physical_slot=4-3
  control: I/O- Mem+ BusMaster- ParErr- SERR+ DisINTx-
  status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
  bar[0] base_addr=96200000 size=4000 is_IO=0 is_prefetchable=0 is_64=1
  bar[2] base_addr=387f60000000 size=10000 is_IO=0 is_prefetchable=1 is_64=1
  Capabilities: [40] Power Management
  Capabilities: [48] Message Signaled Interrupts
  Capabilities: [70] PCI Express v2 Express Endpoint, MSI 0
    Link capabilities: Max speed 8 GT/s Max width x8
    Negotiated link status: Current speed 8 GT/s Width x4
    Link capabilities2: Supported link speeds 2.5 GT/s 5.0 GT/s 8.0 GT/s
    DevCap: MaxPayload 1024 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 Maximum of 1 μs
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
    DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
    DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
    LnkCap: Port # 0 ASPM not supported
            L0s Exit Latency More than 4 μs
            L1 Exit Latency More than 64 μs
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
    LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
    LnkSta: TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
  domain=0000 bus=0c dev=00 func=00 rev=04
    vendor_id=8086 (Intel Corporation) device_id=2030 (Sky Lake-E PCI Express Root Port A)
    iommu_group=32
    driver=pcieport
    physical_slot=4
    control: I/O+ Mem+ BusMaster+ ParErr- SERR- DisINTx+
    status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
    Capabilities: [40] Bridge subsystem vendor/device ID
    Capabilities: [60] Message Signaled Interrupts
    Capabilities: [90] PCI Express v2 Root Port, MSI 0
      Link capabilities: Max speed 8 GT/s Max width x4
      Negotiated link status: Current speed 8 GT/s Width x4
      Link capabilities2: Supported link speeds 2.5 GT/s 5.0 GT/s 8.0 GT/s
      DevCap: MaxPayload 256 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 Maximum of 1 μs
              ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
      DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
              RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
      DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
      LnkCap: Port # 1 ASPM not supported
              L0s Exit Latency 256 ns to less than 512 ns
              L1 Exit Latency 8 μs to less than 16 μs
              ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
      LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
              ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
      LnkSta: TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt+
      SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
              Slot #4 PowerLimit 0.000W Interlock- NoCompl-
    Capabilities: [e0] Power Management

domain=0000 bus=26 dev=00 func=00 rev=00
  vendor_id=10ee (Xilinx Corporation) device_id=7024 (Device 7024) subvendor_id=0002 subdevice_id=0018
  iommu_group=81
  driver=vfio-pci
  physical_slot=2-2
  control: I/O- Mem+ BusMaster- ParErr+ SERR+ DisINTx-
  status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
  bar[0] base_addr=38bf80010000 size=4000 is_IO=0 is_prefetchable=1 is_64=1
  bar[2] base_addr=38bf80000000 size=10000 is_IO=0 is_prefetchable=1 is_64=1
  Capabilities: [40] Power Management
  Capabilities: [48] Message Signaled Interrupts
  Capabilities: [60] PCI Express v2 Express Endpoint, MSI 0
    Link capabilities: Max speed 5 GT/s Max width x4
    Negotiated link status: Current speed 5 GT/s Width x4
    Link capabilities2: Not implemented
    DevCap: MaxPayload 512 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 No limit
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
    DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
    DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
    LnkCap: Port # 0 ASPM L0s
            L0s Exit Latency More than 4 μs
            L1 Exit Latency More than 64 μs
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
    LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
    LnkSta: TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
  domain=0000 bus=25 dev=00 func=00 rev=04
    vendor_id=8086 (Intel Corporation) device_id=2030 (Sky Lake-E PCI Express Root Port A)
    iommu_group=51
    driver=pcieport
    physical_slot=2
    control: I/O+ Mem+ BusMaster+ ParErr+ SERR+ DisINTx+
    status: INTx- <ParErr- >TAbort- <TAbort- <MAbort- >SERR- DetParErr-
    Capabilities: [40] Bridge subsystem vendor/device ID
    Capabilities: [60] Message Signaled Interrupts
    Capabilities: [90] PCI Express v2 Root Port, MSI 0
      Link capabilities: Max speed 8 GT/s Max width x16
      Negotiated link status: Current speed 5 GT/s Width x4
      Link capabilities2: Supported link speeds 2.5 GT/s 5.0 GT/s 8.0 GT/s
      DevCap: MaxPayload 256 bytes PhantFunc 0 Latency L0s Maximum of 64 ns L1 Maximum of 1 μs
              ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
      DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
              RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
      DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
      LnkCap: Port # 5 ASPM not supported
              L0s Exit Latency 256 ns to less than 512 ns
              L1 Exit Latency 8 μs to less than 16 μs
              ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
      LnkCtl: ASPM Disabled RCB 64 bytes Disabled- CommClk+
              ExtSynch- ClockPM- AutWidDis- BWInt- ABWMgmt-
      LnkSta: TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
      SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
              Slot #2 PowerLimit 0.000W Interlock- NoCompl-
    Capabilities: [e0] Power Management

HP Z6 G4 Workstation QuickSpecs contains:

Slot 4: PCI Express Gen3 x8 – CPU with open-ended connector (slot converts to x4 electrical when SSD is installed in 2nd M.2 slot)

Figure 1. HP Z6 G4 Workstation Block Diagram in HP Z6 G4 WORKSTATION contains:

The HP Z6 G4 provides two PCIe3 x16 and three PCIe3 x4 dedicated electrical slots. An additional two PCIe3 x4 buses feed the two on-board M.2 slots. The sixth high-performance I/O slot serves as a PCIe3 x4 or x8 slot depending on whether the second M.2 slot is occupied. A PCIe mux switches in an additional four PCIe lanes when the second M.2 slot is not used.
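The negotiated and maximum link widths shown in the dump above can also be cross-checked without any FPGA-specific software. The following is a minimal sketch, assuming a Linux kernel that exposes the `current_link_width`, `max_link_width`, `current_link_speed` and `max_link_speed` attributes under `/sys/bus/pci/devices/<domain:bus:dev.func>/`. The helper names `read_link_status` and `has_reduced_width` are hypothetical and are not part of fpga_sio.

```python
from pathlib import Path


def read_link_status(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Read the negotiated and maximum PCIe link width and speed of one device.

    bdf is the PCI address, for example "0000:0d:00.0".
    The values are returned as the stripped strings found in sysfs.
    """
    device_dir = Path(sysfs_root) / bdf
    status = {}
    for name in ("current_link_width", "max_link_width",
                 "current_link_speed", "max_link_speed"):
        status[name] = (device_dir / name).read_text().strip()
    return status


def has_reduced_width(status: dict) -> bool:
    """Return True when the negotiated width is below the device maximum width."""
    return int(status["current_link_width"]) < int(status["max_link_width"])
```

Calling `has_reduced_width(read_link_status("0000:0d:00.0"))` on the PC above would be expected to return True for the AS02MC04, since the endpoint reports a maximum width of x8 but a negotiated width of x4. The same check on the root port 0000:0c:00.0 would not flag a reduction, because the root port itself reports a maximum width of x4.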

3.6. U200_dma_stream_crc64

linux@DESKTOP-BVUMP11:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/crc64_stream_latency -t
Opening device 0000:31:00.0 (10ee:903f) with IOMMU group 22
Enabled bus master for 0000:31:00.0
Testing design U200_dma_stream_crc64 using C2H 0 -> H2C 0
Current temperature  36.6C  min  33.1C  max  37.6C
     64 len bytes latencies (us):   2.258 (50')   2.379 (75')   2.771 (99')  18.936 (99.999')
    128 len bytes latencies (us):   2.246 (50')   2.252 (75')   2.269 (99')  17.914 (99.999')
    256 len bytes latencies (us):   2.272 (50')   2.276 (75')   2.294 (99')  17.967 (99.999')
    512 len bytes latencies (us):   2.310 (50')   2.317 (75')   2.340 (99')  18.273 (99.999')
   1024 len bytes latencies (us):   2.361 (50')   2.365 (75')   2.390 (99')  18.207 (99.999')
   2048 len bytes latencies (us):   2.467 (50')   2.475 (75')   2.546 (99')  18.294 (99.999')
   4096 len bytes latencies (us):   2.641 (50')   2.652 (75')   2.772 (99')  18.295 (99.999')
   8192 len bytes latencies (us):   3.030 (50')   3.068 (75')   3.275 (99')  18.862 (99.999')
  16384 len bytes latencies (us):   4.095 (50')   4.155 (75')   4.240 (99')  19.465 (99.999')
  32768 len bytes latencies (us):   5.432 (50')   5.498 (75')   5.644 (99')  21.455 (99.999')
  65536 len bytes latencies (us):   7.836 (50')   7.915 (75')   8.194 (99')  23.945 (99.999')
 131072 len bytes latencies (us):  13.214 (50')  13.292 (75')  13.557 (99')  29.985 (99.999')
 262144 len bytes latencies (us):  23.357 (50')  23.429 (75')  23.668 (99')  50.896 (99.999')
 524288 len bytes latencies (us):  44.242 (50')  44.427 (75')  45.030 (99')  64.994 (99.999')
1048576 len bytes latencies (us):  86.262 (50')  86.665 (75')  98.748 (99') 114.093 (99.999')
Current temperature  35.1C  min  32.6C  max  38.1C
Testing design U200_dma_stream_crc64 using C2H 1 -> H2C 1
     64 len bytes latencies (us):   2.227 (50')   2.232 (75')   2.250 (99')  18.154 (99.999')
    128 len bytes latencies (us):   2.244 (50')   2.249 (75')   2.267 (99')  18.509 (99.999')
    256 len bytes latencies (us):   2.274 (50')   2.281 (75')   2.298 (99')  18.146 (99.999')
    512 len bytes latencies (us):   2.294 (50')   2.300 (75')   2.319 (99')  18.211 (99.999')
   1024 len bytes latencies (us):   2.351 (50')   2.359 (75')   2.378 (99')  18.169 (99.999')
   2048 len bytes latencies (us):   2.467 (50')   2.474 (75')   2.549 (99')  18.234 (99.999')
   4096 len bytes latencies (us):   2.637 (50')   2.646 (75')   2.750 (99')  18.489 (99.999')
   8192 len bytes latencies (us):   3.094 (50')   3.110 (75')   3.146 (99')  18.988 (99.999')
  16384 len bytes latencies (us):   4.109 (50')   4.155 (75')   4.211 (99')  20.238 (99.999')
  32768 len bytes latencies (us):   5.475 (50')   5.554 (75')   5.669 (99')  21.316 (99.999')
  65536 len bytes latencies (us):   7.979 (50')   8.067 (75')   8.321 (99')  24.892 (99.999')
 131072 len bytes latencies (us):  13.109 (50')  13.188 (75')  13.561 (99')  42.107 (99.999')
 262144 len bytes latencies (us):  23.395 (50')  23.491 (75')  30.990 (99')  51.283 (99.999')
 524288 len bytes latencies (us):  44.282 (50')  44.626 (75')  56.784 (99')  72.722 (99.999')
1048576 len bytes latencies (us):  86.698 (50')  87.062 (75')  99.166 (99') 113.638 (99.999')
Current temperature  36.6C  min  32.6C  max  38.6C
Testing design U200_dma_stream_crc64 using C2H 2 -> H2C 2
     64 len bytes latencies (us):   2.213 (50')   2.220 (75')   2.237 (99')  17.936 (99.999')
    128 len bytes latencies (us):   2.231 (50')   2.237 (75')   2.255 (99')  17.948 (99.999')
    256 len bytes latencies (us):   2.245 (50')   2.252 (75')   2.269 (99')  18.151 (99.999')
    512 len bytes latencies (us):   2.303 (50')   2.309 (75')   2.326 (99')  18.336 (99.999')
   1024 len bytes latencies (us):   2.340 (50')   2.347 (75')   2.366 (99')  18.182 (99.999')
   2048 len bytes latencies (us):   2.442 (50')   2.448 (75')   2.557 (99')  18.110 (99.999')
   4096 len bytes latencies (us):   2.624 (50')   2.635 (75')   2.743 (99')  18.514 (99.999')
   8192 len bytes latencies (us):   3.084 (50')   3.099 (75')   3.134 (99')  18.733 (99.999')
  16384 len bytes latencies (us):   3.954 (50')   3.997 (75')   4.130 (99')  19.465 (99.999')
  32768 len bytes latencies (us):   5.308 (50')   5.384 (75')   5.590 (99')  21.356 (99.999')
  65536 len bytes latencies (us):   7.861 (50')   7.949 (75')   8.180 (99')  23.978 (99.999')
 131072 len bytes latencies (us):  13.118 (50')  13.217 (75')  13.383 (99')  28.918 (99.999')
 262144 len bytes latencies (us):  23.371 (50')  23.453 (75')  23.726 (99')  43.898 (99.999')
 524288 len bytes latencies (us):  44.241 (50')  44.387 (75')  56.611 (99')  70.332 (99.999')
1048576 len bytes latencies (us):  86.366 (50')  86.661 (75')  96.063 (99') 112.011 (99.999')
Current temperature  35.1C  min  32.6C  max  38.6C
Testing design U200_dma_stream_crc64 using C2H 3 -> H2C 3
     64 len bytes latencies (us):   2.210 (50')   2.216 (75')   2.234 (99')  18.065 (99.999')
    128 len bytes latencies (us):   2.208 (50')   2.215 (75')   2.233 (99')  18.063 (99.999')
    256 len bytes latencies (us):   2.241 (50')   2.247 (75')   2.266 (99')  18.098 (99.999')
    512 len bytes latencies (us):   2.278 (50')   2.284 (75')   2.303 (99')  17.984 (99.999')
   1024 len bytes latencies (us):   2.314 (50')   2.322 (75')   2.343 (99')  18.166 (99.999')
   2048 len bytes latencies (us):   2.405 (50')   2.412 (75')   2.524 (99')  18.038 (99.999')
   4096 len bytes latencies (us):   2.597 (50')   2.608 (75')   2.715 (99')  18.042 (99.999')
   8192 len bytes latencies (us):   3.025 (50')   3.080 (75')   3.126 (99')  18.895 (99.999')
  16384 len bytes latencies (us):   3.964 (50')   4.043 (75')   4.247 (99')  19.862 (99.999')
  32768 len bytes latencies (us):   5.346 (50')   5.432 (75')   5.604 (99')  21.198 (99.999')
  65536 len bytes latencies (us):   8.021 (50')   8.099 (75')   8.195 (99')  23.645 (99.999')
 131072 len bytes latencies (us):  13.150 (50')  13.226 (75')  13.355 (99')  29.114 (99.999')
 262144 len bytes latencies (us):  23.410 (50')  23.499 (75')  23.600 (99')  38.852 (99.999')
 524288 len bytes latencies (us):  44.072 (50')  44.175 (75')  44.491 (99')  64.741 (99.999')
1048576 len bytes latencies (us):  85.543 (50')  85.736 (75')  92.819 (99') 106.628 (99.999')
Current temperature  35.6C  min  32.6C  max  39.1C

4. HP Pavilion 590-p0053na

A HP Pavilion 590-p0053na desktop with an AMD Ryzen 5 2400G CPU.

4.1. VD100_dma_stream_crc64

This has an xcve2302-sfva784-1LP-e-S, which is configured as a gen4 x4 PCIe endpoint. However, the CPU only supports gen3, which is why the "has reduced bandwidth" warning is reported.

For crc64_stream_latency the lowest median latency of 6.705 us is higher than seen in other tests. It is not clear whether this is due to the PC or the FPGA:

[mr_halfword@ryzen-alma release]$ xilinx_dma_bridge_for_pcie/crc64_stream_latency 
Opening device 0000:10:00.0 (10ee:b044) with IOMMU group 10
Enabled bus master for 0000:10:00.0
Warning: Device device 0000:10:00.0 (10ee:b044) has reduced bandwidth
         Max width x4 speed 16 GT/s. Negotiated width x4 speed 8 GT/s
Testing design VD100_dma_stream_crc64 using C2H 0 -> H2C 0
     32 len bytes latencies (us):   6.705 (50')   6.985 (75')   8.102 (99')  37.157 (99.999')
     64 len bytes latencies (us):   6.705 (50')   6.985 (75')   8.521 (99')  29.614 (99.999')
    128 len bytes latencies (us):   6.705 (50')   6.985 (75')   8.661 (99')  25.492 (99.999')
    256 len bytes latencies (us):   6.705 (50')   7.054 (75')   8.521 (99')  27.379 (99.999')
    512 len bytes latencies (us):   6.705 (50')   7.054 (75')   8.940 (99')  27.727 (99.999')
   1024 len bytes latencies (us):   7.054 (50')   7.124 (75')   9.079 (99')  26.471 (99.999')
   2048 len bytes latencies (us):   7.194 (50')   7.543 (75')   9.498 (99')  30.940 (99.999')
   4096 len bytes latencies (us):   7.613 (50')   8.032 (75')   9.988 (99')  29.124 (99.999')
   8192 len bytes latencies (us):   7.193 (50')   7.264 (75')   9.499 (99')  24.794 (99.999')
  16384 len bytes latencies (us):   9.289 (50')   9.499 (75')  11.036 (99')  26.470 (99.999')
  32768 len bytes latencies (us):  14.248 (50')  14.388 (75')  17.531 (99')  48.960 (99.999')
  65536 len bytes latencies (us):  23.956 (50')  24.236 (75')  36.458 (99')  47.354 (99.999')
 131072 len bytes latencies (us):  47.773 (50')  47.983 (75')  54.547 (99')  61.462 (99.999')
 262144 len bytes latencies (us):  88.281 (50')  88.422 (75')  98.688 (99') 105.254 (99.999')
 524288 len bytes latencies (us): 166.855 (50') 166.995 (75') 170.836 (99') 182.150 (99.999')
1048576 len bytes latencies (us): 324.001 (50') 324.211 (75') 333.012 (99') 345.653 (99.999')
Testing design VD100_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   6.705 (50')   6.984 (75')   8.521 (99')  25.283 (99.999')
     64 len bytes latencies (us):   6.705 (50')   6.845 (75')   8.660 (99')  27.728 (99.999')
    128 len bytes latencies (us):   6.774 (50')   7.054 (75')   8.590 (99')  30.032 (99.999')
    256 len bytes latencies (us):   6.705 (50')   6.845 (75')   8.591 (99')  27.099 (99.999')
    512 len bytes latencies (us):   6.705 (50')   7.054 (75')   9.009 (99')  28.636 (99.999')
   1024 len bytes latencies (us):   7.054 (50')   7.124 (75')   9.079 (99')  23.467 (99.999')
   2048 len bytes latencies (us):   7.263 (50')   7.543 (75')   9.429 (99')  33.874 (99.999')
   4096 len bytes latencies (us):   7.613 (50')   8.032 (75')  10.057 (99')  31.918 (99.999')
   8192 len bytes latencies (us):   7.193 (50')   7.264 (75')   9.569 (99')  25.423 (99.999')
  16384 len bytes latencies (us):   9.289 (50')   9.499 (75')  10.826 (99')  26.680 (99.999')
  32768 len bytes latencies (us):  14.248 (50')  14.387 (75')  17.042 (99')  30.522 (99.999')
  65536 len bytes latencies (us):  23.956 (50')  24.236 (75')  36.458 (99')  41.836 (99.999')
 131072 len bytes latencies (us):  47.773 (50')  47.983 (75')  54.338 (99')  63.906 (99.999')
 262144 len bytes latencies (us):  88.281 (50')  88.491 (75')  98.688 (99') 104.625 (99.999')
 524288 len bytes latencies (us): 166.855 (50') 166.995 (75') 170.906 (99') 182.849 (99.999')
1048576 len bytes latencies (us): 323.932 (50') 324.142 (75') 332.034 (99') 346.561 (99.999')
Testing design VD100_dma_stream_crc64 using C2H 2 -> H2C 2
     32 len bytes latencies (us):   6.705 (50')   6.844 (75')   8.591 (99')  28.636 (99.999')
     64 len bytes latencies (us):   6.705 (50')   7.054 (75')   8.660 (99')  28.635 (99.999')
    128 len bytes latencies (us):   6.705 (50')   6.984 (75')   8.660 (99')  25.842 (99.999')
    256 len bytes latencies (us):   6.705 (50')   7.054 (75')   9.079 (99')  33.595 (99.999')
    512 len bytes latencies (us):   6.705 (50')   7.054 (75')   8.730 (99')  37.227 (99.999')
   1024 len bytes latencies (us):   6.984 (50')   7.124 (75')   9.079 (99')  37.366 (99.999')
   2048 len bytes latencies (us):   7.194 (50')   7.473 (75')   9.499 (99')  34.852 (99.999')
   4096 len bytes latencies (us):   7.613 (50')   8.032 (75')  10.127 (99')  29.614 (99.999')
   8192 len bytes latencies (us):   7.124 (50')   7.264 (75')   9.429 (99')  20.394 (99.999')
  16384 len bytes latencies (us):   9.219 (50')   9.499 (75')  16.064 (99')  25.353 (99.999')
  32768 len bytes latencies (us):  14.248 (50')  14.387 (75')  17.531 (99')  30.522 (99.999')
  65536 len bytes latencies (us):  23.956 (50')  24.235 (75')  36.668 (99')  43.023 (99.999')
 131072 len bytes latencies (us):  47.773 (50')  47.982 (75')  54.826 (99')  64.046 (99.999')
 262144 len bytes latencies (us):  88.212 (50')  88.421 (75')  98.688 (99') 104.625 (99.999')
 524288 len bytes latencies (us): 166.855 (50') 166.995 (75') 170.836 (99') 183.687 (99.999')
1048576 len bytes latencies (us): 324.001 (50') 324.142 (75') 332.593 (99') 345.863 (99.999')
Testing design VD100_dma_stream_crc64 using C2H 3 -> H2C 3
     32 len bytes latencies (us):   6.705 (50')   6.845 (75')   8.590 (99')  31.010 (99.999')
     64 len bytes latencies (us):   6.705 (50')   6.985 (75')   8.661 (99')  30.731 (99.999')
    128 len bytes latencies (us):   6.705 (50')   6.984 (75')   8.660 (99')  30.521 (99.999')
    256 len bytes latencies (us):   6.705 (50')   6.984 (75')   8.660 (99')  31.011 (99.999')
    512 len bytes latencies (us):   6.705 (50')   7.054 (75')   8.870 (99')  33.315 (99.999')
   1024 len bytes latencies (us):   7.054 (50')   7.124 (75')   9.079 (99')  36.738 (99.999')
   2048 len bytes latencies (us):   7.194 (50')   7.473 (75')   9.499 (99')  34.293 (99.999')
   4096 len bytes latencies (us):   7.613 (50')   8.032 (75')  10.057 (99')  27.099 (99.999')
   8192 len bytes latencies (us):   7.193 (50')   7.264 (75')   9.499 (99')  24.724 (99.999')
  16384 len bytes latencies (us):   9.220 (50')   9.499 (75')  15.645 (99')  35.270 (99.999')
  32768 len bytes latencies (us):  14.248 (50')  14.387 (75')  17.111 (99')  31.010 (99.999')
  65536 len bytes latencies (us):  23.956 (50')  24.236 (75')  36.319 (99')  46.655 (99.999')
 131072 len bytes latencies (us):  49.029 (50')  49.169 (75')  49.659 (99')  62.230 (99.999')
 262144 len bytes latencies (us):  88.212 (50')  88.421 (75')  98.688 (99') 107.698 (99.999')
 524288 len bytes latencies (us): 166.855 (50') 166.995 (75') 170.836 (99') 183.199 (99.999')
1048576 len bytes latencies (us): 323.932 (50') 324.142 (75') 333.431 (99') 342.021 (99.999')
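
The latencies for the largest transfer sizes can be turned into an approximate transfer rate. The following is a rough sketch, assuming that for large lengths the median latency grows linearly with the number of bytes, so that the extra bytes divided by the extra time approximates the sustained rate. The two values are the 50' latencies for C2H 0 -> H2C 0 copied from the output above; the function name is hypothetical.

```python
# Median (50') latencies in microseconds for C2H 0 -> H2C 0, copied from the output above
median_latency_us = {
    524288: 166.855,
    1048576: 324.001,
}


def incremental_rate_mbytes_per_sec(latencies: dict, small_len: int, large_len: int) -> float:
    """Extra bytes divided by extra time.

    Bytes per microsecond is numerically equal to Mbytes/sec (10**6 bytes).
    """
    extra_bytes = large_len - small_len
    extra_time_us = latencies[large_len] - latencies[small_len]
    return extra_bytes / extra_time_us


rate = incremental_rate_mbytes_per_sec(median_latency_us, 524288, 1048576)
print(f"incremental rate {rate:.1f} Mbytes/sec")
```

This only uses two points from one channel, so it is an estimate rather than a measurement of sustained bandwidth.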

test_dma_bridge_independent_streams gets 3336 Mbytes/sec in the host-to-card direction, which is 85% of the theoretical bandwidth of a gen3 x4 link:

[mr_halfword@ryzen-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_independent_streams 
Opening device 0000:10:00.0 (10ee:b044) with IOMMU group 10
Enabled bus master for 0000:10:00.0
Warning: Device device 0000:10:00.0 (10ee:b044) has reduced bandwidth
         Max width x4 speed 16 GT/s. Negotiated width x4 speed 8 GT/s
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:10:00.0 IOMMU group 10 H2C channel 0
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:10:00.0 IOMMU group 10 H2C channel 1
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:10:00.0 IOMMU group 10 H2C channel 2
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:10:00.0 IOMMU group 10 H2C channel 3
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:10:00.0 IOMMU group 10 C2H channel 0
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:10:00.0 IOMMU group 10 C2H channel 1
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:10:00.0 IOMMU group 10 C2H channel 2
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:10:00.0 IOMMU group 10 C2H channel 3
Press Ctrl-C to stop test
  0000:10:00.0 H2C channel 0 833.999 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997946 secs)
  0000:10:00.0 H2C channel 1 833.996 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997979 secs)
  0000:10:00.0 H2C channel 2 833.992 Mbytes/sec (8338276352 bytes in 497 transfers over 9.998031 secs)
  0000:10:00.0 H2C channel 3 833.992 Mbytes/sec (8338276352 bytes in 497 transfers over 9.998031 secs)
  0000:10:00.0 C2H channel 0 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997930 secs)
  0000:10:00.0 C2H channel 1 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997966 secs)
  0000:10:00.0 C2H channel 2 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997969 secs)
  0000:10:00.0 C2H channel 3 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997970 secs)

  0000:10:00.0 H2C channel 0 834.033 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997538 secs)
  0000:10:00.0 H2C channel 1 834.033 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997538 secs)
  0000:10:00.0 H2C channel 2 834.033 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997538 secs)
  0000:10:00.0 H2C channel 3 834.033 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997538 secs)
  0000:10:00.0 C2H channel 0 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997538 secs)
  0000:10:00.0 C2H channel 1 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997538 secs)
  0000:10:00.0 C2H channel 2 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997538 secs)
  0000:10:00.0 C2H channel 3 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997538 secs)

  0000:10:00.0 H2C channel 0 834.032 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997549 secs)
  0000:10:00.0 H2C channel 1 834.031 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997558 secs)
  0000:10:00.0 H2C channel 2 834.035 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997510 secs)
  0000:10:00.0 H2C channel 3 834.035 Mbytes/sec (8338276352 bytes in 497 transfers over 9.997510 secs)
  0000:10:00.0 C2H channel 0 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997550 secs)
  0000:10:00.0 C2H channel 1 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997518 secs)
  0000:10:00.0 C2H channel 2 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997566 secs)
  0000:10:00.0 C2H channel 3 0.000 Mbytes/sec (3976 bytes in 497 transfers over 9.997566 secs)

^C  0000:10:00.0 H2C channel 0 834.035 Mbytes/sec (3271557120 bytes in 195 transfers over 3.922563 secs)
  0000:10:00.0 H2C channel 1 834.042 Mbytes/sec (3271557120 bytes in 195 transfers over 3.922534 secs)
  0000:10:00.0 H2C channel 2 834.043 Mbytes/sec (3271557120 bytes in 195 transfers over 3.922526 secs)
  0000:10:00.0 H2C channel 3 834.045 Mbytes/sec (3271557120 bytes in 195 transfers over 3.922518 secs)
  0000:10:00.0 C2H channel 0 0.000 Mbytes/sec (1560 bytes in 195 transfers over 3.922531 secs)
  0000:10:00.0 C2H channel 1 0.000 Mbytes/sec (1560 bytes in 195 transfers over 3.922544 secs)
  0000:10:00.0 C2H channel 2 0.000 Mbytes/sec (1560 bytes in 195 transfers over 3.922504 secs)
  0000:10:00.0 C2H channel 3 0.000 Mbytes/sec (1560 bytes in 195 transfers over 3.922497 secs)

Overall test statistics:
  0000:10:00.0 H2C channel 0 834.023 Mbytes/sec (28286386176 bytes in 1686 transfers over 33.915597 secs)
  0000:10:00.0 H2C channel 1 834.023 Mbytes/sec (28286386176 bytes in 1686 transfers over 33.915608 secs)
  0000:10:00.0 H2C channel 2 834.023 Mbytes/sec (28286386176 bytes in 1686 transfers over 33.915604 secs)
  0000:10:00.0 H2C channel 3 834.023 Mbytes/sec (28286386176 bytes in 1686 transfers over 33.915597 secs)
  0000:10:00.0 C2H channel 0 0.000 Mbytes/sec (13488 bytes in 1686 transfers over 33.915549 secs)
  0000:10:00.0 C2H channel 1 0.000 Mbytes/sec (13488 bytes in 1686 transfers over 33.915566 secs)
  0000:10:00.0 C2H channel 2 0.000 Mbytes/sec (13488 bytes in 1686 transfers over 33.915577 secs)
  0000:10:00.0 C2H channel 3 0.000 Mbytes/sec (13488 bytes in 1686 transfers over 33.915570 secs)


Overall PASS
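
The 85% figure can be reproduced from the overall statistics above. The following is a minimal sketch, assuming "theoretical bandwidth" means the raw link rate after line encoding only (128b/130b for 8 GT/s), ignoring TLP and DLLP overhead, and that Mbytes means 10**6 bytes as in the test output. The function name and the encoding table are illustrative and not taken from fpga_sio.

```python
# Line encoding efficiency per lane speed in GT/s.
# 2.5 and 5 GT/s use 8b/10b encoding; 8, 16 and 32 GT/s use 128b/130b encoding.
ENCODING_EFFICIENCY = {
    2.5: 8 / 10,
    5.0: 8 / 10,
    8.0: 128 / 130,
    16.0: 128 / 130,
    32.0: 128 / 130,
}


def pcie_raw_bandwidth_mbytes_per_sec(speed_gt_per_sec: float, width: int) -> float:
    """Raw one-direction link bandwidth in Mbytes/sec after line encoding only."""
    payload_bits_per_sec = speed_gt_per_sec * 1e9 * width * ENCODING_EFFICIENCY[speed_gt_per_sec]
    return payload_bits_per_sec / 8 / 1e6


# Overall per-channel H2C rates from the test_dma_bridge_independent_streams output above
h2c_rates_mbytes_per_sec = [834.023, 834.023, 834.023, 834.023]

total = sum(h2c_rates_mbytes_per_sec)
theoretical = pcie_raw_bandwidth_mbytes_per_sec(8.0, 4)
efficiency_percent = 100 * total / theoretical
print(f"total {total:.1f} Mbytes/sec, theoretical {theoretical:.1f} Mbytes/sec, "
      f"efficiency {efficiency_percent:.1f}%")
```

Under these assumptions a gen3 x4 link has a raw rate of about 3938 Mbytes/sec in each direction, and the four H2C channels together use roughly 85% of it.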

4.2. VD100_dma_stream_loopback

[mr_halfword@ryzen-alma release]$ xilinx_dma_bridge_for_pcie/test_dma_bridge_parallel_streams 
Opening device 0000:10:00.0 (10ee:b044) with IOMMU group 10
Enabled bus master for 0000:10:00.0
Warning: Device device 0000:10:00.0 (10ee:b044) has reduced bandwidth
         Max width x4 speed 16 GT/s. Negotiated width x4 speed 8 GT/s
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Device 0000:10:00.0 design VD100_dma_stream_loopback routes updated
Selecting test of VD100_dma_stream_loopback design PCI device 0000:10:00.0 IOMMU group 10 H2C channel 1 C2H channel 0
Selecting test of VD100_dma_stream_loopback design PCI device 0000:10:00.0 IOMMU group 10 H2C channel 0 C2H channel 1
Selecting test of VD100_dma_stream_loopback design PCI device 0000:10:00.0 IOMMU group 10 H2C channel 3 C2H channel 2
Selecting test of VD100_dma_stream_loopback design PCI device 0000:10:00.0 IOMMU group 10 H2C channel 2 C2H channel 3
Press Ctrl-C to stop test
  0000:10:00.0 1 -> 0 825.821 Mbytes/sec (8254390272 bytes in 9.995373 secs)
  0000:10:00.0 0 -> 1 825.816 Mbytes/sec (8254390272 bytes in 9.995432 secs)
  0000:10:00.0 3 -> 2 825.815 Mbytes/sec (8254390272 bytes in 9.995444 secs)
  0000:10:00.0 2 -> 3 825.814 Mbytes/sec (8254390272 bytes in 9.995458 secs)

  0000:10:00.0 1 -> 0 826.176 Mbytes/sec (8254390272 bytes in 9.991079 secs)
  0000:10:00.0 0 -> 1 826.178 Mbytes/sec (8254390272 bytes in 9.991055 secs)
  0000:10:00.0 3 -> 2 826.178 Mbytes/sec (8254390272 bytes in 9.991055 secs)
  0000:10:00.0 2 -> 3 826.178 Mbytes/sec (8254390272 bytes in 9.991055 secs)
<snip>>
  0000:10:00.0 1 -> 0 826.176 Mbytes/sec (8271167488 bytes in 10.011391 secs)
  0000:10:00.0 0 -> 1 826.174 Mbytes/sec (8271167488 bytes in 10.011415 secs)
  0000:10:00.0 3 -> 2 826.174 Mbytes/sec (8271167488 bytes in 10.011416 secs)
  0000:10:00.0 2 -> 3 826.174 Mbytes/sec (8271167488 bytes in 10.011416 secs)

  0000:10:00.0 1 -> 0 826.175 Mbytes/sec (8254390272 bytes in 9.991098 secs)
  0000:10:00.0 0 -> 1 826.175 Mbytes/sec (8254390272 bytes in 9.991098 secs)
  0000:10:00.0 3 -> 2 826.174 Mbytes/sec (8254390272 bytes in 9.991098 secs)
  0000:10:00.0 2 -> 3 826.175 Mbytes/sec (8254390272 bytes in 9.991098 secs)

^C  0000:10:00.0 1 -> 0 826.173 Mbytes/sec (5351931904 bytes in 6.477982 secs)
  0000:10:00.0 0 -> 1 826.175 Mbytes/sec (5351931904 bytes in 6.477964 secs)
  0000:10:00.0 3 -> 2 826.177 Mbytes/sec (5351931904 bytes in 6.477947 secs)
  0000:10:00.0 2 -> 3 826.179 Mbytes/sec (5351931904 bytes in 6.477932 secs)

Overall test statistics:
  0000:10:00.0 1 -> 0 826.171 Mbytes/sec (2459070103552 bytes in 2976.465101 secs)
  0000:10:00.0 0 -> 1 826.171 Mbytes/sec (2459070103552 bytes in 2976.465117 secs)
  0000:10:00.0 3 -> 2 826.171 Mbytes/sec (2459070103552 bytes in 2976.465113 secs)
  0000:10:00.0 2 -> 3 826.171 Mbytes/sec (2459070103552 bytes in 2976.465111 secs)

0000:10:00.0 1 -> 0 Test pattern verified in 268435456 words
0000:10:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:10:00.0 3 -> 2 Test pattern verified in 268435456 words
0000:10:00.0 2 -> 3 Test pattern verified in 268435456 words

Overall PASS

Sustains 3304.7 Mbytes/sec in both directions, which is 84% of the theoretical bandwidth of a gen3 x4 link.

4.3. VD100_dma_ddr4

$ xilinx_dma_bridge_for_pcie/test_dma_bridge
Opening device 0000:10:00.0 (10ee:b044) with IOMMU group 10
Enabled bus master for 0000:10:00.0
Warning: Device device 0000:10:00.0 (10ee:b044) has reduced bandwidth
         Max width x4 speed 16 GT/s. Negotiated width x4 speed 8 GT/s
Testing VD100_dma_ddr4 design with memory base address 0x800000000 size 0x100000000
PCI device 0000:10:00.0 IOMMU group 10

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 0
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3792.843521 (Mbytes/sec)
  Mean = 3795.284436 (Mbytes/sec)
   Max = 3798.623933 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3209.702606 (Mbytes/sec)
  Mean = 3210.850180 (Mbytes/sec)
   Max = 3211.897410 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3805.251595 (Mbytes/sec)
  Mean = 3805.964015 (Mbytes/sec)
   Max = 3806.727366 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 1
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3797.601147 (Mbytes/sec)
  Mean = 3798.436931 (Mbytes/sec)
   Max = 3799.014192 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3210.308175 (Mbytes/sec)
  Mean = 3210.960078 (Mbytes/sec)
   Max = 3212.336832 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3804.486481 (Mbytes/sec)
  Mean = 3805.370628 (Mbytes/sec)
   Max = 3806.314552 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 2
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3796.330936 (Mbytes/sec)
  Mean = 3798.506732 (Mbytes/sec)
   Max = 3800.201418 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3209.583665 (Mbytes/sec)
  Mean = 3210.059320 (Mbytes/sec)
   Max = 3210.522879 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3806.088864 (Mbytes/sec)
  Mean = 3806.487962 (Mbytes/sec)
   Max = 3806.677647 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 3
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3785.298524 (Mbytes/sec)
  Mean = 3788.036497 (Mbytes/sec)
   Max = 3790.524320 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3210.710283 (Mbytes/sec)
  Mean = 3210.842845 (Mbytes/sec)
   Max = 3211.006688 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3804.315376 (Mbytes/sec)
  Mean = 3805.938636 (Mbytes/sec)
   Max = 3806.781566 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 0
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3785.801413 (Mbytes/sec)
  Mean = 3793.114378 (Mbytes/sec)
   Max = 3800.032574 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3210.856801 (Mbytes/sec)
  Mean = 3211.429389 (Mbytes/sec)
   Max = 3212.044375 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3800.623006 (Mbytes/sec)
  Mean = 3804.581164 (Mbytes/sec)
   Max = 3808.187067 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 1
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3790.134636 (Mbytes/sec)
  Mean = 3793.991200 (Mbytes/sec)
   Max = 3797.340846 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3209.687864 (Mbytes/sec)
  Mean = 3210.624162 (Mbytes/sec)
   Max = 3211.204379 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3807.148047 (Mbytes/sec)
  Mean = 3807.428849 (Mbytes/sec)
   Max = 3808.048640 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 2
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3795.683264 (Mbytes/sec)
  Mean = 3797.022379 (Mbytes/sec)
   Max = 3798.449831 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3209.994639 (Mbytes/sec)
  Mean = 3210.408609 (Mbytes/sec)
   Max = 3210.787731 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3806.677647 (Mbytes/sec)
  Mean = 3807.322006 (Mbytes/sec)
   Max = 3808.063968 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 3
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3795.339133 (Mbytes/sec)
  Mean = 3796.666752 (Mbytes/sec)
   Max = 3798.211467 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3210.034853 (Mbytes/sec)
  Mean = 3211.168620 (Mbytes/sec)
   Max = 3212.260817 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3805.793718 (Mbytes/sec)
  Mean = 3806.912421 (Mbytes/sec)
   Max = 3807.435154 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 2 C2H channel 0
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3794.102502 (Mbytes/sec)
  Mean = 3796.576802 (Mbytes/sec)
   Max = 3797.971724 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3210.618253 (Mbytes/sec)
  Mean = 3210.810155 (Mbytes/sec)
   Max = 3211.008868 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3805.140926 (Mbytes/sec)
  Mean = 3806.642653 (Mbytes/sec)
   Max = 3807.479472 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 2 C2H channel 1
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3777.184346 (Mbytes/sec)
  Mean = 3789.098669 (Mbytes/sec)
   Max = 3797.100980 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3208.941699 (Mbytes/sec)
  Mean = 3209.730922 (Mbytes/sec)
   Max = 3210.233934 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3758.317432 (Mbytes/sec)
  Mean = 3788.455504 (Mbytes/sec)
   Max = 3806.892799 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 2 C2H channel 2
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3797.680179 (Mbytes/sec)
  Mean = 3797.906751 (Mbytes/sec)
   Max = 3798.308828 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3210.890500 (Mbytes/sec)
  Mean = 3211.567714 (Mbytes/sec)
   Max = 3212.666601 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3810.932224 (Mbytes/sec)
  Mean = 3811.683218 (Mbytes/sec)
   Max = 3812.360883 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 2 C2H channel 3
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3796.118613 (Mbytes/sec)
  Mean = 3797.702053 (Mbytes/sec)
   Max = 3798.462267 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3210.278847 (Mbytes/sec)
  Mean = 3211.007067 (Mbytes/sec)
   Max = 3212.139840 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3803.928498 (Mbytes/sec)
  Mean = 3809.623404 (Mbytes/sec)
   Max = 3812.225694 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 3 C2H channel 0
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3796.088151 (Mbytes/sec)
  Mean = 3799.320907 (Mbytes/sec)
   Max = 3801.272600 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3210.432199 (Mbytes/sec)
  Mean = 3210.916192 (Mbytes/sec)
   Max = 3211.444858 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3810.980640 (Mbytes/sec)
  Mean = 3811.476204 (Mbytes/sec)
   Max = 3811.849022 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 3 C2H channel 1
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3796.102444 (Mbytes/sec)
  Mean = 3798.035529 (Mbytes/sec)
   Max = 3801.773867 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3210.049598 (Mbytes/sec)
  Mean = 3210.824446 (Mbytes/sec)
   Max = 3211.929788 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3811.405093 (Mbytes/sec)
  Mean = 3811.724090 (Mbytes/sec)
   Max = 3811.997650 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 3 C2H channel 2
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3797.671737 (Mbytes/sec)
  Mean = 3798.936509 (Mbytes/sec)
   Max = 3801.274010 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3209.839652 (Mbytes/sec)
  Mean = 3211.024376 (Mbytes/sec)
   Max = 3211.952603 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3812.086265 (Mbytes/sec)
  Mean = 3812.428360 (Mbytes/sec)
   Max = 3812.645700 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 3 C2H channel 3
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3794.587882 (Mbytes/sec)
  Mean = 3796.389377 (Mbytes/sec)
   Max = 3797.520004 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3209.672433 (Mbytes/sec)
  Mean = 3210.381406 (Mbytes/sec)
   Max = 3211.512282 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3808.858807 (Mbytes/sec)
  Mean = 3811.424781 (Mbytes/sec)
   Max = 3812.636535 (Mbytes/sec)
TEST PASS
Testing VD100_dma_ddr4 design with memory base address 0x800000000 size 0x100000000
PCI device 0000:10:00.0 IOMMU group 10

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3799.202944 (Mbytes/sec)
  Mean = 3800.101193 (Mbytes/sec)
   Max = 3801.253619 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3125.885345 (Mbytes/sec)
  Mean = 3126.928998 (Mbytes/sec)
   Max = 3127.954090 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3812.849999 (Mbytes/sec)
  Mean = 3812.936114 (Mbytes/sec)
   Max = 3813.105575 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3799.852292 (Mbytes/sec)
  Mean = 3800.610374 (Mbytes/sec)
   Max = 3801.262312 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.425999 (Mbytes/sec)
  Mean = 3127.247906 (Mbytes/sec)
   Max = 3128.153304 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3811.944299 (Mbytes/sec)
  Mean = 3812.334403 (Mbytes/sec)
   Max = 3812.570820 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 2 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3797.159174 (Mbytes/sec)
  Mean = 3799.723273 (Mbytes/sec)
   Max = 3801.134022 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3125.897580 (Mbytes/sec)
  Mean = 3126.585552 (Mbytes/sec)
   Max = 3127.444401 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3812.084665 (Mbytes/sec)
  Mean = 3812.381674 (Mbytes/sec)
   Max = 3812.942434 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 3 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3797.244290 (Mbytes/sec)
  Mean = 3799.404759 (Mbytes/sec)
   Max = 3800.558699 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.015484 (Mbytes/sec)
  Mean = 3126.808518 (Mbytes/sec)
   Max = 3127.309848 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3812.059614 (Mbytes/sec)
  Mean = 3812.744032 (Mbytes/sec)
   Max = 3813.080274 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3798.854421 (Mbytes/sec)
  Mean = 3799.960713 (Mbytes/sec)
   Max = 3801.516334 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.218743 (Mbytes/sec)
  Mean = 3126.625097 (Mbytes/sec)
   Max = 3127.162584 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3812.543403 (Mbytes/sec)
  Mean = 3812.657867 (Mbytes/sec)
   Max = 3812.785694 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3790.931657 (Mbytes/sec)
  Mean = 3797.148331 (Mbytes/sec)
   Max = 3801.399308 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3125.611752 (Mbytes/sec)
  Mean = 3126.394527 (Mbytes/sec)
   Max = 3127.071625 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3809.237324 (Mbytes/sec)
  Mean = 3811.101026 (Mbytes/sec)
   Max = 3813.098717 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 2 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3799.488854 (Mbytes/sec)
  Mean = 3800.586943 (Mbytes/sec)
   Max = 3801.426332 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3125.901393 (Mbytes/sec)
  Mean = 3126.572199 (Mbytes/sec)
   Max = 3127.110107 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3812.485258 (Mbytes/sec)
  Mean = 3812.783335 (Mbytes/sec)
   Max = 3813.056868 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 3 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3799.068690 (Mbytes/sec)
  Mean = 3800.513601 (Mbytes/sec)
   Max = 3802.467638 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.031375 (Mbytes/sec)
  Mean = 3126.533889 (Mbytes/sec)
   Max = 3127.364718 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3811.676121 (Mbytes/sec)
  Mean = 3812.654148 (Mbytes/sec)
   Max = 3813.126382 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 2 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3796.901981 (Mbytes/sec)
  Mean = 3797.511203 (Mbytes/sec)
   Max = 3798.075927 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.384513 (Mbytes/sec)
  Mean = 3127.159963 (Mbytes/sec)
   Max = 3127.769858 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3812.967264 (Mbytes/sec)
  Mean = 3813.230006 (Mbytes/sec)
   Max = 3813.541382 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 2 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3797.700401 (Mbytes/sec)
  Mean = 3798.176270 (Mbytes/sec)
   Max = 3799.112816 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.259427 (Mbytes/sec)
  Mean = 3126.786936 (Mbytes/sec)
   Max = 3127.468261 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3812.696813 (Mbytes/sec)
  Mean = 3813.019394 (Mbytes/sec)
   Max = 3813.233727 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 2 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 2 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3797.527324 (Mbytes/sec)
  Mean = 3798.477219 (Mbytes/sec)
   Max = 3799.801811 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.120209 (Mbytes/sec)
  Mean = 3126.596562 (Mbytes/sec)
   Max = 3127.595828 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3813.092332 (Mbytes/sec)
  Mean = 3813.192055 (Mbytes/sec)
   Max = 3813.330204 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 2 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 3 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3797.591346 (Mbytes/sec)
  Mean = 3798.195859 (Mbytes/sec)
   Max = 3799.012834 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.129745 (Mbytes/sec)
  Mean = 3126.912461 (Mbytes/sec)
   Max = 3127.560675 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3813.039373 (Mbytes/sec)
  Mean = 3813.377502 (Mbytes/sec)
   Max = 3813.705989 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 3 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3785.906344 (Mbytes/sec)
  Mean = 3795.519675 (Mbytes/sec)
   Max = 3799.916624 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3125.688488 (Mbytes/sec)
  Mean = 3126.616671 (Mbytes/sec)
   Max = 3127.311440 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3802.625648 (Mbytes/sec)
  Mean = 3809.662102 (Mbytes/sec)
   Max = 3812.965371 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 3 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3784.099913 (Mbytes/sec)
  Mean = 3791.566488 (Mbytes/sec)
   Max = 3799.547309 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.277229 (Mbytes/sec)
  Mean = 3126.856295 (Mbytes/sec)
   Max = 3127.251483 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3789.541199 (Mbytes/sec)
  Mean = 3804.240408 (Mbytes/sec)
   Max = 3812.312956 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 3 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 2 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3773.815424 (Mbytes/sec)
  Mean = 3790.941126 (Mbytes/sec)
   Max = 3802.654572 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3126.107813 (Mbytes/sec)
  Mean = 3126.824895 (Mbytes/sec)
   Max = 3127.562743 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3812.522837 (Mbytes/sec)
  Mean = 3812.908689 (Mbytes/sec)
   Max = 3813.088314 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 3 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 3 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3797.545650 (Mbytes/sec)
  Mean = 3798.124974 (Mbytes/sec)
   Max = 3799.327588 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 3125.784133 (Mbytes/sec)
  Mean = 3126.217842 (Mbytes/sec)
   Max = 3126.812493 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3808.070437 (Mbytes/sec)
  Mean = 3811.623655 (Mbytes/sec)
   Max = 3813.411554 (Mbytes/sec)
TEST PASS

Overall PASS
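
The "reduced bandwidth" warning at the start of this run reports a link negotiated at 8 GT/s on a device capable of 16 GT/s. The same condition can be checked independently of the test program through the Linux sysfs attributes `current_link_speed`, `max_link_speed`, `current_link_width` and `max_link_width`. The sketch below is an alternative illustration, not how test_dma_bridge produces its warning; the function names are hypothetical, and it assumes the speed strings start with a number (as in `8.0 GT/s PCIe`), which is not the case when the kernel reports an unknown speed.

```python
from pathlib import Path

def read_link_status(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Read negotiated and maximum PCIe link speed/width for one device."""
    dev = Path(sysfs_root) / bdf
    names = ("current_link_speed", "max_link_speed",
             "current_link_width", "max_link_width")
    return {name: (dev / name).read_text().strip() for name in names}

def speed_gt_per_sec(text: str) -> float:
    """Parse strings such as '8.0 GT/s PCIe' or '16 GT/s' into a number."""
    return float(text.split()[0])

def has_reduced_bandwidth(status: dict) -> bool:
    """True when the link trained below the device's maximum speed or width."""
    slower = (speed_gt_per_sec(status["current_link_speed"])
              < speed_gt_per_sec(status["max_link_speed"]))
    narrower = int(status["current_link_width"]) < int(status["max_link_width"])
    return slower or narrower
```

For the device in this section, `has_reduced_bandwidth(read_link_status("0000:10:00.0"))` would be expected to return True while the link stays at 8 GT/s.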

5. Dell Optiplex XE4

A Dell Optiplex XE4 with:

  • 12th Gen Intel(R) Core(TM) i3-12100 CPU
  • openSUSE Leap 15.5 with Kernel 5.14.21-150500.55.62-default
  • pcie_port_pm=off added to the kernel command line to avoid the issue in Loading vfio-pci seems to remove power

5.1. VD100_dma_stream_crc64

linux@DESKTOP-OQMPARM:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/crc64_stream_latency 
Opening device 0000:01:00.0 (10ee:b044) with IOMMU group 12
Enabled bus master for 0000:01:00.0
Testing design VD100_dma_stream_crc64 using C2H 0 -> H2C 0
     32 len bytes latencies (us):   2.341 (50')   2.494 (75')   3.223 (99')  15.562 (99.999')
     64 len bytes latencies (us):   2.335 (50')   2.354 (75')   2.586 (99')  15.008 (99.999')
    128 len bytes latencies (us):   2.365 (50')   2.384 (75')   2.585 (99')  83.842 (99.999')
    256 len bytes latencies (us):   2.415 (50')   2.427 (75')   2.658 (99')  13.656 (99.999')
    512 len bytes latencies (us):   2.485 (50')   2.498 (75')   2.779 (99')  13.852 (99.999')
   1024 len bytes latencies (us):   2.617 (50')   2.640 (75')   2.852 (99')  15.377 (99.999')
   2048 len bytes latencies (us):   2.781 (50')   2.805 (75')   3.012 (99')  13.849 (99.999')
   4096 len bytes latencies (us):   3.133 (50')   3.162 (75')   3.377 (99')   5.275 (99.999')
   8192 len bytes latencies (us):   3.745 (50')   3.826 (75')   4.004 (99')   6.263 (99.999')
  16384 len bytes latencies (us):   5.057 (50')   5.121 (75')   5.357 (99')  18.374 (99.999')
  32768 len bytes latencies (us):   7.439 (50')   7.485 (75')   7.672 (99')  20.524 (99.999')
  65536 len bytes latencies (us):  12.337 (50')  12.382 (75')  12.569 (99')  91.932 (99.999')
 131072 len bytes latencies (us):  22.136 (50')  22.186 (75')  22.336 (99')  83.607 (99.999')
 262144 len bytes latencies (us):  41.697 (50')  41.756 (75')  41.980 (99') 108.660 (99.999')
 524288 len bytes latencies (us):  80.954 (50')  81.025 (75')  81.273 (99') 150.380 (99.999')
1048576 len bytes latencies (us): 160.554 (50') 160.699 (75') 161.090 (99') 219.537 (99.999')
Testing design VD100_dma_stream_crc64 using C2H 1 -> H2C 1
     32 len bytes latencies (us):   2.323 (50')   2.333 (75')   2.540 (99')  15.598 (99.999')
     64 len bytes latencies (us):   2.340 (50')   2.376 (75')   2.623 (99')  14.188 (99.999')
    128 len bytes latencies (us):   2.380 (50')   2.543 (75')   2.695 (99')  83.693 (99.999')
    256 len bytes latencies (us):   2.423 (50')   2.508 (75')   2.711 (99')  14.447 (99.999')
    512 len bytes latencies (us):   2.488 (50')   2.537 (75')   2.794 (99')  14.713 (99.999')
   1024 len bytes latencies (us):   2.598 (50')   2.680 (75')   2.904 (99')  15.858 (99.999')
   2048 len bytes latencies (us):   2.782 (50')   2.806 (75')   3.007 (99')  15.048 (99.999')
   4096 len bytes latencies (us):   3.138 (50')   3.186 (75')   3.366 (99')   7.072 (99.999')
   8192 len bytes latencies (us):   3.800 (50')   3.853 (75')   4.053 (99')  14.547 (99.999')
  16384 len bytes latencies (us):   5.042 (50')   5.090 (75')   5.325 (99')  83.699 (99.999')
  32768 len bytes latencies (us):   7.450 (50')   7.495 (75')   7.667 (99')  18.695 (99.999')
  65536 len bytes latencies (us):  12.320 (50')  12.359 (75')  12.501 (99')  88.262 (99.999')
 131072 len bytes latencies (us):  22.088 (50')  22.132 (75')  22.285 (99')  87.603 (99.999')
 262144 len bytes latencies (us):  41.718 (50')  41.775 (75')  41.997 (99') 108.486 (99.999')
 524288 len bytes latencies (us):  80.986 (50')  81.060 (75')  81.332 (99') 150.506 (99.999')
1048576 len bytes latencies (us): 160.581 (50') 160.723 (75') 161.130 (99') 238.312 (99.999')
Testing design VD100_dma_stream_crc64 using C2H 2 -> H2C 2
     32 len bytes latencies (us):   2.320 (50')   2.370 (75')   2.604 (99')  13.896 (99.999')
     64 len bytes latencies (us):   2.319 (50')   2.383 (75')   2.583 (99')  15.378 (99.999')
    128 len bytes latencies (us):   2.366 (50')   2.391 (75')   2.604 (99')  15.009 (99.999')
    256 len bytes latencies (us):   2.417 (50')   2.442 (75')   2.768 (99')  82.669 (99.999')
    512 len bytes latencies (us):   2.636 (50')   2.652 (75')   2.827 (99')  13.759 (99.999')
   1024 len bytes latencies (us):   2.606 (50')   2.627 (75')   2.847 (99')  14.387 (99.999')
   2048 len bytes latencies (us):   2.751 (50')   2.830 (75')   2.982 (99')  13.750 (99.999')
   4096 len bytes latencies (us):   3.134 (50')   3.227 (75')   3.377 (99')  14.904 (99.999')
   8192 len bytes latencies (us):   3.768 (50')   3.843 (75')   4.065 (99')  16.656 (99.999')
  16384 len bytes latencies (us):   4.992 (50')   5.056 (75')   5.463 (99')  84.652 (99.999')
  32768 len bytes latencies (us):   7.423 (50')   7.464 (75')   7.650 (99')  17.766 (99.999')
  65536 len bytes latencies (us):  12.298 (50')  12.337 (75')  12.515 (99')  88.042 (99.999')
 131072 len bytes latencies (us):  22.110 (50')  22.156 (75')  22.313 (99')  96.961 (99.999')
 262144 len bytes latencies (us):  41.754 (50')  41.809 (75')  41.993 (99') 108.360 (99.999')
 524288 len bytes latencies (us):  80.876 (50')  80.948 (75')  81.186 (99') 150.463 (99.999')
1048576 len bytes latencies (us): 160.323 (50') 160.445 (75') 160.843 (99') 238.303 (99.999')
Testing design VD100_dma_stream_crc64 using C2H 3 -> H2C 3
     32 len bytes latencies (us):   2.305 (50')   2.321 (75')   2.608 (99')   6.281 (99.999')
     64 len bytes latencies (us):   2.318 (50')   2.332 (75')   2.613 (99')  15.238 (99.999')
    128 len bytes latencies (us):   2.343 (50')   2.357 (75')   2.662 (99')  14.709 (99.999')
    256 len bytes latencies (us):   2.396 (50')   2.412 (75')   2.680 (99')  14.345 (99.999')
    512 len bytes latencies (us):   2.465 (50')   2.477 (75')   2.790 (99')  83.778 (99.999')
   1024 len bytes latencies (us):   2.558 (50')   2.577 (75')   2.813 (99')  14.825 (99.999')
   2048 len bytes latencies (us):   2.734 (50')   2.828 (75')   2.995 (99')   6.589 (99.999')
   4096 len bytes latencies (us):   3.127 (50')   3.164 (75')   3.379 (99')  14.789 (99.999')
   8192 len bytes latencies (us):   3.739 (50')   3.779 (75')   4.022 (99')   6.077 (99.999')
  16384 len bytes latencies (us):   5.044 (50')   5.143 (75')   5.347 (99')  83.970 (99.999')
  32768 len bytes latencies (us):   7.395 (50')   7.486 (75')   7.668 (99')  18.387 (99.999')
  65536 len bytes latencies (us):  12.333 (50')  12.421 (75')  12.607 (99')  91.656 (99.999')
 131072 len bytes latencies (us):  22.162 (50')  22.243 (75')  22.451 (99')  97.057 (99.999')
 262144 len bytes latencies (us):  41.712 (50')  41.792 (75')  42.023 (99') 108.628 (99.999')
 524288 len bytes latencies (us):  80.982 (50')  81.076 (75')  81.365 (99') 150.231 (99.999')
1048576 len bytes latencies (us): 160.315 (50') 160.436 (75') 160.854 (99') 231.403 (99.999')
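
One way to read the latency figures above is to separate a fixed per-transfer cost from a per-byte cost. The sketch below takes a few of the median (50') values for C2H 0 -> H2C 0 and computes the slope between the two largest lengths. Treating the 32 byte latency as the fixed overhead is an approximation, and the helper name is hypothetical.

```python
# Median (50') latencies in microseconds for "C2H 0 -> H2C 0", copied from the
# output above. Key: transfer length in bytes.
median_latency_us = {
    32: 2.341,
    4096: 3.133,
    65536: 12.337,
    524288: 80.954,
    1048576: 160.554,
}

def incremental_rate_mbytes_per_sec(len_a: int, len_b: int, latencies: dict) -> float:
    """Extra bytes divided by extra microseconds between two transfer lengths.

    Bytes per microsecond is numerically equal to Mbytes/sec (10**6 bytes).
    """
    return (len_b - len_a) / (latencies[len_b] - latencies[len_a])

fixed_overhead_us = median_latency_us[32]
large_rate = incremental_rate_mbytes_per_sec(524288, 1048576, median_latency_us)

print(f"Fixed overhead about {fixed_overhead_us:.2f} us")
print(f"Large-transfer slope about {large_rate:.0f} Mbytes/sec")
```

The slope comes out at roughly 6.6 Gbytes/sec, which is close to the aggregate host-to-card rate measured by the bandwidth test below, while short transfers are dominated by the fixed cost of a little over 2 us.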

test_dma_bridge_independent_streams gets 6517.62 Mbytes/sec in the host-to-card direction, which is 83% of the theoretical bandwidth of a gen4x4 link:

linux@DESKTOP-OQMPARM:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/test_dma_bridge_independent_streams
Opening device 0000:01:00.0 (10ee:b044) with IOMMU group 12
Enabled bus master for 0000:01:00.0
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 12 H2C channel 0
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 12 H2C channel 1
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 12 H2C channel 2
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 12 H2C channel 3
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 12 C2H channel 0
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 12 C2H channel 1
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 12 C2H channel 2
Selecting test of VD100_dma_stream_crc64 design PCI device 0000:01:00.0 IOMMU group 12 C2H channel 3
Press Ctrl-C to stop test
  0000:01:00.0 H2C channel 0 1629.406 Mbytes/sec (16290676736 bytes in 971 transfers over 9.997924 secs)
  0000:01:00.0 H2C channel 1 1629.406 Mbytes/sec (16290676736 bytes in 971 transfers over 9.997926 secs)
  0000:01:00.0 H2C channel 2 1629.406 Mbytes/sec (16290676736 bytes in 971 transfers over 9.997927 secs)
  0000:01:00.0 H2C channel 3 1629.405 Mbytes/sec (16290676736 bytes in 971 transfers over 9.997930 secs)
  0000:01:00.0 C2H channel 0 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.997921 secs)
  0000:01:00.0 C2H channel 1 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.997922 secs)
  0000:01:00.0 C2H channel 2 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.997923 secs)
  0000:01:00.0 C2H channel 3 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.997927 secs)

  0000:01:00.0 H2C channel 0 1629.384 Mbytes/sec (16290676736 bytes in 971 transfers over 9.998057 secs)
  0000:01:00.0 H2C channel 1 1629.384 Mbytes/sec (16290676736 bytes in 971 transfers over 9.998057 secs)
  0000:01:00.0 H2C channel 2 1629.384 Mbytes/sec (16290676736 bytes in 971 transfers over 9.998057 secs)
  0000:01:00.0 H2C channel 3 1629.384 Mbytes/sec (16290676736 bytes in 971 transfers over 9.998057 secs)
  0000:01:00.0 C2H channel 0 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.998057 secs)
  0000:01:00.0 C2H channel 1 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.998057 secs)
  0000:01:00.0 C2H channel 2 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.998057 secs)
  0000:01:00.0 C2H channel 3 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.998057 secs)

  0000:01:00.0 H2C channel 0 1629.403 Mbytes/sec (16290676736 bytes in 971 transfers over 9.997940 secs)
  0000:01:00.0 H2C channel 1 1629.403 Mbytes/sec (16290676736 bytes in 971 transfers over 9.997940 secs)
  0000:01:00.0 H2C channel 2 1629.403 Mbytes/sec (16290676736 bytes in 971 transfers over 9.997940 secs)
  0000:01:00.0 H2C channel 3 1629.403 Mbytes/sec (16290676736 bytes in 971 transfers over 9.997939 secs)
  0000:01:00.0 C2H channel 0 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.997939 secs)
  0000:01:00.0 C2H channel 1 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.997939 secs)
  0000:01:00.0 C2H channel 2 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.997939 secs)
  0000:01:00.0 C2H channel 3 0.001 Mbytes/sec (7768 bytes in 971 transfers over 9.997939 secs)

^C  0000:01:00.0 H2C channel 0 1629.562 Mbytes/sec (2197815296 bytes in 131 transfers over 1.348715 secs)
  0000:01:00.0 H2C channel 1 1629.563 Mbytes/sec (2197815296 bytes in 131 transfers over 1.348715 secs)
  0000:01:00.0 H2C channel 2 1629.564 Mbytes/sec (2197815296 bytes in 131 transfers over 1.348714 secs)
  0000:01:00.0 H2C channel 3 1629.568 Mbytes/sec (2197815296 bytes in 131 transfers over 1.348711 secs)
  0000:01:00.0 C2H channel 0 0.001 Mbytes/sec (1048 bytes in 131 transfers over 1.348715 secs)
  0000:01:00.0 C2H channel 1 0.001 Mbytes/sec (1048 bytes in 131 transfers over 1.348715 secs)
  0000:01:00.0 C2H channel 2 0.001 Mbytes/sec (1048 bytes in 131 transfers over 1.348714 secs)
  0000:01:00.0 C2H channel 3 0.001 Mbytes/sec (1048 bytes in 131 transfers over 1.348711 secs)

Overall test statistics:
  0000:01:00.0 H2C channel 0 1629.405 Mbytes/sec (51069845504 bytes in 3044 transfers over 31.342636 secs)
  0000:01:00.0 H2C channel 1 1629.405 Mbytes/sec (51069845504 bytes in 3044 transfers over 31.342637 secs)
  0000:01:00.0 H2C channel 2 1629.405 Mbytes/sec (51069845504 bytes in 3044 transfers over 31.342637 secs)
  0000:01:00.0 H2C channel 3 1629.405 Mbytes/sec (51069845504 bytes in 3044 transfers over 31.342637 secs)
  0000:01:00.0 C2H channel 0 0.001 Mbytes/sec (24352 bytes in 3044 transfers over 31.342633 secs)
  0000:01:00.0 C2H channel 1 0.001 Mbytes/sec (24352 bytes in 3044 transfers over 31.342633 secs)
  0000:01:00.0 C2H channel 2 0.001 Mbytes/sec (24352 bytes in 3044 transfers over 31.342633 secs)
  0000:01:00.0 C2H channel 3 0.001 Mbytes/sec (24352 bytes in 3044 transfers over 31.342634 secs)


Overall PASS

5.2. VD100_dma_stream_loopback

linux@DESKTOP-OQMPARM:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/test_dma_bridge_parallel_streams 
Opening device 0000:01:00.0 (10ee:b044) with IOMMU group 12
Enabled bus master for 0000:01:00.0
Using num_descriptors=64 bytes_per_buffer=0x1000000 data_mapping_size_words=0x10000000
Device 0000:01:00.0 design VD100_dma_stream_loopback routes updated
Selecting test of VD100_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 12 H2C channel 1 C2H channel 0
Selecting test of VD100_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 12 H2C channel 0 C2H channel 1
Selecting test of VD100_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 12 H2C channel 3 C2H channel 2
Selecting test of VD100_dma_stream_loopback design PCI device 0000:01:00.0 IOMMU group 12 H2C channel 2 C2H channel 3
Press Ctrl-C to stop test
  0000:01:00.0 1 -> 0 1608.050 Mbytes/sec (16072572928 bytes in 9.995073 secs)
  0000:01:00.0 0 -> 1 1608.048 Mbytes/sec (16072572928 bytes in 9.995081 secs)
  0000:01:00.0 3 -> 2 1608.048 Mbytes/sec (16072572928 bytes in 9.995082 secs)
  0000:01:00.0 2 -> 3 1608.048 Mbytes/sec (16072572928 bytes in 9.995084 secs)

  0000:01:00.0 1 -> 0 1608.114 Mbytes/sec (16072572928 bytes in 9.994674 secs)
  0000:01:00.0 0 -> 1 1608.114 Mbytes/sec (16072572928 bytes in 9.994675 secs)
  0000:01:00.0 3 -> 2 1608.114 Mbytes/sec (16072572928 bytes in 9.994674 secs)
  0000:01:00.0 2 -> 3 1608.114 Mbytes/sec (16072572928 bytes in 9.994674 secs)
<snip>
  0000:01:00.0 1 -> 0 1608.084 Mbytes/sec (16072572928 bytes in 9.994860 secs)
  0000:01:00.0 0 -> 1 1608.084 Mbytes/sec (16072572928 bytes in 9.994860 secs)
  0000:01:00.0 3 -> 2 1608.084 Mbytes/sec (16072572928 bytes in 9.994860 secs)
  0000:01:00.0 2 -> 3 1608.084 Mbytes/sec (16072572928 bytes in 9.994860 secs)

  0000:01:00.0 1 -> 0 1600.080 Mbytes/sec (16005464064 bytes in 10.002915 secs)
  0000:01:00.0 0 -> 1 1600.080 Mbytes/sec (16005464064 bytes in 10.002917 secs)
  0000:01:00.0 3 -> 2 1600.080 Mbytes/sec (16005464064 bytes in 10.002916 secs)
  0000:01:00.0 2 -> 3 1600.080 Mbytes/sec (16005464064 bytes in 10.002917 secs)

^C  0000:01:00.0 1 -> 0 1608.139 Mbytes/sec (1476395008 bytes in 0.918077 secs)
  0000:01:00.0 0 -> 1 1608.202 Mbytes/sec (1476395008 bytes in 0.918041 secs)
  0000:01:00.0 3 -> 2 1608.144 Mbytes/sec (1476395008 bytes in 0.918074 secs)
  0000:01:00.0 2 -> 3 1608.120 Mbytes/sec (1476395008 bytes in 0.918088 secs)

Overall test statistics:
  0000:01:00.0 1 -> 0 1607.165 Mbytes/sec (22373206065152 bytes in 13920.912028 secs)
  0000:01:00.0 0 -> 1 1607.165 Mbytes/sec (22373206065152 bytes in 13920.912040 secs)
  0000:01:00.0 3 -> 2 1607.165 Mbytes/sec (22373206065152 bytes in 13920.912029 secs)
  0000:01:00.0 2 -> 3 1607.165 Mbytes/sec (22373206065152 bytes in 13920.911996 secs)

0000:01:00.0 1 -> 0 Test pattern verified in 268435456 words
0000:01:00.0 0 -> 1 Test pattern verified in 268435456 words
0000:01:00.0 3 -> 2 Test pattern verified in 268435456 words
0000:01:00.0 2 -> 3 Test pattern verified in 268435456 words

Overall PASS

Sustains 6428.66 Mbytes/sec (4 streams at 1607.165 Mbytes/sec each) in both directions simultaneously, which is about 81% of the theoretical bandwidth of a gen4 x4 link.
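
The percentage can be checked with a short calculation. This is a minimal sketch, not part of fpga_sio. It assumes the theoretical figure is the raw link rate for PCIe gen4 (16 GT/s per lane with 128b/130b encoding) before any TLP or DLLP overhead, since the notes do not say which figure was used. The function name `pcie_raw_mbytes_per_sec` is hypothetical. Mbytes are taken as 10^6 bytes, which matches the test output (16072572928 bytes in 9.995073 secs is reported as 1608.050 Mbytes/sec).

```python
# Assumption: "theoretical bandwidth" means the raw link rate after line
# encoding only, ignoring TLP headers, DLLPs and flow control overhead.

def pcie_raw_mbytes_per_sec(gt_per_sec, lanes, encoding=128.0 / 130.0):
    """Raw one-direction link bandwidth in Mbytes/sec (1 Mbyte = 10^6 bytes)."""
    bits_per_sec = gt_per_sec * 1e9 * encoding * lanes
    return bits_per_sec / 8.0 / 1e6

# Per-stream rate from the "Overall test statistics" above.
per_stream = 1607.165
num_streams = 4

total = per_stream * num_streams
theoretical = pcie_raw_mbytes_per_sec(gt_per_sec=16.0, lanes=4)
efficiency = 100.0 * total / theoretical

print(f"Achieved    : {total:.2f} Mbytes/sec")
print(f"Theoretical : {theoretical:.2f} Mbytes/sec")
print(f"Efficiency  : {efficiency:.1f}%")
```

Under those assumptions the raw gen4 x4 rate is roughly 7877 Mbytes/sec per direction, which puts the achieved rate between 81% and 82% of it.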

5.3. VD100_dma_ddr4

linux@DESKTOP-OQMPARM:~/fpga_sio/software_tests/eclipse_project/bin/release> xilinx_dma_bridge_for_pcie/test_dma_bridge
Opening device 0000:01:00.0 (10ee:b044) with IOMMU group 12
Enabled bus master for 0000:01:00.0
Testing VD100_dma_ddr4 design with memory base address 0x800000000 size 0x100000000
PCI device 0000:01:00.0 IOMMU group 12

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 0
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.182402 (Mbytes/sec)
  Mean = 4284.970032 (Mbytes/sec)
   Max = 4287.436226 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6237.700393 (Mbytes/sec)
  Mean = 6237.860872 (Mbytes/sec)
   Max = 6238.110521 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.249266 (Mbytes/sec)
  Mean = 4282.099292 (Mbytes/sec)
   Max = 4282.816829 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 1
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4153.423262 (Mbytes/sec)
  Mean = 4252.772584 (Mbytes/sec)
   Max = 4287.428453 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.028389 (Mbytes/sec)
  Mean = 6236.320351 (Mbytes/sec)
   Max = 6236.673586 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 3995.880001 (Mbytes/sec)
  Mean = 4206.971681 (Mbytes/sec)
   Max = 4282.989821 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 2
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.605248 (Mbytes/sec)
  Mean = 4287.152199 (Mbytes/sec)
   Max = 4287.661695 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6234.701687 (Mbytes/sec)
  Mean = 6235.757450 (Mbytes/sec)
   Max = 6236.412896 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4189.554526 (Mbytes/sec)
  Mean = 4258.425880 (Mbytes/sec)
   Max = 4282.048326 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 0 C2H channel 3
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.373387 (Mbytes/sec)
  Mean = 4287.052578 (Mbytes/sec)
   Max = 4287.849997 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.219487 (Mbytes/sec)
  Mean = 6236.646372 (Mbytes/sec)
   Max = 6236.967374 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4280.780724 (Mbytes/sec)
  Mean = 4281.798565 (Mbytes/sec)
   Max = 4282.598061 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 0
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4225.996955 (Mbytes/sec)
  Mean = 4271.173396 (Mbytes/sec)
   Max = 4286.552558 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.234011 (Mbytes/sec)
  Mean = 6236.396904 (Mbytes/sec)
   Max = 6236.571760 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4166.813642 (Mbytes/sec)
  Mean = 4252.831610 (Mbytes/sec)
   Max = 4282.949242 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 1
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.438449 (Mbytes/sec)
  Mean = 4287.106432 (Mbytes/sec)
   Max = 4288.152487 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.075273 (Mbytes/sec)
  Mean = 6236.421481 (Mbytes/sec)
   Max = 6236.842127 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4282.082523 (Mbytes/sec)
  Mean = 4282.918085 (Mbytes/sec)
   Max = 4283.338264 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 2
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.570718 (Mbytes/sec)
  Mean = 4287.037695 (Mbytes/sec)
   Max = 4287.980247 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.417877 (Mbytes/sec)
  Mean = 6236.851872 (Mbytes/sec)
   Max = 6237.113631 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4089.839892 (Mbytes/sec)
  Mean = 4232.800323 (Mbytes/sec)
   Max = 4283.308054 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 1 C2H channel 3
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.357585 (Mbytes/sec)
  Mean = 4286.672456 (Mbytes/sec)
   Max = 4287.329530 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.958924 (Mbytes/sec)
  Mean = 6237.111955 (Mbytes/sec)
   Max = 6237.276443 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4108.600291 (Mbytes/sec)
  Mean = 4237.805324 (Mbytes/sec)
   Max = 4283.197327 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 2 C2H channel 0
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.799537 (Mbytes/sec)
  Mean = 4287.487354 (Mbytes/sec)
   Max = 4288.026054 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.221841 (Mbytes/sec)
  Mean = 6236.618416 (Mbytes/sec)
   Max = 6237.220131 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4280.900428 (Mbytes/sec)
  Mean = 4282.069429 (Mbytes/sec)
   Max = 4283.055459 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 2 C2H channel 1
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4225.085630 (Mbytes/sec)
  Mean = 4271.564670 (Mbytes/sec)
   Max = 4287.744436 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6237.397034 (Mbytes/sec)
  Mean = 6237.489421 (Mbytes/sec)
   Max = 6237.644607 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4172.439468 (Mbytes/sec)
  Mean = 4254.461719 (Mbytes/sec)
   Max = 4282.874967 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 2 C2H channel 2
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.360781 (Mbytes/sec)
  Mean = 4286.920626 (Mbytes/sec)
   Max = 4287.512692 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6235.577942 (Mbytes/sec)
  Mean = 6236.342509 (Mbytes/sec)
   Max = 6236.787977 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4282.170710 (Mbytes/sec)
  Mean = 4282.777552 (Mbytes/sec)
   Max = 4283.130082 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 2 C2H channel 3
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4059.742207 (Mbytes/sec)
  Mean = 4228.027153 (Mbytes/sec)
   Max = 4287.672011 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.836439 (Mbytes/sec)
  Mean = 6237.318291 (Mbytes/sec)
   Max = 6237.724120 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4172.835231 (Mbytes/sec)
  Mean = 4254.792351 (Mbytes/sec)
   Max = 4283.197288 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 3 C2H channel 0
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4162.109909 (Mbytes/sec)
  Mean = 4255.005969 (Mbytes/sec)
   Max = 4287.518166 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.878770 (Mbytes/sec)
  Mean = 6237.110135 (Mbytes/sec)
   Max = 6237.351942 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4236.347223 (Mbytes/sec)
  Mean = 4271.152435 (Mbytes/sec)
   Max = 4283.227671 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 3 C2H channel 1
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4285.484718 (Mbytes/sec)
  Mean = 4286.966098 (Mbytes/sec)
   Max = 4287.567769 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.340354 (Mbytes/sec)
  Mean = 6236.581911 (Mbytes/sec)
   Max = 6237.054205 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4282.045201 (Mbytes/sec)
  Mean = 4282.501944 (Mbytes/sec)
   Max = 4283.140248 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 3 C2H channel 2
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4244.594775 (Mbytes/sec)
  Mean = 4276.352786 (Mbytes/sec)
   Max = 4287.689958 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.812910 (Mbytes/sec)
  Mean = 6236.947440 (Mbytes/sec)
   Max = 6237.082600 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4036.984879 (Mbytes/sec)
  Mean = 4218.050922 (Mbytes/sec)
   Max = 4282.553121 (Mbytes/sec)
TEST PASS

Testing using 32 buffers of size 0x8000000 bytes, H2C channel 3 C2H channel 3
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.636676 (Mbytes/sec)
  Mean = 4287.467191 (Mbytes/sec)
   Max = 4287.984990 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6236.376720 (Mbytes/sec)
  Mean = 6236.800031 (Mbytes/sec)
   Max = 6237.084855 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.098971 (Mbytes/sec)
  Mean = 4282.100022 (Mbytes/sec)
   Max = 4283.152883 (Mbytes/sec)
TEST PASS
Testing VD100_dma_ddr4 design with memory base address 0x800000000 size 0x100000000
PCI device 0000:01:00.0 IOMMU group 12

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4031.734314 (Mbytes/sec)
  Mean = 4220.509585 (Mbytes/sec)
   Max = 4288.043242 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.737900 (Mbytes/sec)
  Mean = 6065.879295 (Mbytes/sec)
   Max = 6065.945184 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4204.092108 (Mbytes/sec)
  Mean = 4262.316515 (Mbytes/sec)
   Max = 4282.124063 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4115.708564 (Mbytes/sec)
  Mean = 4242.795729 (Mbytes/sec)
   Max = 4287.856234 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.679426 (Mbytes/sec)
  Mean = 6065.896429 (Mbytes/sec)
   Max = 6066.180157 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.765167 (Mbytes/sec)
  Mean = 4282.283391 (Mbytes/sec)
   Max = 4282.958664 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 2 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4285.477440 (Mbytes/sec)
  Mean = 4286.498264 (Mbytes/sec)
   Max = 4287.587565 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6066.219706 (Mbytes/sec)
  Mean = 6066.375227 (Mbytes/sec)
   Max = 6066.594328 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.997605 (Mbytes/sec)
  Mean = 4282.481524 (Mbytes/sec)
   Max = 4283.093605 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 0 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 3 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.593051 (Mbytes/sec)
  Mean = 4286.952517 (Mbytes/sec)
   Max = 4287.415554 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6064.584613 (Mbytes/sec)
  Mean = 6065.295221 (Mbytes/sec)
   Max = 6065.960091 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4065.745362 (Mbytes/sec)
  Mean = 4226.114991 (Mbytes/sec)
   Max = 4282.992785 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.431369 (Mbytes/sec)
  Mean = 4286.900562 (Mbytes/sec)
   Max = 4287.823623 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6063.631887 (Mbytes/sec)
  Mean = 6064.535331 (Mbytes/sec)
   Max = 6065.304308 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.001833 (Mbytes/sec)
  Mean = 4282.396134 (Mbytes/sec)
   Max = 4283.232319 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4021.814055 (Mbytes/sec)
  Mean = 4217.503345 (Mbytes/sec)
   Max = 4287.855223 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.075168 (Mbytes/sec)
  Mean = 6065.348857 (Mbytes/sec)
   Max = 6065.768269 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4218.953674 (Mbytes/sec)
  Mean = 4266.427435 (Mbytes/sec)
   Max = 4283.137519 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 2 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4130.434161 (Mbytes/sec)
  Mean = 4246.648576 (Mbytes/sec)
   Max = 4287.749192 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.508727 (Mbytes/sec)
  Mean = 6065.771362 (Mbytes/sec)
   Max = 6066.023104 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.911474 (Mbytes/sec)
  Mean = 4282.635392 (Mbytes/sec)
   Max = 4283.346530 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 1 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 3 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4285.642916 (Mbytes/sec)
  Mean = 4286.669303 (Mbytes/sec)
   Max = 4287.719031 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.184464 (Mbytes/sec)
  Mean = 6065.365671 (Mbytes/sec)
   Max = 6065.657761 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.984725 (Mbytes/sec)
  Mean = 4282.462219 (Mbytes/sec)
   Max = 4283.010950 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 2 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.117941 (Mbytes/sec)
  Mean = 4286.757748 (Mbytes/sec)
   Max = 4287.740280 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.007910 (Mbytes/sec)
  Mean = 6065.224506 (Mbytes/sec)
   Max = 6065.406538 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4008.331970 (Mbytes/sec)
  Mean = 4210.467991 (Mbytes/sec)
   Max = 4283.248226 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 2 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.474384 (Mbytes/sec)
  Mean = 4286.992063 (Mbytes/sec)
   Max = 4287.784301 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.302441 (Mbytes/sec)
  Mean = 6065.623025 (Mbytes/sec)
   Max = 6066.242128 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.164710 (Mbytes/sec)
  Mean = 4282.480700 (Mbytes/sec)
   Max = 4283.277375 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 2 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 2 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4044.123080 (Mbytes/sec)
  Mean = 4223.528416 (Mbytes/sec)
   Max = 4287.872972 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.846398 (Mbytes/sec)
  Mean = 6066.242548 (Mbytes/sec)
   Max = 6066.576864 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4230.785393 (Mbytes/sec)
  Mean = 4269.363152 (Mbytes/sec)
   Max = 4283.217779 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 2 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 3 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4190.262868 (Mbytes/sec)
  Mean = 4262.580389 (Mbytes/sec)
   Max = 4287.848982 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6064.633527 (Mbytes/sec)
  Mean = 6065.287460 (Mbytes/sec)
   Max = 6065.761407 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.779330 (Mbytes/sec)
  Mean = 4282.249811 (Mbytes/sec)
   Max = 4283.144203 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 3 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 0 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.239244 (Mbytes/sec)
  Mean = 4287.024862 (Mbytes/sec)
   Max = 4287.713963 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6061.777956 (Mbytes/sec)
  Mean = 6064.603932 (Mbytes/sec)
   Max = 6065.691813 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.909523 (Mbytes/sec)
  Mean = 4282.271214 (Mbytes/sec)
   Max = 4283.000904 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 3 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 1 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4231.894852 (Mbytes/sec)
  Mean = 4273.359439 (Mbytes/sec)
   Max = 4287.747959 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.190939 (Mbytes/sec)
  Mean = 6065.974433 (Mbytes/sec)
   Max = 6066.334381 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4041.227770 (Mbytes/sec)
  Mean = 4219.206635 (Mbytes/sec)
   Max = 4282.191818 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 3 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 2 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4286.690939 (Mbytes/sec)
  Mean = 4286.909073 (Mbytes/sec)
   Max = 4287.247678 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.823370 (Mbytes/sec)
  Mean = 6066.258168 (Mbytes/sec)
   Max = 6066.567053 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.653780 (Mbytes/sec)
  Mean = 4282.096146 (Mbytes/sec)
   Max = 4282.527782 (Mbytes/sec)
TEST PASS

Testing using:
  H2C channel 3 transfer length 0x10000000 bytes with 32 descriptors
  C2H channel 3 transfer length 0x10000000 bytes with 32 descriptors
populate test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4070.476133 (Mbytes/sec)
  Mean = 4230.718801 (Mbytes/sec)
   Max = 4287.453619 (Mbytes/sec)
host-to-card and card-to-host DMA timing for 4 transfers of 4294967296 bytes:
   Min = 6065.251255 (Mbytes/sec)
  Mean = 6065.589779 (Mbytes/sec)
   Max = 6065.971477 (Mbytes/sec)
verify test pattern timing for 4 transfers of 4294967296 bytes:
   Min = 4281.702304 (Mbytes/sec)
  Mean = 4282.329623 (Mbytes/sec)
   Max = 4283.176627 (Mbytes/sec)
TEST PASS

Overall PASS