Notes about results of ibv_generate_infiniband_test_load.
Dual 10 GbE ports connected to a switch. AlmaLinux release 8.10 with Kernel 4.18.0-553.104.1.el8_10.x86_64.
Using the initial code, which was hard-coded to use GID index 0 (RoCEv1).
Initially the active MTU was set to 1024:
$ ibv_devinfo
hca_id: rocep33s0f0
transport: InfiniBand (0)
fw_ver: 14.32.1010
node_guid: 9803:9b03:0077:e152
sys_image_guid: 9803:9b03:0077:e152
vendor_id: 0x02c9
vendor_part_id: 4117
hw_ver: 0x0
board_id: MT_2420110004
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
hca_id: rocep33s0f1
transport: InfiniBand (0)
fw_ver: 14.32.1010
node_guid: 9803:9b03:0077:e153
sys_image_guid: 9803:9b03:0077:e152
vendor_id: 0x02c9
vendor_part_id: 4117
hw_ver: 0x0
board_id: MT_2420110004
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
Results:
[mr_halfword@skylake-alma release]$ ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load 0
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
rocep33s0f1 port 1 rx_buffer compare : PASS
rocep33s0f0 port 1 -> rocep33s0f1 port 1 RDMA write transmitted 525059751936 bytes in 462.273715 seconds, 1135.8 Mbytes/sec
rocep33s0f0 port 1 transmitted 563887357956 bytes in 462.273715 seconds, 1219.8 Mbytes/sec
rocep33s0f0 port 1 received 564025459276 bytes in 462.273715 seconds, 1220.1 Mbytes/sec
rocep33s0f0 port 1 rx_buffer compare : PASS
rocep33s0f1 port 1 -> rocep33s0f0 port 1 RDMA write transmitted 525059751936 bytes in 462.241296 seconds, 1135.9 Mbytes/sec
rocep33s0f1 port 1 transmitted 564025556140 bytes in 462.241296 seconds, 1220.2 Mbytes/sec
rocep33s0f1 port 1 received 563887357956 bytes in 462.241296 seconds, 1219.9 Mbytes/sec
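The rates printed by the test appear to be bytes divided by elapsed seconds and by 10^6. A minimal sketch that recomputes them from the rocep33s0f0 -> rocep33s0f1 figures above and compares the port counter rate with the nominal 10 GbE rate. The 1250 Mbytes/sec line rate is an assumption that ignores preamble and inter-frame gap:

```python
# Figures copied from the rocep33s0f0 -> rocep33s0f1 lines above.
payload_bytes = 525059751936   # RDMA write payload
port_tx_bytes = 563887357956   # port transmit counter
elapsed_s = 462.273715

# Nominal 10 GbE rate in Mbytes/sec (assumption: 10^6 bytes per Mbyte,
# preamble and inter-frame gap not accounted for).
LINE_RATE_MBYTES = 10e9 / 8 / 1e6


def mbytes_per_sec(n_bytes, seconds):
    """Rate in units of 10^6 bytes per second."""
    return n_bytes / seconds / 1e6


payload_rate = mbytes_per_sec(payload_bytes, elapsed_s)
port_rate = mbytes_per_sec(port_tx_bytes, elapsed_s)
line_fraction = port_rate / LINE_RATE_MBYTES

print(f"payload {payload_rate:.1f} Mbytes/sec")
print(f"port    {port_rate:.1f} Mbytes/sec ({100 * line_fraction:.1f}% of nominal line rate)")
```

The recomputed values agree with the printed 1135.8 and 1219.8 Mbytes/sec, which suggests "Mbytes" here means 10^6 bytes rather than 2^20.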
Set the Ethernet device MTU to 9600 bytes:
[mr_halfword@skylake-alma release]$ sudo ip link set ens1f0 mtu 9600
[sudo] password for mr_halfword:
[mr_halfword@skylake-alma release]$ sudo ip link set ens1f1 mtu 9600
RoCE active MTU then increases to 4096:
$ ibv_devinfo
hca_id: rocep33s0f0
transport: InfiniBand (0)
fw_ver: 14.32.1010
node_guid: 9803:9b03:0077:e152
sys_image_guid: 9803:9b03:0077:e152
vendor_id: 0x02c9
vendor_part_id: 4117
hw_ver: 0x0
board_id: MT_2420110004
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
hca_id: rocep33s0f1
transport: InfiniBand (0)
fw_ver: 14.32.1010
node_guid: 9803:9b03:0077:e153
sys_image_guid: 9803:9b03:0077:e152
vendor_id: 0x02c9
vendor_part_id: 4117
hw_ver: 0x0
board_id: MT_2420110004
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
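The relationship between the Ethernet MTU and the RoCE active MTU seen above can be sketched as picking the largest InfiniBand MTU that fits inside the Ethernet MTU once room is left for the RoCE headers. The enum values match what ibv_devinfo prints, for example "1024 (3)". HEADER_ALLOWANCE is an illustrative assumption, not the value the driver necessarily uses:

```python
# InfiniBand MTU sizes with the enum value ibv_devinfo prints in brackets.
IB_MTUS = [(256, 1), (512, 2), (1024, 3), (2048, 4), (4096, 5)]

# Assumed room for RoCE headers inside the Ethernet payload
# (GRH or IPv6 40 + UDP 8 + BTH 12 + ICRC 4). The driver may reserve more.
HEADER_ALLOWANCE = 64


def active_ib_mtu(ether_mtu, max_ib_mtu=4096):
    """Return (size, enum) of the largest IB MTU that fits in ether_mtu."""
    usable = ether_mtu - HEADER_ALLOWANCE
    candidates = [
        (size, enum)
        for size, enum in IB_MTUS
        if size <= usable and size <= max_ib_mtu
    ]
    if not candidates:
        raise ValueError(f"Ethernet MTU {ether_mtu} is too small for any IB MTU")
    return candidates[-1]


for ether_mtu in (1500, 9600):
    size, enum = active_ib_mtu(ether_mtu)
    print(f"Ethernet MTU {ether_mtu} -> active_mtu: {size} ({enum})")
```

Under this sketch an Ethernet MTU of 1500 cannot hold a 2048-byte payload, so the active MTU stays at 1024, while 9600 is large enough for the 4096 maximum.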
Results:
[mr_halfword@skylake-alma release]$ ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load 0
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
rocep33s0f1 port 1 rx_buffer compare : PASS
rocep33s0f0 port 1 -> rocep33s0f1 port 1 RDMA write transmitted 6019128229888 bytes in 4983.480160 seconds, 1207.8 Mbytes/sec
rocep33s0f0 port 1 transmitted 6176364832464 bytes in 4983.480160 seconds, 1239.4 Mbytes/sec
rocep33s0f0 port 1 received 6176369936084 bytes in 4983.480160 seconds, 1239.4 Mbytes/sec
rocep33s0f0 port 1 rx_buffer compare : PASS
rocep33s0f1 port 1 -> rocep33s0f0 port 1 RDMA write transmitted 6019128229888 bytes in 4983.495318 seconds, 1207.8 Mbytes/sec
rocep33s0f1 port 1 transmitted 6176369977964 bytes in 4983.495318 seconds, 1239.4 Mbytes/sec
rocep33s0f1 port 1 received 6176364832464 bytes in 4983.495318 seconds, 1239.4 Mbytes/sec
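The effect of the larger MTU is easier to see as the fraction of port traffic that is RDMA write payload. A minimal sketch using the rocep33s0f0 figures from the two runs above. Note that the port transmit counter presumably also includes acknowledgements for the opposite direction, so this is only an approximate per-packet efficiency:

```python
# (RDMA write payload bytes, port transmit counter bytes) for rocep33s0f0,
# copied from the two RoCEv1 runs above, keyed by active MTU.
runs = {
    1024: (525059751936, 563887357956),
    4096: (6019128229888, 6176364832464),
}

# Fraction of transmitted port bytes that were RDMA write payload.
efficiency = {mtu: payload / port_tx for mtu, (payload, port_tx) in runs.items()}

for mtu, fraction in efficiency.items():
    print(f"active MTU {mtu}: payload is {100 * fraction:.2f}% of transmitted bytes")
```

The payload fraction rises from roughly 93% to roughly 97% of the transmitted bytes, which is consistent with the fixed per-packet header cost being spread over four times as much payload.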
Started with the Ethernet MTU at 1500 and the RDMA MTU at 1024.
The initial attempt to run using a RoCE v2 GID failed:
[mr_halfword@skylake-alma release]$ ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load 0 1
Assertion failed : ibv_modify_qp() IBV_QPS_RTR for rocep33s0f1 port 1 GID index 1 type RoCE V2 failed with Connection timed out
ip addr showed that the IPv6 link-local scope addresses differed from those assigned for the GIDs.
Manually added IPv6 addresses to the interfaces:
[mr_halfword@skylake-alma ~]$ sudo ip -6 addr add fe80:0000:0000:0000:9a03:9bff:fe77:e152/64 scope link dev ens1f0
[sudo] password for mr_halfword:
[mr_halfword@skylake-alma ~]$ sudo ip -6 addr add fe80:0000:0000:0000:9a03:9bff:fe77:e153/64 scope link dev ens1f1
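The addresses added above have the form of a modified EUI-64 link-local address built from each port's MAC address. A minimal sketch of that derivation. The MAC 98:03:9b:77:e1:52 is an assumption inferred from the node GUID, since the MAC itself is not shown in these notes:

```python
import ipaddress


def eui64_link_local(mac):
    """Build the fe80::/64 modified EUI-64 address for a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets in MAC address, got {len(octets)}")
    # Flip the universal/local bit of the first octet and insert ff:fe
    # between the OUI and the device-specific half.
    interface_id = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)


# Assumed MAC for ens1f0, inferred from node_guid 9803:9b03:0077:e152.
print(eui64_link_local("98:03:9b:77:e1:52"))
```

The result is the compressed form of the address passed to ip -6 addr add for ens1f0.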
The test could then be run:
[mr_halfword@skylake-alma release]$ ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load 0 1
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
rocep33s0f1 port 1 rx_buffer compare : PASS
rocep33s0f0 port 1 -> rocep33s0f1 port 1 RDMA write transmitted 69524783104 bytes in 61.683937 seconds, 1127.1 Mbytes/sec
rocep33s0f0 port 1 type RoCE V2 transmitted 75254331456 bytes in 61.683937 seconds, 1220.0 Mbytes/sec
rocep33s0f0 port 1 type RoCE V2 received 75258079592 bytes in 61.683937 seconds, 1220.1 Mbytes/sec
rocep33s0f0 port 1 rx_buffer compare : PASS
rocep33s0f1 port 1 -> rocep33s0f0 port 1 RDMA write transmitted 69524783104 bytes in 61.682691 seconds, 1127.1 Mbytes/sec
rocep33s0f1 port 1 type RoCE V2 transmitted 75258079592 bytes in 61.682691 seconds, 1220.1 Mbytes/sec
rocep33s0f1 port 1 type RoCE V2 received 75254331456 bytes in 61.682691 seconds, 1220.0 Mbytes/sec
Manually increased the Ethernet MTU to 9600:
[mr_halfword@skylake-alma ~]$ sudo ip link set ens1f0 mtu 9600
[mr_halfword@skylake-alma ~]$ sudo ip link set ens1f1 mtu 9600
This increased the RDMA MTU to 4096.
Results:
[mr_halfword@skylake-alma release]$ ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load 0 1
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
rocep33s0f1 port 1 rx_buffer compare : PASS
rocep33s0f0 port 1 -> rocep33s0f1 port 1 RDMA write transmitted 300916146176 bytes in 249.843426 seconds, 1204.4 Mbytes/sec
rocep33s0f0 port 1 type RoCE V2 transmitted 309655784520 bytes in 249.843426 seconds, 1239.4 Mbytes/sec
rocep33s0f0 port 1 type RoCE V2 received 309392154676 bytes in 249.843426 seconds, 1238.3 Mbytes/sec
rocep33s0f0 port 1 rx_buffer compare : PASS
rocep33s0f1 port 1 -> rocep33s0f0 port 1 RDMA write transmitted 300647710720 bytes in 249.646860 seconds, 1204.3 Mbytes/sec
rocep33s0f1 port 1 type RoCE V2 transmitted 309392221868 bytes in 249.646860 seconds, 1239.3 Mbytes/sec
rocep33s0f1 port 1 type RoCE V2 received 309655784520 bytes in 249.646860 seconds, 1240.4 Mbytes/sec
Rather than manually adding IPv6 addresses to allow RoCEv2 to be used, the approach in "7.5 Changing NetworkManager configuration to generate IPv6 link-scope address based upon the MAC address" could be more maintainable.
Dual 40Gb/s Infiniband ports, with a PCIe gen2 x8 interface (4 GB/s peak bandwidth). Ubuntu 24.04.3 LTS with Kernel 6.8.0-94-generic
Results:
$ ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load 0
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
ibp3s0 port 2 rx_buffer compare : PASS
ibp3s0 port 1 -> ibp3s0 port 2 RDMA write transmitted 27435177345024 bytes in 16726.068460 seconds, 1640.3 Mbytes/sec
ibp3s0 port 1 transmitted 27596666524268 bytes in 16726.068460 seconds, 1649.9 Mbytes/sec
ibp3s0 port 1 received 27596666528580 bytes in 16726.068460 seconds, 1649.9 Mbytes/sec
ibp3s0 port 1 rx_buffer compare : PASS
ibp3s0 port 2 -> ibp3s0 port 1 RDMA write transmitted 27435177345024 bytes in 16726.068468 seconds, 1640.3 Mbytes/sec
ibp3s0 port 2 transmitted 27596666047044 bytes in 16726.068468 seconds, 1649.9 Mbytes/sec
ibp3s0 port 2 received 27596666042732 bytes in 16726.068468 seconds, 1649.9 Mbytes/sec
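Each port runs well below the 40Gb/s link rate, and the shared PCIe interface is a plausible limit. A rough budget, assuming both ports of ibp3s0 share the single PCIe gen2 x8 interface and that every transmitted byte crosses it once as a DMA read. PCIe packet header overhead is not modelled:

```python
# PCIe gen2 x8: 5 GT/s per lane x 8 lanes x 8b/10b encoding = 32 Gbit/s,
# i.e. the 4 GB/s peak per direction quoted above.
PCIE_PEAK_MBYTES = 5e9 * 8 * (8 / 10) / 8 / 1e6

# Port transmit rates in Mbytes/sec, copied from the results above.
port_tx = {"ibp3s0 port 1": 1649.9, "ibp3s0 port 2": 1649.9}

total_tx = sum(port_tx.values())
utilisation = total_tx / PCIE_PEAK_MBYTES

print(f"combined transmit rate {total_tx:.1f} Mbytes/sec")
print(f"{100 * utilisation:.1f}% of the {PCIE_PEAK_MBYTES:.0f} Mbytes/sec PCIe peak")
```

The same total applies in the receive direction. A combined rate above 80% of the raw PCIe peak, before any PCIe protocol overhead, suggests the bus rather than the InfiniBand links is limiting throughput, although these notes do not confirm that.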
Using a ConnectX-5 SFP28 with dual 10/25 GbE ports. HP Z6 G4 running openSUSE Leap 15.5 with Kernel 5.14.21-150500.55.62-default.
Manually added the RoCE v2 GIDs at index 1, as reported by ibv_devinfo -v, to the network interfaces as IPv6 addresses, to allow them to be used:
> sudo ip -6 addr add fe80::ba59:9fff:fef9:2d0a/64 scope link dev enp25s0f0np0
> sudo ip -6 addr add fe80::ba59:9fff:fef9:2d0b/64 scope link dev enp25s0f1np1
A 25 GbE DAC cable connected between the ports.
The active link speed is 25000Mb/s, and the active FEC encoding is RS.
This is with the default network interface MTU of 1500, so the Infiniband active MTU is 1024.
Using RoCEv1:
> ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load mlx5_0:1,mlx5_1:1
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
mlx5_1 port 1 rx_buffer compare : PASS
mlx5_0 port 1 -> mlx5_1 port 1 RDMA write transmitted 1080452710400 bytes in 378.158604 seconds, 2857.1 Mbytes/sec
mlx5_0 port 1 type RoCE V1 transmitted 1155159865192 bytes in 378.158604 seconds, 3054.7 Mbytes/sec
mlx5_0 port 1 type RoCE V1 received 1155165018924 bytes in 378.158604 seconds, 3054.7 Mbytes/sec
mlx5_0 port 1 rx_buffer compare : PASS
mlx5_1 port 1 -> mlx5_0 port 1 RDMA write transmitted 1080452710400 bytes in 378.114991 seconds, 2857.5 Mbytes/sec
mlx5_1 port 1 type RoCE V1 transmitted 1155165018924 bytes in 378.114991 seconds, 3055.1 Mbytes/sec
mlx5_1 port 1 type RoCE V1 received 1155159865192 bytes in 378.114991 seconds, 3055.0 Mbytes/sec
Using RoCEv2:
> ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load mlx5_0:1,mlx5_1:1 --ether-gid-index=1
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
mlx5_1 port 1 rx_buffer compare : PASS
mlx5_0 port 1 -> mlx5_1 port 1 RDMA write transmitted 2519266754560 bytes in 887.902518 seconds, 2837.3 Mbytes/sec
mlx5_0 port 1 type RoCE V2 transmitted 2713376456844 bytes in 887.902518 seconds, 3055.9 Mbytes/sec
mlx5_0 port 1 type RoCE V2 received 2713384305884 bytes in 887.902518 seconds, 3055.9 Mbytes/sec
mlx5_0 port 1 rx_buffer compare : PASS
mlx5_1 port 1 -> mlx5_0 port 1 RDMA write transmitted 2519266754560 bytes in 887.906039 seconds, 2837.3 Mbytes/sec
mlx5_1 port 1 type RoCE V2 transmitted 2713384305884 bytes in 887.906039 seconds, 3055.9 Mbytes/sec
mlx5_1 port 1 type RoCE V2 received 2713376456844 bytes in 887.906039 seconds, 3055.9 Mbytes/sec
Increased the MTU on the network interfaces from 1500 to 9600:
> sudo ip link set enp25s0f0np0 mtu 9600
> sudo ip link set enp25s0f1np1 mtu 9600
The Infiniband active MTU increased to 4096.
Using RoCEv1:
> ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load mlx5_0:1,mlx5_1:1
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
mlx5_1 port 1 rx_buffer compare : PASS
mlx5_0 port 1 -> mlx5_1 port 1 RDMA write transmitted 693368782848 bytes in 227.343949 seconds, 3049.9 Mbytes/sec
mlx5_0 port 1 type RoCE V1 transmitted 705632498180 bytes in 227.343949 seconds, 3103.8 Mbytes/sec
mlx5_0 port 1 type RoCE V1 received 705362946232 bytes in 227.343949 seconds, 3102.6 Mbytes/sec
mlx5_0 port 1 rx_buffer compare : PASS
mlx5_1 port 1 -> mlx5_0 port 1 RDMA write transmitted 693100347392 bytes in 227.282797 seconds, 3049.5 Mbytes/sec
mlx5_1 port 1 type RoCE V1 transmitted 705362946232 bytes in 227.282797 seconds, 3103.5 Mbytes/sec
mlx5_1 port 1 type RoCE V1 received 705632498180 bytes in 227.282797 seconds, 3104.6 Mbytes/sec
Using RoCEv2:
> ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load mlx5_0:1,mlx5_1:1 --ether-gid-index=1
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
mlx5_1 port 1 rx_buffer compare : PASS
mlx5_0 port 1 -> mlx5_1 port 1 RDMA write transmitted 538213089280 bytes in 176.832048 seconds, 3043.6 Mbytes/sec
mlx5_0 port 1 type RoCE V2 transmitted 548819027232 bytes in 176.832048 seconds, 3103.6 Mbytes/sec
mlx5_0 port 1 type RoCE V2 received 548819092996 bytes in 176.832048 seconds, 3103.6 Mbytes/sec
mlx5_0 port 1 rx_buffer compare : PASS
mlx5_1 port 1 -> mlx5_0 port 1 RDMA write transmitted 538213089280 bytes in 176.831800 seconds, 3043.6 Mbytes/sec
mlx5_1 port 1 type RoCE V2 transmitted 548819092996 bytes in 176.831800 seconds, 3103.6 Mbytes/sec
mlx5_1 port 1 type RoCE V2 received 548819027232 bytes in 176.831800 seconds, 3103.6 Mbytes/sec
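The four 25 GbE results over the DAC cable can be compared against a simple per-packet overhead model. This is a sketch under stated assumptions: every packet carries a full MTU of payload, RoCEv2 uses IPv6 because the GIDs are IPv6 link-local addresses, and the 16-byte RETH on the first packet of each RDMA write, acknowledgement traffic and any VLAN tag are ignored:

```python
LINE_RATE_MBYTES = 25e9 / 8 / 1e6  # 25 GbE in units of 10^6 bytes/sec

# Standard header and framing sizes in bytes.
ETH_HEADER = 14
ETH_FCS = 4
PREAMBLE_AND_IFG = 8 + 12  # preamble plus SFD, then inter-frame gap
GRH = 40
IPV6 = 40
UDP = 8
BTH = 12
ICRC = 4

FRAME_OVERHEAD = {
    "RoCE V1": ETH_HEADER + GRH + BTH + ICRC + ETH_FCS,
    "RoCE V2": ETH_HEADER + IPV6 + UDP + BTH + ICRC + ETH_FCS,
}


def predicted_payload_rate(gid_type, mtu):
    """Payload Mbytes/sec if the wire is saturated with full-MTU packets."""
    wire_bytes = mtu + FRAME_OVERHEAD[gid_type] + PREAMBLE_AND_IFG
    return LINE_RATE_MBYTES * mtu / wire_bytes


# RDMA write rates for mlx5_0 -> mlx5_1, copied from the results above.
measured = {
    ("RoCE V1", 1024): 2857.1,
    ("RoCE V2", 1024): 2837.3,
    ("RoCE V1", 4096): 3049.9,
    ("RoCE V2", 4096): 3043.6,
}

for (gid_type, mtu), rate in measured.items():
    model = predicted_payload_rate(gid_type, mtu)
    print(f"{gid_type} MTU {mtu}: measured {rate:.1f}, model {model:.1f} Mbytes/sec")
```

Under these assumptions the model lands slightly above each measured rate, which would be consistent with the link running close to saturation in all four cases and the remaining difference coming from the traffic the model leaves out.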
Replaced the DAC cable with a pair of 25GBase-LR modules and a 3 m single-mode fibre. The two modules differ:
- Port 0: Huawei RTXM330-203
- Port 1: ERICSSON RTXM330-205-C24
The active FEC encoding is BaseR, compared to RS when the DAC cable was used. TBD whether the change in FEC encoding is repeatable when the type of modules is changed.
Ran RoCEv1 with an active MTU of 4096:
> ibv_generate_infiniband_test_load/ibv_generate_infiniband_test_load mlx5_0:1,mlx5_1:1
PRBS32 pattern period is 4294967295
Press Ctrl-C to stop the RDMA test load
^C
mlx5_1 port 1 rx_buffer compare : PASS
mlx5_0 port 1 -> mlx5_1 port 1 RDMA write transmitted 9508520722432 bytes in 3117.151182 seconds, 3050.4 Mbytes/sec
mlx5_0 port 1 type RoCE V1 transmitted 9676722327900 bytes in 3117.151182 seconds, 3104.3 Mbytes/sec
mlx5_0 port 1 type RoCE V1 received 9675685540572 bytes in 3117.151182 seconds, 3104.0 Mbytes/sec
mlx5_0 port 1 rx_buffer compare : PASS
mlx5_1 port 1 -> mlx5_0 port 1 RDMA write transmitted 9507446980608 bytes in 3117.105530 seconds, 3050.1 Mbytes/sec
mlx5_1 port 1 type RoCE V1 transmitted 9675685540572 bytes in 3117.105530 seconds, 3104.1 Mbytes/sec
mlx5_1 port 1 type RoCE V1 received 9676722327900 bytes in 3117.105530 seconds, 3104.4 Mbytes/sec