1 server with ConnectX-2 dual SFP+ NIC, running Debian
1 whitebox router with an identical NIC, running VyOS
1 PC with ConnectX-3 single port SFP+ NIC
All devices are attached to a Mikrotik CRS317-1G-16S+, no VLANs or anything.
The two devices with dual port NICs are set up with 802.3ad, hash policy layer 3+4.
When I run iperf from the PC to the server, I get close to 10Gb/s no problem, regardless of which host is hosting the iperf server.
The same is true if I run the client on the router and host the server on either of the other two hosts.
However, if I host the server on the router and then run the test from any of the other hosts, I only get ~5Gb/s. If I increase the process count in the iperf client, I can “force” the traffic to spread over both interfaces and get more speed, up to ~15Gb/s with 32 processes.
Even more interestingly, if I use a fairly low number of processes (2-4), the traffic sometimes doesn’t spread over both interfaces, but I can still get more than 5Gb/s. Up to 10Gb/s, of course.
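For anyone wondering why a single iperf process never uses both links: with hash policy layer 3+4, every packet of one TCP connection hashes to the same bond member, so only additional connections (different source ports) can land on the other link. Below is a minimal Python sketch of that idea. It follows the simplified formula given in the Linux bonding documentation for layer3+4; the real kernel code differs between versions, and the function name, addresses and ports here are made up for illustration. Note also that each end of the LAG hashes only its own outgoing traffic, and the switch may use a different hash.

```python
import ipaddress

def layer34_slave(src_ip, dst_ip, src_port, dst_port, n_slaves=2):
    """Pick a bond member for one IPv4 TCP/UDP flow (simplified layer3+4 sketch)."""
    # Pack both 16-bit ports into one 32-bit value.
    h = (src_port << 16) | dst_port
    # Mix in source and destination IPv4 addresses.
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    # Fold the upper bits down so they influence the low bits.
    h ^= h >> 16
    h ^= h >> 8
    # Reduce to an index into the list of bond members.
    return h % n_slaves

if __name__ == "__main__":
    # Hypothetical addresses; 5201 is the default iperf3 port.
    for sport in range(40000, 40004):
        link = layer34_slave("10.0.0.2", "10.0.0.1", sport, 5201)
        print(f"source port {sport} -> link {link}")
```

The same 4-tuple always returns the same link, which is why one connection is capped at a single 10Gb/s port, and why with only 2-4 connections it is quite possible that all of them end up on the same link.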
So it seems that the NIC itself is perfectly capable of transmitting the traffic, but just doesn’t do it for a single process? Logs don’t show anything, the firmware is the latest available version, and I even tried swapping the NICs between the server and the router, but I still get the same result. RX / TX flow control is already disabled on the Mikrotik side.
Did some more testing, and I noticed that for some reason the CPU on the router is at 100% when it’s hosting the server (on one core if running a single process, or on all of them if running more).
So this seems to be a CPU limit, rather than a limit of the NIC.
This is kind of surprising to me, as the router is running an E3-1220 v3, so a quad-core at 3.1GHz. It’s not the newest processor, but I think it should be able to handle this easily, or am I mistaken?
Is there some hardware offloading that I should enable for the NIC / driver? The other hosts seem to be able to handle the traffic without any sort of large increase in CPU load.
Are you running the same/newest version of iperf3?
Have you looked at TCP congestion control algorithms (specifically, disabling them)?
Have you looked at read/write socket buffer sizes? What about frame/segment sizes, i.e. jumbo frames? (It’s cheating, it should do 1GB/s despite it, but if it helps why not use it permanently.)
“Is there some hardware offloading that I should enable for the NIC / driver?” Modern NICs offload much of the TCP work (checksums, segmentation) to hardware, and the kernel should take advantage of this automatically.
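On the socket buffer question: a common rule of thumb is that a single TCP connection needs roughly one bandwidth-delay product (link speed times round-trip time) of buffer to keep the link full. Here is a quick Python sketch of that arithmetic. The 200 microsecond round-trip time is only an assumed value for a small switched LAN, not something measured in this setup, and the function name is made up for illustration.

```python
def bdp_bytes(bandwidth_bps: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes, from bits per second and RTT in microseconds."""
    # bits in flight = bandwidth * RTT; divide by 8 for bytes and by 1e6 for microseconds.
    return bandwidth_bps * rtt_us // 8_000_000

if __name__ == "__main__":
    # Assumed example values: 10Gb/s link, 200 microsecond RTT.
    print(bdp_bytes(10_000_000_000, 200), "bytes")
```

With these assumed numbers the result is about 250 KB, so on a LAN with low latency the buffer sizes are less likely to be the bottleneck than on a long-distance link.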
Thanks for the reply, risk.
Just 5 minutes ago I managed to get it working and was already writing my reply.
Basically, the solution (at least what worked for me) was what you said.
I had enabled jumbo frames on all interfaces, but forgot to set them on the switch, so an MTU of 1500 was still in effect. I changed this and now I get 10Gb/s without any issues, and the router is only at ~30-50% CPU on a single core.
This is also fairly well documented, especially for the ConnectX-2 card, which I didn’t realize until I noticed that the issue was related to the CPU load. Link for reference: Mellanox ConnectX-2 EN and Windows 10? | ServeTheHome Forums (the title says Windows, but there’s also talk about Linux in the thread)
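To put a rough number on why the MTU matters for CPU load: at a fixed line rate, larger frames mean far fewer packets per second for the receiver to process. This is a back-of-the-envelope Python sketch, assuming full-size frames and 38 bytes of per-frame Ethernet overhead on the wire (14 header, 4 FCS, 8 preamble, 12 inter-frame gap). The function name is made up, and offloads such as GRO/LRO change the real per-packet cost, so treat it as an estimate only.

```python
def packets_per_second(link_bps: float, mtu: int, l2_overhead: int = 38) -> float:
    """Approximate full-size frames per second at line rate."""
    # Bytes occupied on the wire per frame: payload up to the MTU plus
    # Ethernet header, FCS, preamble and inter-frame gap.
    wire_bytes = mtu + l2_overhead
    return link_bps / (wire_bytes * 8)

if __name__ == "__main__":
    for mtu in (1500, 9000):
        pps = packets_per_second(10e9, mtu)
        print(f"MTU {mtu}: about {pps:,.0f} packets/s at 10Gb/s")
```

Under these assumptions, MTU 1500 works out to a bit over 800,000 packets/s at 10Gb/s, while MTU 9000 is just under 140,000 packets/s, so roughly six times fewer packets to handle.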