[Solved] Asymmetric link speed on 10G NIC (for single process)

I have the following setup:

  • 1 server with ConnectX-2 dual SFP+ NIC, running debian
  • 1 whitebox router with an identical NIC, running vyos
  • 1 PC with ConnectX-3 single port SFP+ NIC

All devices are attached to a Mikrotik CRS317-1G-16S+, no VLANs or anything.
The two devices with dual port NICs are set up with 802.3ad, hash policy layer 3+4.

When I run iperf from the PC to the server, I get close to 10Gb/s no problem, regardless of which host is hosting the iperf server.
The same is true if I run the client on the router and host the server on either of the other two hosts.

However, if I host the server on the router and then try to run the test from any of the other hosts, I only get ~5Gb/s. If I increase the process count in the iperf client, I can “force” the traffic to spread over both interfaces and get more speed, up to around ~15Gb/s with 32 processes.
Even more interestingly, with a fairly low number of processes (2-4), the traffic sometimes doesn’t spread over both interfaces, yet I can still get more than 5Gb/s (up to 10Gb/s, of course).
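A note on why the process count matters: with the layer 3+4 hash policy, each TCP flow is pinned to one member link, so a single iperf stream can never use more than one 10G port, and several streams may happen to land on the same port. The sketch below follows the simplified formula given in the Linux kernel bonding documentation; the real kernel code differs between versions, and traffic arriving at a bonded host is distributed by the switch's own hash rather than this one. The addresses and source ports are made up for illustration.

```python
import ipaddress

def layer34_hash(src_ip, dst_ip, src_port, dst_port, n_links):
    """Pick a member link using the simplified layer3+4 formula
    from the Linux bonding documentation (illustrative only)."""
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    h = (src_port << 16) | dst_port  # both ports as one 32-bit value
    h ^= ip_xor
    h ^= h >> 16
    h ^= h >> 8
    return h % n_links               # reduce modulo the number of links

# Hypothetical flows: four iperf client processes, each with its own
# source port, all connecting to port 5201 (iperf3's default).
flows = [("10.0.0.10", "10.0.0.1", 50000 + i, 5201) for i in range(4)]
links = [layer34_hash(*flow, n_links=2) for flow in flows]
print(links)  # one link index per flow; repeats mean flows share a port
```

Every packet of a given flow produces the same index, which is why adding processes (and therefore source ports) is the only way to make iperf use both ports.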

So it seems that the NIC itself is perfectly capable of transmitting the traffic, but just doesn’t do it for a single process? The logs don’t show anything, the firmware is the latest available version, and I even tried swapping the NICs between the server and the router, but I still get the same result. RX / TX flow control is already disabled on the Mikrotik side.

Any ideas?

I did some more testing and noticed that, for some reason, the CPU on the router is at 100% when it’s hosting the server (on one core when running a single process, or on all of them with more).
So this seems to be a CPU limit, rather than a limit of the NIC.

This is kind of surprising to me, as the router is running an E3-1220 v3, so a quad-core at 3.1GHz. It’s not the newest processor, but I think it should be able to handle this easily, or am I mistaken?
Is there some hardware offloading that I should enable for the NIC / driver? The other hosts seem to be able to handle the traffic without any sort of large increase in CPU load.

  • Are you running the same/newest version of iperf3?
  • Have you looked at TCP congestion control algorithms (specifically, disabling them)?
  • Have you looked at read/write socket buffer sizes? What about frame/segment sizes, i.e. jumbo frames? (It’s cheating, since it should do 1GB/s even without them, but if it helps, why not use them permanently?)
  • “Is there some hardware offloading that I should enable for the NIC / driver?” … Modern NICs basically do TCP in hardware, but the kernel should take advantage of this automatically.

Thanks for the reply, risk.
Just 5 minutes ago I managed to get it working and was already writing my reply :smiley:

Basically, the solution (at least what worked for me) was what you said.
I had enabled jumbo frames on all the host interfaces, but forgot to set them on the switch, so an MTU of 1500 was still in effect. After changing that, I now get 10Gb/s without any issues and the router is only at ~30-50% CPU on a single core.
This is also fairly well documented, especially for the ConnectX-2 card, which I didn’t realize until I noticed that the issue was related to the CPU load. Link for reference: Mellanox ConnectX-2 EN and Windows 10? | ServeTheHome Forums (the title says Windows, but there’s also talk about Linux in the thread)
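As a rough illustration of why the MTU change lowers CPU load: at a fixed bit rate, larger frames mean far fewer packets per second for the receiver to process. The sketch below is back-of-the-envelope arithmetic only; it assumes full-size frames at exactly 10Gb/s, no VLAN tag, and the standard 38 bytes of per-frame Ethernet overhead. The function name is made up for this example, and nothing here was measured on the hardware in this thread.

```python
# Per-frame overhead on the wire, in bytes:
# preamble + SFD (8) + Ethernet header (14) + FCS (4) + inter-frame gap (12)
ETH_OVERHEAD = 38

def frames_per_second(mtu, link_bps=10e9):
    """Frames per second needed to fill the link with full-size frames."""
    bits_per_frame = (mtu + ETH_OVERHEAD) * 8
    return link_bps / bits_per_frame

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_per_second(mtu):,.0f} frames/s")

ratio = frames_per_second(1500) / frames_per_second(9000)
print(f"MTU 9000 needs about {ratio:.1f}x fewer frames")
```

Under these assumptions, MTU 1500 needs roughly 813k frames per second and MTU 9000 roughly 138k, which is in line with the drop from 100% to ~30-50% on one core if per-packet processing is the bottleneck.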
