Help tuning 25G NICs in Ubuntu

Is anyone here a Debian/Ubuntu networking nerd? I am trying to figure out why I am getting poor performance with my 25G NICs on Ubuntu. They work fine (or at least well enough) under ESXi, so I assume I am missing some tuning settings on the Linux side; I am just not sure which ones. I am using a 9k MTU. Here is an iperf3 screenshot of ESXi to Ubuntu.

And one of ESXi to ESXi.

Can you provide the ifconfig output for your NIC in both cases?

P.S. High-bandwidth cards often ship with a low default txqueuelen (around 1000), which is basically the minimum transmit queue length.
To tweak it you can run ifconfig ethX txqueuelen 10000, but the ifconfig output will show whether those settings differ at all.
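
If ifconfig is not installed, the same check and change can be done with sysfs and iproute2. This is only a rough sketch: the helper names are made up for illustration, the interface name is one of those posted in this thread, and 10000 is just the value suggested above, not a recommendation. The second helper only prints the command so it can be reviewed before being run as root.

```shell
# Show the current transmit queue length of an interface (read from sysfs).
show_txqueuelen() {
    dev="$1"
    cat "/sys/class/net/$dev/tx_queue_len"
}

# Build, but do not run, the iproute2 command that changes the queue length.
# (Hypothetical helper; it only prints the command line.)
txqueuelen_cmd() {
    dev="$1"; len="$2"
    printf 'ip link set dev %s txqueuelen %s\n' "$dev" "$len"
}

# Example: print the command for one of the interfaces from this thread.
txqueuelen_cmd enp136s0f0np0 10000
```

Run show_txqueuelen before and after the change on both ends to confirm the value actually took effect.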


Try multiple connections… iperf3 -P 4 .... as a sanity check.

Other than that, for a single TCP connection:
You can try increasing the TCP buffer sizes using sysctl on both ends.
You can try changing the queueing discipline (qdisc) on the sending side to something simpler, such as pfifo. Note that fifo/pfifo are qdiscs set with tc, not TCP congestion control algorithms.
You can disable the firewall if you have it enabled and don’t need it.
You can “play” with ethtool NIC driver parameters to control various kinds of driver/card features that have to do with interrupts or offloading.
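
For the buffer-size suggestion, a common starting point is the bandwidth-delay product (link rate times round-trip time). The sketch below is only an illustration under stated assumptions: the 1 ms RTT and the 4x headroom are guesses, the helper names are made up, and the minimum/default fields in tcp_rmem and tcp_wmem are placeholder values rather than tuned ones. It prints the sysctl lines instead of applying them.

```shell
# Bandwidth-delay product in bytes:
# link rate (Gbit/s) x round-trip time (microseconds) / 8 bits per byte.
bdp_bytes() {
    gbps="$1"; rtt_us="$2"
    echo $(( gbps * 1000 * rtt_us / 8 ))
}

# Print (do not apply) sysctl lines whose maximum buffer is the given size.
# The min/default fields (4096 87380 / 4096 65536) are assumed placeholders.
print_tcp_buffer_sysctls() {
    max="$1"
    printf 'net.core.rmem_max = %s\n' "$max"
    printf 'net.core.wmem_max = %s\n' "$max"
    printf 'net.ipv4.tcp_rmem = 4096 87380 %s\n' "$max"
    printf 'net.ipv4.tcp_wmem = 4096 65536 %s\n' "$max"
}

# Example: 25 Gbit/s link with an assumed 1 ms RTT, plus 4x headroom.
bdp=$(bdp_bytes 25 1000)
print_tcp_buffer_sysctls $(( bdp * 4 ))
```

With those assumptions the product is 3125000 bytes, so the printed maximum is 12500000 bytes. Measure the real RTT with ping before settling on a number, and apply the same values on both ends.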

Use the perf tool to see what’s eating your CPU as you make changes.


enp136s0f0np0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 5c:6f:69:9f:b8:91 txqueuelen 1000 (Ethernet)
RX packets 75062821 bytes 374766726103 (374.7 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 66812686 bytes 355101562658 (355.1 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp136s0f1np1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 5c:6f:69:9f:b8:91 txqueuelen 1000 (Ethernet)
RX packets 75908397 bytes 378035151969 (378.0 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 66827377 bytes 355258353387 (355.2 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.93 GBytes 5.10 Gbits/sec 2696 sender
[ 5] 0.00-10.00 sec 5.93 GBytes 5.09 Gbits/sec receiver
[ 7] 0.00-10.00 sec 5.93 GBytes 5.10 Gbits/sec 3157 sender
[ 7] 0.00-10.00 sec 5.93 GBytes 5.09 Gbits/sec receiver
[ 9] 0.00-10.00 sec 5.93 GBytes 5.10 Gbits/sec 502 sender
[ 9] 0.00-10.00 sec 5.93 GBytes 5.09 Gbits/sec receiver
[ 11] 0.00-10.00 sec 5.93 GBytes 5.09 Gbits/sec 2355 sender
[ 11] 0.00-10.00 sec 5.93 GBytes 5.09 Gbits/sec receiver
[SUM] 0.00-10.00 sec 23.7 GBytes 20.4 Gbits/sec 8710 sender
[SUM] 0.00-10.00 sec 23.7 GBytes 20.4 Gbits/sec receiver
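
To compare runs without reading the tables by eye, the per-stream sender lines can be totalled with awk. A small sketch, assuming iperf3's plain-text summary format as pasted above and that every stream reports in Gbits/sec; sum_streams is a made-up helper name.

```shell
# Sum bitrate and retransmits over the per-stream "sender" summary lines,
# skipping the [SUM] line. Assumes all bitrates are reported in Gbits/sec.
sum_streams() {
    awk '$NF == "sender" && $1 != "[SUM]" {
             gbps += $(NF-3)   # bitrate value, e.g. 5.10
             retr += $(NF-1)   # retransmit count, e.g. 2696
             n++
         }
         END { printf "%d streams, %.2f Gbits/sec, %d retransmits\n", n, gbps, retr }'
}

# Example with the four sender lines posted above.
sum_streams <<'EOF'
[ 5] 0.00-10.00 sec 5.93 GBytes 5.10 Gbits/sec 2696 sender
[ 7] 0.00-10.00 sec 5.93 GBytes 5.10 Gbits/sec 3157 sender
[ 9] 0.00-10.00 sec 5.93 GBytes 5.10 Gbits/sec 502 sender
[ 11] 0.00-10.00 sec 5.93 GBytes 5.09 Gbits/sec 2355 sender
EOF
```

On a live test it would be used as iperf3 -c HOST -P 4 | sum_streams. The retransmit total is worth watching while tuning, since the run above shows thousands of retransmits in ten seconds.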

What kind of nic are you using?
In my case I use a Mellanox ConnectX-4 Lx with the Mellanox OFED drivers, not the driver included in the standard kernel (Debian with kernel 5.16.12). I’m also running ‘tuned’ with the profile ‘network-latency’.

$ iperf3 --client 192.168.1.10
Connecting to host 192.168.1.10, port 5201
[ 5] local 192.168.1.1 port 33762 connected to 192.168.1.10 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.87 GBytes 24.7 Gbits/sec 0 2.71 MBytes
[ 5] 1.00-2.00 sec 2.88 GBytes 24.8 Gbits/sec 0 2.83 MBytes
[ 5] 2.00-3.00 sec 2.88 GBytes 24.7 Gbits/sec 0 2.83 MBytes
[ 5] 3.00-4.00 sec 2.88 GBytes 24.7 Gbits/sec 0 2.99 MBytes
[ 5] 4.00-5.00 sec 2.88 GBytes 24.7 Gbits/sec 0 2.99 MBytes
[ 5] 5.00-6.00 sec 2.88 GBytes 24.7 Gbits/sec 0 3.14 MBytes
[ 5] 6.00-7.00 sec 2.88 GBytes 24.7 Gbits/sec 0 3.14 MBytes
[ 5] 7.00-8.00 sec 2.86 GBytes 24.6 Gbits/sec 0 3.14 MBytes
[ 5] 8.00-9.00 sec 2.84 GBytes 24.4 Gbits/sec 0 3.14 MBytes
[ 5] 9.00-10.00 sec 2.88 GBytes 24.7 Gbits/sec 0 3.14 MBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 28.7 GBytes 24.7 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 28.7 GBytes 24.7 Gbits/sec receiver

iperf Done.

I’ve also set the MTU to 9000, but I didn’t change the queue length.

3: enp65s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000

4: enp1s0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000

# mlxlink -d 41:00.1

Operational Info

State : Active
Physical state : LinkUp
Speed : 25GbE
Width : 1x
FEC : Standard RS-FEC - RS(528,514)
Loopback Mode : No Loopback
Auto Negotiation : ON

Supported Info

Enabled Link Speed : 0x38007013 (25G,10G,1G)
Supported Cable Speed : 0x20002001 (25G,10G,1G)

Troubleshooting Info

Status Opcode : 0
Group Opcode : N/A
Recommendation : No issue was observed.

Tool Information

Firmware Version : 14.32.1010
MFT Version : mft 4.21.0-99

# mlxlink -d 01:00.0

Operational Info

State : Active
Physical state : LinkUp
Speed : 25GbE
Width : 1x
FEC : Standard RS-FEC - RS(528,514)
Loopback Mode : No Loopback
Auto Negotiation : ON

Supported Info

Enabled Link Speed : 0x38007013 (25G,10G,1G)
Supported Cable Speed : 0x20002001 (25G,10G,1G)

Troubleshooting Info

Status Opcode : 0
Group Opcode : N/A
Recommendation : No issue was observed.

Tool Information

Firmware Version : 14.32.1010
MFT Version : mft 4.21.0-99

# tc qdisc show | grep enp65s0f1
qdisc mq 0: dev enp65s0f1 root
qdisc fq_codel 0: dev enp65s0f1 parent :3f limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :3e limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :3d limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :3c limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :3b limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :3a limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :39 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :38 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :37 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :36 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :35 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :34 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :33 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :32 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :31 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :30 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :2f limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :2e limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :2d limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :2c limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :2b limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :2a limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :29 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :28 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :27 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :26 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :25 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :24 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :23 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :22 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :21 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :20 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :1f limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :1e limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :1d limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :1c limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :1b limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :1a limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :19 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :18 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :17 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :16 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :15 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :14 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :13 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :12 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :11 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :10 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :f limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :e limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :d limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :c limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :b limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :a limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :9 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :8 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :7 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :6 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :5 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64
qdisc fq_codel 0: dev enp65s0f1 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64

This is my NIC: Broadcom Inc. | Connecting Everything (link)

OK. I found this on the Broadcom site: Operating System Tuning (Linux)

P.S. For my Mellanox card, the installer included a script to tune settings.

Maybe Broadcom also offers a script like that for your card: Installing the Linux Driver on Ethernet Network Adapters

Well, all the monkeying with stuff helped a bit; I got an extra 5-7 Gbps. Thanks for finding that. It looks like there is still some tweaking to figure out, though.

Ah, it replaced the Ubuntu inbox driver.