Server build for dual 100GbE cards

I am planning on building a 1U or 2U server with two dual-port 100GbE cards: one for traffic generation, the other for WAN emulation. The WAN emulation is essentially a RAM buffer. I have used a ConnectX-5 with an EPYC 7232P to generate about 61 Gbps of UDP traffic, one-way send-to-self, following various tuning guides for low-latency network configuration.
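For illustration, here is a minimal Python sketch of the one-way, send-to-self UDP pattern. The address, port, datagram size, and duration are placeholders, and a plain Python loop will not get anywhere near these line rates; it only shows the measurement shape (one socket blasting datagrams at the machine's own address, one thread counting what arrives).

```python
#!/usr/bin/env python3
"""Minimal send-to-self UDP throughput sketch (illustrative only).

Assumptions (not from the original post): target 127.0.0.1:5001,
8 KiB datagrams, 10-second run. A Python loop will not reach line rate;
this only demonstrates the one-way, send-to-self measurement pattern.
"""
import socket
import threading
import time

ADDR = ("127.0.0.1", 5001)   # placeholder; use the NIC's own IP to exercise the card
PAYLOAD = b"\x00" * 8192     # assumed datagram size (fits in a jumbo frame)
DURATION = 10.0              # seconds

received = 0

def receiver(sock: socket.socket) -> None:
    """Count bytes arriving on the bound socket until the sender stops."""
    global received
    sock.settimeout(1.0)
    while True:
        try:
            data = sock.recv(65536)
        except socket.timeout:
            break
        received += len(data)

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)  # large RX buffer
rx.bind(ADDR)
t = threading.Thread(target=receiver, args=(rx,))
t.start()

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = 0
deadline = time.monotonic() + DURATION
while time.monotonic() < deadline:
    sent += tx.sendto(PAYLOAD, ADDR)
t.join()

print(f"sent     {sent * 8 / DURATION / 1e9:.2f} Gbit/s")
print(f"received {received * 8 / DURATION / 1e9:.2f} Gbit/s")
```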

The NUMA planning on this build is obviously necessary, but I’m of the opinion that I need better processor(s) to handle multiple 100GbE cards.
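As a sketch of what that NUMA planning implies in practice: find which node each NIC hangs off and which cores are local to it, then keep the generator threads and the NIC's IRQs on those cores. The interface name below is a placeholder.

```python
#!/usr/bin/env python3
"""Report which NUMA node a NIC sits on and which CPUs are local to it.

The interface name is a placeholder; substitute your 100GbE port.
"""
from pathlib import Path

IFACE = "enp1s0f0"  # placeholder interface name

# PCI device's NUMA node as reported by the kernel (-1 means no NUMA info).
node = Path(f"/sys/class/net/{IFACE}/device/numa_node").read_text().strip()
print(f"{IFACE} is attached to NUMA node {node}")

if node != "-1":
    # CPUs belonging to that node; pin traffic-generator threads and the
    # NIC's IRQs to these cores to avoid cross-socket memory traffic.
    cpus = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()
    print(f"Local CPUs: {cpus}")
```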

Please let me hear your suggestions!

Some follow-up here: I built a 7950X system with an Intel E810-C 100GbE card and was easily able to generate 99 Gbps of traffic in one direction, send-to-self. That maxed out one core at 5.6 GHz.

Expanding to multiple connections (NFSv4, SMB3) does not scale beyond about 8 workers. I was able to generate 50-60 Gbps of NFSv4 writes to a tmpfs-backed directory with 8 workers, using MTU 8192 and a 1 MB rw size, with each worker sending at around 10 Gbps.
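Roughly the worker pattern, as a Python sketch: N processes each streaming 1 MiB writes into their own file on the mounted share, with the parent summing client-side throughput. The mount path, worker count, and per-worker size are placeholders, not my actual setup.

```python
#!/usr/bin/env python3
"""Sketch of the N-worker, 1 MiB-write pattern against a mounted share.

TARGET_DIR, worker count, and sizes are placeholders. Each worker streams
1 MiB writes into its own file; the parent reports the aggregate rate.
"""
import multiprocessing as mp
import os
import time

TARGET_DIR = "/mnt/nfs_tmpfs"   # placeholder: NFS mount backed by tmpfs on the server
WORKERS = 8                     # matches the ~8-worker scaling ceiling discussed above
BLOCK = b"\x00" * (1 << 20)     # 1 MiB write size
TOTAL_PER_WORKER = 4 << 30      # 4 GiB per worker (placeholder)

def worker(idx: int) -> float:
    """Write TOTAL_PER_WORKER bytes in 1 MiB blocks; return elapsed seconds."""
    path = os.path.join(TARGET_DIR, f"worker{idx}.bin")
    start = time.monotonic()
    with open(path, "wb", buffering=0) as f:
        written = 0
        while written < TOTAL_PER_WORKER:
            f.write(BLOCK)
            written += len(BLOCK)
    return time.monotonic() - start

if __name__ == "__main__":
    with mp.Pool(WORKERS) as pool:
        elapsed = pool.map(worker, range(WORKERS))
    # Aggregate rate: total bytes over the slowest worker's wall time.
    gbps = WORKERS * TOTAL_PER_WORKER * 8 / max(elapsed) / 1e9
    print(f"aggregate write rate ≈ {gbps:.1f} Gbit/s")
```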

SMB3 was much more protocol-limited: 6 workers doing 1 MB writes at either MTU 1500 or MTU 8192 got up to 20 Gbps, and after tuning Samba’s socket options to use 2 MB receive buffers, I got to 30 Gbps.
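For context on the buffer tuning, a short sketch of what asking the kernel for a 2 MB receive buffer (as a Samba socket option would) actually yields; the exact smb.conf setting isn't shown here, this just checks the socket-level effect.

```python
#!/usr/bin/env python3
"""Show what requesting a 2 MB socket receive buffer actually yields.

On Linux the kernel typically doubles the requested SO_RCVBUF value for
bookkeeping and caps it at net.core.rmem_max, so it is worth checking the
effective size rather than trusting the configured number.
"""
import socket

REQUESTED = 2 * 1024 * 1024  # the 2 MB value discussed above

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {REQUESTED} bytes, kernel granted {effective} bytes")
# If 'granted' is far below 2x the request, raise net.core.rmem_max via sysctl.
```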

This tells me that I want frequency-optimized processors, such as a single-socket Xeon Gold system or an EPYC 9254.


From what I can tell, there is a lot of work going into optimizing Linux for SMB over high-rate Ethernet (likely the same for Windows). Also, in the early days of 10GbE, Sun found that the interrupt rate was too high for any single CPU core, and so came up with a scheme to distribute interrupts across all available cores. Present-day Intel Ethernet cards follow the same scheme, and pretend to be as many Ethernet cards as there are CPU cores.
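One quick way to see whether those per-queue interrupts really are spread across cores is to sum the per-CPU columns of /proc/interrupts for the NIC's vectors. The label matched below ("ice", the Intel E810 driver; ConnectX-5 queues usually contain "mlx5") is an assumption about how the queues are named on your system.

```python
#!/usr/bin/env python3
"""Sum per-CPU interrupt counts for a NIC's queue vectors from /proc/interrupts.

The PATTERN matched against each IRQ line ("ice" here, for the Intel E810
driver) is an assumption; adjust it to your NIC's driver/queue naming.
"""
from collections import Counter

PATTERN = "ice"  # assumed driver/queue label

with open("/proc/interrupts") as f:
    cpus = f.readline().split()          # header row: CPU0 CPU1 ...
    per_cpu = Counter()
    for line in f:
        fields = line.split()
        if PATTERN not in line or len(fields) < len(cpus) + 1:
            continue
        # fields[0] is the IRQ number; the next len(cpus) fields are per-CPU counts.
        for cpu, count in zip(cpus, fields[1:1 + len(cpus)]):
            per_cpu[cpu] += int(count)

for cpu, count in sorted(per_cpu.items(), key=lambda kv: -kv[1]):
    print(f"{cpu:>6}: {count}")
```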

You did not mention which Linux kernel version you are using; better to use something very recent. A more recent kernel will make better use of recent Ethernet enhancements that reduce the interrupt rate.

Older Ethernet cards are likely to generate more interrupts, but the Intel E810 you are using seems relatively recent.

Also, you might do better with more CPU cores. For a recent project I specified one core per 10GbE connection; with 6 links at full rate, the CPUs are not even breathing hard. It might be that your 8 cores at 100GbE are not optimal.

(At this point you are beyond my level of knowledge. I landed here as I was thinking about running 4 to 8 100GbE links into an EPYC, and had questions.)

One process per core is very efficient, and yet not all test scenarios care only about throughput. This testing was on kernel 6.7.0-rc5, and I’ll look again on 6.7.3 later.

I have found that it is possible to do 70 Gbps of TCP throughput across 8 dual-port 10GbE cards.
