What did you use to test your throughput? Was it a single stream test or parallel?
Getting line rate on a NIC can be difficult, especially when you are trying to do it with a single stream.
Are you using jumbo frames?
Did your benchmark tool support zerocopy?
Does your benchmark tool support setting the receive/send socket buffers/windows?
How busy were the CPUs on either endpoint? Is your switch busy?
What kind of tuning is on the switch? Any PFC, CoS shaping / traffic differentiation, link aggregation?
Are you traversing a firewall or do you have firewalling enabled on either endpoint?
Did you pin your benchmark tools to the NUMA domain that has direct access to your NIC?
What performance profile are you running on the client and the server?
Performance governors? CPU C-states?
What MSI/MSI-X interrupt mode are you using?
Have you applied network stack tuning for 40gbit like the suggestions on fasterdata.es.net?
Many questions with many knobs and levers to turn and pull.
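On the fasterdata.es.net point: the gist of its host tuning advice for 40G paths is raising the socket buffer ceilings so TCP autotuning can fill the bandwidth-delay product. A sketch of what such a sysctl fragment might look like — the exact numbers here are illustrative, so verify them against the current fasterdata recommendations for your kernel:

```
# /etc/sysctl.d/90-net-tuning.conf -- illustrative values, check fasterdata.es.net
# Raise the maximum socket buffer sizes so large windows are possible
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
# min / default / max for TCP receive and send buffer autotuning
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
# fq qdisc provides pacing, which helps at high rates
net.core.default_qdisc = fq
```

Apply with `sysctl --system` (or reboot), and confirm with `sysctl net.core.rmem_max`.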
These are some great pointers, I will be very happy to look into all of these. Thank you for your reply!
Just some general info for anyone interested:
The PC is running a Threadripper 2920x, the NAS is a Dell server R720xd running two E5-2650 v2s.
No jumbo frames at the moment.
Testing was done using iperf. Parallel streams did not make much of a difference. I did not try with multiple processes.
CPUs were basically idle.
No tuning on the switch, unfortunately it does not support PFC. The switch has only the two 40Gb hosts on it ATM.
No firewalls on or between the hosts.
I did not look into NUMA at all; this might be critical for my problem, as the NAS is running two CPUs (and there is possibly an issue with the Threadripper, which is running in NUMA mode).
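On the NUMA angle, Linux exposes which node a NIC is attached to, so you can check and then pin the benchmark accordingly. A sketch, assuming the interface is named eth4 and the NIC sits on node 0 (adjust both to your system):

```shell
# Which NUMA node owns the NIC? (-1 means single node / unknown)
cat /sys/class/net/eth4/device/numa_node

# Which CPUs belong to which node
lscpu | grep -i numa

# Pin iperf3 (CPU and memory allocation) to the NIC's node, e.g. node 0
numactl --cpunodebind=0 --membind=0 iperf3 -c nas.example -P 4
```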
Both systems running schedutil CPU governors.
No network tuning as of yet on the hosts.
I unexpectedly had to take the NAS down today in order to move it because we have to do some repairs at the house, so I won’t be able to run any tests for a brief period of time. I will have a look at everything mentioned and report back once I have everything running again.
Before you begin
You are running RHEL 7 or the latest compatible SUSE Linux Enterprise Server 12 or 15 service pack operating system. See the NetApp Interoperability Matrix Tool for a complete list of the latest requirements.
Procedure
Install the rdma and nvme-cli packages:
SLES 12 or SLES 15
zypper install rdma-core
zypper install nvme-cli
RHEL 7
yum install rdma-core
yum install nvme-cli
Set up IPv4 addresses on the Ethernet ports used to connect NVMe over RoCE. For each network interface, create a configuration script that contains the different variables for that interface.
The variables used in this step are based on server hardware and the network environment. The variables include the IPADDR and GATEWAY. These are example instructions for the latest SUSE Linux Enterprise Server 12 service pack:
Create the example file /etc/sysconfig/network/ifcfg-eth4 as follows:
BOOTPROTO='static'
BROADCAST=
ETHTOOL_OPTIONS=
IPADDR='192.168.1.87/24'
GATEWAY='192.168.1.1'
MTU=
NAME='MT27800 Family [ConnectX-5]'
NETWORK=
REMOTE_IPADDR=
STARTMODE='auto'
Create the second example file /etc/sysconfig/network/ifcfg-eth5 as follows:
BOOTPROTO='static'
BROADCAST=
ETHTOOL_OPTIONS=
IPADDR='192.168.2.87/24'
GATEWAY='192.168.2.1'
MTU=
NAME='MT27800 Family [ConnectX-5]'
NETWORK=
REMOTE_IPADDR=
STARTMODE='auto'
Enable the network interfaces:
ifup eth4
ifup eth5
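To confirm the addresses came up as intended, the interfaces can be checked with something like:

```shell
# Show the configured IPv4 address on each RoCE-facing interface
ip -4 addr show eth4
ip -4 addr show eth5
```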
Set up the NVMe-oF layer on the host.
Create the following file under /etc/modules-load.d/ to load the nvme-rdma kernel module and ensure it is loaded automatically, even after a reboot:
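For example — the file name nvme-rdma.conf is an arbitrary choice here; systemd's modules-load.d mechanism reads any *.conf file in that directory, with one module name per line:

```
# /etc/modules-load.d/nvme-rdma.conf
nvme-rdma
```

After creating the file, the module can also be loaded immediately with `modprobe nvme-rdma`, without waiting for a reboot.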