NFS over RDMA

Just picked up a couple of ConnectX-4 cards (VPI, so IB/ETH capable) and some Intel 100Gb QSFP28 modules.

Trying to get NFS over RDMA working, and on the surface it appears to be. With the adapters in Ethernet mode, I can mount the NFS share with the rdma option and get about 600 MB/s of throughput to the remote mount point (I’m not seeing heavy CPU usage in htop either, so I assume RDMA really is working in some capacity). It’s worth noting, though, that mounting without explicitly setting the rdma option results in identical performance.
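
For context, the setup is basically the standard NFSoRDMA recipe, roughly this (the export path, mount point, and hostname are placeholders; 20049 is the standard NFSoRDMA port):

    # server: load the RDMA transport for nfsd and have it listen on the NFSoRDMA port
    modprobe svcrdma
    echo "rdma 20049" > /proc/fs/nfsd/portlist

    # client: load the RDMA transport for the NFS client, then mount with the rdma option
    modprobe xprtrdma
    mount -t nfs -o rdma,port=20049 server:/export /mnt/share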

But 600 MB/s is far short of the 80-90 Gbps (10-11 GB/s) I’m seeing from the ib_write_bw test, and the latency is about 5x what ib_write_lat shows as a practical minimum. Testing against a tmpfs mount doesn’t improve this at all.
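
For reference, the fabric numbers came from the perftest tools run as the usual client/server pair, roughly like this (the device name and the server’s address are examples):

    # passive side
    ib_write_bw -d mlx5_0 -F --report_gbits

    # active side, pointed at the passive host
    ib_write_bw -d mlx5_0 -F --report_gbits 192.168.100.1
    ib_write_lat -d mlx5_0 -F 192.168.100.1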

The NFS server is a Dell T420 with dual Xeon E5-2470 v2s and 96GB of 1333MHz DDR3. The client is a Supermicro board with dual E5-2690 v2s and 160GB of 1600MHz DDR3. Both adapters are in PCIe 3.0 x16 slots, and lshca reflects this.
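
(For anyone wanting to double-check the same thing, the negotiated link speed/width is also visible in lspci; the bus address here is made up:)

    # LnkSta should read 8GT/s, Width x16 for a ConnectX-4 in a Gen3 x16 slot
    lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'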

I’m running the latest firmware for these cards (Dell FW from 2021), and I’ve got the latest OFED driver from Nvidia (I’m on Rocky 8.7; I tried the inbox drivers from the repo, but one card wouldn’t report all of its info, and NFS wouldn’t mount with the rdma option either).
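
For the record, I’m checking versions with ethtool and ibv_devinfo (interface and device names below are examples):

    # driver and firmware version as the kernel sees them
    ethtool -i enp3s0f0

    # verbs-level view of the same card (fw_ver, link_layer, etc.)
    ibv_devinfo -d mlx5_0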

So basically, I’m trying to figure out why I’m getting 600 MB/s instead of, say, 4000-8000 MB/s, and why my latency is 5x higher than it could be.
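
The obvious knobs I know of to try next are the NFS transfer size and the nfsd thread count, along these lines (values and paths are just examples, not results):

    # client: ask for 1 MiB reads/writes per RPC instead of the negotiated default
    mount -t nfs -o rdma,port=20049,rsize=1048576,wsize=1048576 server:/export /mnt/share

    # server: run more than the default 8 nfsd threads
    echo 16 > /proc/fs/nfsd/threads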

Ultimately, the plan is to share a Jenkins workspace between two servers for code compilation, and possibly use distcc in the future for distributed compilation.

What are the drive configurations and how are they connected on both the server and client?

I was testing with a ramdisk (tmpfs). A ZFS dataset (mirrored SSDs) gave the same 600 MB/s peak throughput when testing with dd if=/dev/zero.
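
(One caveat I’m aware of: dd if=/dev/zero is a poor test against ZFS, since zeroes compress away when compression is enabled, so a cross-check with incompressible data via fio is on my list, something like the following; the mount point path is an example.)

    # 4 sequential writers, 1 MiB blocks, O_DIRECT, against the NFS mount point
    fio --name=seqwrite --directory=/mnt/share --rw=write --bs=1M --size=4G \
        --ioengine=libaio --direct=1 --numjobs=4 --group_reporting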

But the Dell T420 host has dual 12G SAS SSDs (Samsung, don’t remember the model #), and the Supermicro client has dual Intel 1TB 670p drives as well as dual 118GB P1600X (Optane) drives. We may move the data around and switch client/host roles.

All are ZFS mirrors. I don’t remember the exact performance numbers, but they’re all in excess of 10GbE (1250 MB/s) speeds. More importantly, the native local latency on the Optane is quite a bit lower than what I’m getting on the ramdisk over NFS with RDMA.
That makes me think that maybe NFS isn’t actually leveraging RDMA in my setup, or that there’s a bottleneck in the default configuration (whether that’s the CX-4 cards’ config or something else, I have no idea).
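
The first thing I want to rule out is that the mount is quietly running over TCP. A couple of quick checks (the mount point is an example):

    # the mount options should include proto=rdma, not proto=tcp
    nfsstat -m
    grep /mnt/share /proc/mounts

    # per-mount transport stats; an RDMA mount reports an "xprt: rdma" line
    grep -E 'mounted on|xprt:' /proc/self/mountstats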

Most of the documentation I’ve come across for this is spotty at best, skipping steps and leaving out details. Even the stuff from Red Hat and Mellanox/Nvidia themselves is hard to follow at times.

It sounds like you should be able to get over 1 GB/s in testing, then.
Perhaps try turning on jumbo frames between the devices and see what it does to throughput? I’m not really sure why NFS isn’t getting better performance, as it really should, and it sounds like you set it up correctly. The performance you’re getting seems more like Samba speeds than NFS to me. lol
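
Something like this on both ends, assuming Ethernet mode (interface and device names are examples); note that RoCE itself tops out at a 4096-byte wire MTU even with a 9000-byte Ethernet MTU:

    # raise the Ethernet MTU on both hosts
    ip link set dev enp3s0f0 mtu 9000

    # check what the RDMA side actually negotiated (active_mtu)
    ibv_devinfo -d mlx5_0 | grep -i mtu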