How Can I Help with the new TRUENAS / 100G testing?

Ftfy

Lol oops

My machine is a couple of generations old and we seem to be limited to the 8 GB/s range, glad to hear that scaled up to 20 GB/s with newer HW. And yeah, in reality that’s pretty darn good… I just have a zealous streak about this stuff.

Thing is, in a real NIC or storage device there is no memcpy(); the device always DMAs. From my experience DMA was not just more efficient, it had significantly higher bandwidth. Sure, for 100 Gbit links 20 GB/s sounds good, but compared to 400 Gbit devices? Less so… I don’t have data on newer architectures, but I’d be surprised if max DMA memory bandwidth on a system isn’t still quite a bit higher than max software (CPU copy) memory bandwidth. Several generations ago I recall it was 2x higher in an experiment I ran in the lab.

For storage, DMA is easy to virtualize effectively because you always have a storage device at the end of the chain that can DMA. And the fundamentals of storage make it easy enough for the guest to hand a pointer to the virtio driver which the host can direct the device to DMA to.

Virtio should be able to do the same pointer trick with RDMA networking to the host-based NIC, without needing to do PCIe passthrough or SR-IOV stuff.

Guest-to-guest and guest-to-host networking is more problematic, if I have this right. In those scenarios there isn’t an obvious DMA device, so a memcpy() has to do the lifting that DMA/RDMA would normally handle. I’m not as up to speed with the networking side, but I don’t see an easy way to trick a standard networking stack, because I assume both guests in a G2G link would expect to have their own copy of the data. (With storage the other copy of the data is on disk, so only one copy needs to be in RAM.)

I think async_memcpy() could fix this and give the virtio_net driver the same DMA magic. But I’m not up to speed with the network subsystem in Linux, so I haven’t been able to figure out where to apply it. Anybody out there know the /drivers/net code?

A bunch of tests that might be worthwhile would be using NVMe-oF to serve ZVOL block devices to Windows clients with the StarWind NVMe-oF initiator. Unlike iSCSI, this actually lets you use RDMA, too. I’ve been using it recently between my NAS and desktop, and it seems to work just fine. According to DiskMark, it certainly doesn’t need to hide behind my local 970 EVO, as long as the data comes from ARC.

The Linux kernel has a module (nvmet) that acts as an NVMe-oF target and can export any block device.
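
If I remember the configfs paths right, the manual setup on the Linux side looks roughly like this (a sketch only; the subsystem name, ZVOL path, address and port are placeholders for my lab setup):

# Load the NVMe-oF target core and the RDMA transport
modprobe nvmet
modprobe nvmet-rdma

# Create a subsystem and, for a quick lab test, allow any host to connect
mkdir /sys/kernel/config/nvmet/subsystems/testnqn
echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/attr_allow_any_host

# Back namespace 1 with the ZVOL block device
mkdir /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
echo /dev/zvol/tank/testvol > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/device_path
echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1/enable

# Expose the subsystem on an RDMA port (4420 is the default NVMe-oF port)
mkdir /sys/kernel/config/nvmet/ports/1
echo rdma > /sys/kernel/config/nvmet/ports/1/addr_trtype
echo ipv4 > /sys/kernel/config/nvmet/ports/1/addr_adrfam
echo 192.168.100.1 > /sys/kernel/config/nvmet/ports/1/addr_traddr
echo 4420 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/testnqn /sys/kernel/config/nvmet/ports/1/subsystems/testnqn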


Ohhhh that’s really clever!

I’d love it if it was possible to serve a dataset over the fabric though. I don’t like the idea of terabytes of data being locked inside a volume block device.

I suppose I could loopback-mount it from either OS, one at a time, in case of issues.

For that case, we should pray that Samba finallyyyyyy gets their RDMA implementation stable and out to the public (I’ve been waiting on it forever). The CIFS kernel client in Linux can do RDMA already and seems to be able to shovel quite insane amounts of data over the link that way (as long as you have the Pro version of Windows at the other end). But since TrueNAS is built with Samba in mind, that factoid doesn’t help so far.
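
For completeness, on the Linux side SMB Direct is just a mount option for the kernel cifs client, assuming the kernel was built with CONFIG_CIFS_SMB_DIRECT (server address, share and user below are placeholders):

# Mount a Windows share over SMB 3.1.1 with SMB Direct (RDMA)
modprobe cifs
mount -t cifs //192.168.100.2/bigshare /mnt/smb -o username=someuser,vers=3.1.1,rdma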

NFS can do RDMA already, for TrueNAS <-> Linux/BSD.
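
Roughly, on a plain Linux box that looks like this, if I remember the knobs right (addresses, export path and mount point are placeholders; 20049 is the conventional NFS/RDMA port):

# Server: with nfsd already running, load the RDMA transport and add an RDMA listener
modprobe svcrdma
echo "rdma 20049" > /proc/fs/nfsd/portlist

# Client: load the client-side transport and mount over RDMA
modprobe xprtrdma
mount -t nfs -o rdma,port=20049 192.168.100.1:/mnt/pool /mnt/nfs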

Excellent post. It seems that SMB is not able to make RDMA connections to Windows here; I tried Samba and ksmbd separately and neither worked.

Server: Kernel: 5.15.35-1-pve
Client: Windows Server 2019

The most intuitive way to check this is to open Task Manager and watch whether the NIC shows traffic; if there is no heavy traffic, the transfer is going over RDMA. Alternatively, Performance Monitor has RDMA counters.

Also, can you please advise on the nfsd settings? I’m having some difficulties.

The error is reported as follows:

root@pve03:~# echo rdma 20049 > /proc/fs/nfsd/portlist
-bash: echo: write error: Cannot allocate memory

It seems like I need to switch to another Linux distro?

Of course I can also provide my test data

Server:

OS: Windows Server 2019
CPU: AMD EPYC 7452 (32)
Motherboard: Supermicro H12SSL

Clients:

OS: Windows Server 2019

Sequential reads: 27GB/s
CPU load: ~96%

Judging by the disk read activity, the server’s disks are not involved in the file read.


Hi @MistyMoon and welcome to the forum!

While I currently only have experience with older 40 GbE stuff, I’m lurking in here out of curiosity.

Do you really mean 27 Gigabytes (GB) per second or Gigabits (Gb)? Which Ethernet adapter are you using, specifically?


I’m getting a little more curious about 100+ GbE after getting a PCIe Gen4 x16 NVMe PCIe Switch HBA to seemingly work properly.

I currently use Intel XL710-QDA2 dual 40 GbE Ethernet adapters; one client system and one server are connected to the two 40 GbE ports of a Mikrotik CRS326-24S+2Q+RM.

I’d like to connect one port of an NVIDIA Mellanox ConnectX-6 200G to a 40 GbE port of the mentioned switch and run a 200 Gb direct connection from the other port to another NVIDIA Mellanox ConnectX-6.

Is such a mixed operation possible?

Gigabytes (GB) per second.

To be precise, this is the speed of six Mellanox CX-3 Pro 40 GbE NICs running simultaneously, each connected at PCIe 3.0 x8 and with 56G links enabled. Thanks to the 128 PCIe lanes of EPYC.

PS: Windows SMB over RDMA supports multiple links (multichannel).

Of course I’m going to upgrade to a Mellanox CX-4 or CX-5 100G NIC, but it’s too expensive for me at the moment.

In my experience it seems best to have NICs from the same manufacturer on both sides, e.g. Mellanox on both sides.

It’s best for the data not to go through the south bridge (chipset) on the motherboard, meaning both the NVMe SSD and the NIC should hang off PCIe lanes coming directly from the CPU. Otherwise RDMA interconnections can cause some strange problems.
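
A quick way to check where a card actually sits is the PCIe tree, for example (the device address below is just an example):

# Tree view: devices behind the chipset show up under its bridge,
# CPU-attached devices sit directly under a root port
lspci -tv

# Negotiated link width/speed (and NUMA node on multi-socket boards) for one device
lspci -vv -s 41:00.0 | grep -Ei 'lnksta:|numa'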


Nice, I thought so (GB/s not Gb/s) since of course 27 Gb/s is nothing to write home about :wink:

To use multiple Ethernet connections under Windows for a single file transfer you don’t even need RDMA; SMB Multichannel works independently of it, which is nice since only Windows Server and Windows Pro for Workstations support RDMA.

That being said, I never got SMB MC to work reliably on any of my systems since Windows 8 when using non-Server Windows. With the exact same hardware, BIOS, drivers and user-exposed Ethernet settings, just booting Windows Server 2012/2016/2019 instead (literally just swapping the boot SSD) made SMB MC work immediately.

I was able to get ksmbd working with SMB Direct/RDMA to a Windows 10 Pro for Workstations client:

$ Get-SmbMultiChannelConnection

Server Name    Selected Client IP     Server IP     Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
-----------    -------- ---------     ---------     ---------------------- ---------------------- ------------------ -------------------
192.168.100.1  True     192.168.100.2 192.168.100.1 14                     20                     False              True

But it is not consistent. Sometimes it works well - no CPU overhead, no network bandwidth reporting in Task manager, RDMA stats captured in perfmon. Then the next file after that might fail with the following error in Event viewer … / SMBClient / Connectivity:

Event ID: 30804

A network connection was disconnected.

Instance name: \Device\LanmanRedirector
Server name: \192.168.100.1
Server address: 192.168.100.1:445
Connection type: Rdma
InterfaceId: 14

Guidance:
This indicates that the client’s connection to the server was disconnected.

Frequent, unexpected disconnects when using an RDMA over Converged Ethernet (RoCE) adapter may indicate a network misconfiguration. RoCE requires Priority Flow Control (PFC) to be configured for every host, switch and router on the RoCE network. Failure to properly configure PFC will cause packet loss, frequent disconnects and poor performance.

I tried to use the pwsh commands like Get-NetQosFlowControl but they seem to be Windows Server-specific. I don’t have them on the Pro for Workstations edition. I’ve also seen blog posts on the web saying PFC is not strictly required. My “network” is point-to-point, so there’s not going to be any external congestion.
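
If PFC does turn out to be needed, the Linux side can at least be configured with the Mellanox tooling, if I have the syntax right (mlnx_qos ships with the NVIDIA/Mellanox OFED tools; interface name and priority are placeholders):

# Show the current PFC / trust / ETS state of the port
mlnx_qos -i enp65s0

# Enable PFC only for priority 3 (the priority commonly used for RoCE traffic)
mlnx_qos -i enp65s0 --pfc 0,0,0,1,0,0,0,0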

I’m seeing this in dmesg, after enabling ksmbd.control -d rdma:

[21795.563777] ksmbd: smb_direct: read/write error. opcode = 531700601, status = WR flushed(5)
[21795.563788] ksmbd: smb_direct: read/write error. opcode = -1019489404, status = WR flushed(5)
[21795.563793] ksmbd: smb_direct: read/write error. opcode = 1605849338, status = WR flushed(5)
[21795.563796] ksmbd: smb_direct: read/write error. opcode = 0, status = WR flushed(5)
[21795.563959] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[21795.563969] ksmbd: Failed to send message: -107
[21795.563986] ksmbd: Failed to send message: -107
[21795.563991] ksmbd: Failed to send message: -107

But I’m not 100% sure it’s related.

My setup is:

  • TR4 1950X Linux 6.4.12
  • AM5 7950X3D Windows 10 Pro for Workstations
  • ConnectX-3 on each side, connected directly with a 3 meter DAC.
    • I tried FDR IB but it didn’t qualify for SMB Direct RSS nor RDMA, I think because of rxqueues = txqueues = 1. Not sure if it’s a limitation of IPoIB or if I didn’t configure it right.

So far, I’m bottlenecking the CX3 by using it in an x4 slot. It’s capping out at around 25 Gbps, which makes sense after accounting for PCIe overhead (PCIe 3.0 x4 is roughly 31.5 Gb/s after 128b/130b encoding, before TLP/packet overhead). I was able to get the same speeds with Samba multichannel, but I’m hoping for less overhead, lower latency and higher peak bandwidth with RDMA once I have the NIC in an x8 slot. It’s a dual-port NIC as well, so I hope to go beyond 40 GbE in the future.

Any suggestions on how to diagnose this further would be welcome. I’ve spent two days getting to this point, and sometimes it gets really silly, like Windows not enabling Multichannel if you use guest accounts.

Hi, can any of you who successfully enabled multichannel between a Windows client and a ksmbd server share some details about how you achieved that? I was trying to get SMB Direct from Windows to Linux working but could never get multichannel to work in the first place.

My client is Windows 10 Pro N for Workstations 22H2, my server is ksmbd on Arch Linux with the 6.5.6-zen2-1-zen kernel. The two hosts are connected directly with a 1 m DAC using a pair of 100 Gb single-port CX4 VPI cards in IB mode.

I have this in my ksmbd.conf:

server multi channel support = yes

And on Windows:

PS C:\Windows\system32>  Get-SmbClientConfiguration | Select EnableMultiChannel

EnableMultiChannel
------------------
              True

Then I tried to generate some transfers by running:

diskspd.exe -b8M -c20G -d60 -L -o8 -Sr -t16 -W10 -v \\192.168.109.1\Tmp\iotest.bin

But when the transfer is running and I run Get-SmbMultichannelConnection from the Windows side, I don’t get any output at all.

This doesn’t seem to be an issue for any of you; am I missing something dumb like an extra ksmbd.conf entry?

Eventually, I jumped down the rabbit hole and resolved all of the issues I had with SMB Direct between a Windows 10 client and a Linux ksmbd server, and I am currently able to run at the theoretical full speed between them without any noticeable CPU load. I am just going to leave a few tips and comments here in case any lost soul stumbles upon the same problem.

  1. You need to use Windows 10 Pro for Workstations. Plain Windows 10 is not enough. Even though some Microsoft sources say it is supported, it actually is not. You can go through every configuration step without an explicit error; it just won’t work in the final phase.

  2. You cannot do it with insecure connections. Windows implicitly disables multi-channel if you set AllowInsecureGuestAuth, and multi-channel SMB is a requirement for SMB Direct. (Kudos to @farnoy for mentioning this! I was just blind enough to overlook it before my last post.)

  3. Currently, in ksmbd, I can confirm there is a bug that makes the server send out the wrong RDMA-capable flags if the interface is a native InfiniBand device working in IPoIB mode. It only behaves correctly if the interface is a RoCE interface. This is probably why the OPs of this thread needed a dirty-hack patch to make it work (kudos to @Jared_Hulbert for inspiring me to look into this!).

Eventually, I was able to fix the bug in 3) myself and submitted a patch both to the ksmbd GitHub repo and to Linux mainline ([PATCH v3] ksmbd: fix missing RDMA-capable flag for IPoIB device in ksmbd_rdma_capable_netdev()). The patch has been accepted and is on its way into mainline, probably to be merged for 6.6 or 6.6.1. Meanwhile, I am reliably using the GitHub version of ksmbd.

It’s been a longer journey than I expected, but at the end of the day I got what I wanted, so I am pretty happy about it.


Any pointers on what I might be missing in my setup:

  • Voidlinux (server): 2 x Mellanox ConnectX-4
  • Win 10 Pro for Workstation (client): 1 x Mellanox ConnectX-4
  • Voidlinux (client): Mellanox ConnectX-4

Both clients are directly connected to the server.
Linux to Linux NFS works nicely with RDMA.
Linux to Windows SMB doesn’t use RDMA (traffic can be seen in the network monitor).

Server config:

-bash-5.2# cat /etc/ksmbd/ksmbd.conf
; see ksmbd.conf(5) for details
[global]
	netbios name = nas
	server multi channel support = yes
	bind interfaces only = yes
	interfaces = ib1

[Games]
	; share parameters
	;force group = app
	;force user = app
	path = /mnt/pool-0/games
	read only = no
	valid users = app
	writeable = yes
	guest ok = no
-bash-5.2# ksmbd.mountd --version
[ksmbd.mountd/59965]: INFO: ksmbd-tools version : 3.5.1
-bash-5.2# ibstat
CA 'ibp97s0'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.28.2006
	Hardware version: 0
	Node GUID: 0xec0d9a0300c29e3c
	System image GUID: 0xec0d9a0300c29e3c
	Port 1:
		State: Down
		Physical state: Polling
		Rate: 10
		Base lid: 2
		LMC: 0
		SM lid: 2
		Capability mask: 0x2651e84a
		Port GUID: 0xec0d9a0300c29e3c
		Link layer: InfiniBand
CA 'ibp98s0'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.28.2006
	Hardware version: 0
	Node GUID: 0xec0d9a0300c03974
	System image GUID: 0xec0d9a0300c03974
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 5
		LMC: 0
		SM lid: 5
		Capability mask: 0x2651e84a
		Port GUID: 0xec0d9a0300c03974
		Link layer: InfiniBand
-bash-5.2# ip a show dev ib1
5: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP group default qlen 256
    link/infiniband 80:00:01:06:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:c0:39:74 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.255.1/24 brd 192.168.255.255 scope global ib1
       valid_lft forever preferred_lft forever
    inet6 fe80::1e89:41fe:9fa2:ddf4/64 scope link 
       valid_lft forever preferred_lft forever

Win 10 Pro for Workstation:

PS C:\Windows\system32> Get-SmbClientNetworkInterface

Interface Index RSS Capable RDMA Capable Speed    IpAddresses                                Friendly Name
--------------- ----------- ------------ -----    -----------                                -------------
22              False       False        0  bps   {fe80::6e09:1a25:a20:766e}                 Ethernet 5
20              True        True         100 Gbps {fe80::a399:d970:a616:eef6, 192.168.255.2} Ethernet 9
19              False       False        2.5 Gbps {fe80::f74a:7297:4f4:a9db}                 Ethernet 8
8               False       False        1 Gbps   {fe80::f99f:bf10:8e6a:8b5b, 172.16.0.14}   Ethernet 7
16              False       False        3 Mbps   {fe80::cc85:eb01:789:dc35}                 Bluetooth Network Connection 2
Server Name   Selected Client IP     Server IP  Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
-----------   -------- ---------     ---------  ---------------------- ---------------------- ------------------ -------------------
192.168.255.1 True     192.168.255.2 172.16.0.6 20                     2                      True               False

You are the reason I have this working. Thank you so much for this patch. I custom-compiled a 6.7-rc kernel on AlmaLinux and have confirmed this has been working for the past few weeks.

I am getting random disconnects on Windows 11 Pro for Workstations sometimes, but not consistently enough to reproduce. I am also using an SR-IOV virtual function on a Mellanox ConnectXPro-4. It’s tolerable at this point.

Thanks!

Need your kernel version. Make sure you are using at least 6.7-rc.

I compiled a custom kernel with ksmbd and SMB Direct enabled. With 100G ConnectX-4 cards Windows didn’t pick up SMB Direct, but then I decided to test with 56G ConnectX-3 cards just for the sake of it, and SMB Direct showed up as it should.

After using it 24/7 for ~3 weeks, the connection crashed every night and required a reboot of the server to work again. I wasn’t able to figure out what the actual issue was and switched back to Samba.


I haven’t had this same issue. Kernel versions would help, as the fix @Heppu coded may not be in there.