I was trying to test with PC booted in linux, but no luck mounting with the RDMA option:
FAILS: sudo mount -t cifs //server/temp temp -o vers=3.1.1,rdma
WORKS: sudo mount -t cifs //server/temp temp -o vers=3.1.1
I’ll test Windows soon using IOZone or similar. Also, I will update the previous post with stats leveraging Samba multichannel and create a similar post for Windows 10 stats (excluding NFS).
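Next I’ll turn on the cifs client debug output and retry the rdma mount to see why it gets refused. Roughly this (a hedged sketch; cifsFYI is the stock cifs debug knob and the exact messages vary by kernel):
echo 7 | sudo tee /proc/fs/cifs/cifsFYI (enable verbose cifs client logging; echo 0 turns it back off)
sudo mount -t cifs //server/temp temp -o vers=3.1.1,rdma
sudo dmesg | tail -n 50 (look for smbdirect / rdma related errors)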
Can you try enabling all the ksmbd debug prints, then do the following operations, recording the output with ‘sudo dmesg -c > mount_rdma.txt’ after each one, and share the three output files here?
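Something like this is what I have in mind (a rough sketch assuming ksmbd-tools is installed and the usual sysfs debug knob is present; adjust to your setup):
sudo ksmbd.control -d "all" (toggle all ksmbd debug components on)
cat /sys/class/ksmbd-control/debug (components shown in [brackets] are enabled)
sudo dmesg -C (clear the kernel ring buffer first)
…perform one operation, e.g. the rdma mount attempt from the client…
sudo dmesg -c > mount_rdma.txt (capture and clear, then repeat for the next operation)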
What’s the reasoning behind -Sr? What about -w100 and -w0?
I added a warmup period of 10 seconds (-W10 - Capital W, not lowercase w) and disabled local caching on the client (-Sr) to ensure all the reported IO traversed the network. FYI: with no -w (lower case), the default is 100% reads.
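For context, the runs look roughly like this (a hypothetical invocation; the target file, size, duration, threads, and queue depth are made up, only -W/-S/-w are the flags in question):
diskspd.exe -b64K -d60 -W10 -Sr -t8 -o8 -c8G Z:\testfile.dat (no -w = 100% reads)
diskspd.exe -b64K -d60 -W10 -Sr -w100 -t8 -o8 -c8G Z:\testfile.dat (100% writes)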
As for speed, the 32Gbit result is better than the Linux cifs mount (32Gbit vs 25.4Gbit), but I assume the IO profile differs between bonnie++ and diskspd. I was not capturing the same level of detail as with the Linux results at this time, but wanted to provide a quick update; I’ll post more detailed results when I have the time to duplicate the tests.
“did not see RDMA used” - how so?
When using RDMA, the server does not reflect the network IO in the network counters within htop, as the data does not traverse the typical network stack.
Ex: When doing 70Gbit+ over RDMA (NFS), the network counters show <1Mbit of traffic (ssh session, other misc traffic). FYI: This is also why I’m using switch statistics to measure traffic, to get consistent values.
Also, not sure if it matters as it’s supposed to be automatic, but I’m just mounting the share with a Windows “NET USE…” command (yes, I’m dating myself a little with that one, but I prefer the cli).
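For reference it’s literally just (drive letter and share here are hypothetical; there’s no RDMA-specific switch, it’s supposed to be negotiated automatically):
NET USE Z: \\nas\temp /PERSISTENT:NO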
May have found something, but haven’t had a chance to mess with it yet:
Looking at the Windows event log:
Event Details
LogName : Microsoft-Windows-SmbClient/Connectivity
Id : 30822
TimeCreated : 12/5/2021 3:06:31 PM
Level : 4
Message : Failed to establish an SMB multichannel network connection.
Error: The transport connection attempt was refused by the remote system.
Server name: nas.domain.com
Server address: 10.0.1.100:445
Client address: 10.0.1.117
Instance name: \Device\LanmanRedirector
Connection type: Wsk
Guidance:
This indicates a problem with the underlying network or transport, such as with TCP/IP, and not with SMB. A firewall that blocks TCP port 445, or TCP port 5445 when
using an iWARP RDMA adapter can also cause this issue. Since the error occurred while trying to connect extra channels, it will not result in an application error.
This event is for diagnostics only.
Looking at the server (Linux), it appears that I need to enable PFC / ETS:
Update: wrong, not required.
Also: on the Dell OS10 switch
The S5148F is a great value at ~$1200 - $1400 US for a 48x25G + 6x100G switch, but it can’t run most open network OSs like SONiC, etc. What I’ve come up with so far is below, but no luck with SMB RDMA yet… Does RDMA with SMB work over 802.3ad LAGs? (see ports 1/1/51-1/1/52 below)
switch config snippets
...
class-map type network-qos nqosmap_rdma
match qos-group 3
!
policy-map type application policy-iscsi
!
policy-map type network-qos p_nqos_rdma
!
class nqosmap_rdma
pause
pfc-cos 3
!
system qos
trust-map dot1p default
!
...
!
interface port-channel3
description nas_10.0.1.100
no shutdown
switchport access vlan 1
mtu 9216
spanning-tree port type edge
!
...
!
interface ethernet1/1/49
description Desktop
no shutdown
switchport access vlan 1
mtu 9216
flowcontrol receive off
flowcontrol transmit off
priority-flow-control mode on
service-policy input type network-qos p_nqos_rdma
spanning-tree port type edge
!
interface ethernet1/1/50
description Desktop
no shutdown
switchport access vlan 1
mtu 9216
flowcontrol receive off
flowcontrol transmit off
priority-flow-control mode on
service-policy input type network-qos p_nqos_rdma
spanning-tree port type edge
!
interface ethernet1/1/51
description nas_10.0.1.100
no shutdown
channel-group 3 mode active
no switchport
mtu 9216
flowcontrol receive off
flowcontrol transmit off
priority-flow-control mode on
service-policy input type network-qos p_nqos_rdma
!
interface ethernet1/1/52
description nas_10.0.1.100
no shutdown
channel-group 3 mode active
no switchport
mtu 9216
flowcontrol receive off
flowcontrol transmit off
priority-flow-control mode on
service-policy input type network-qos p_nqos_rdma
!
I feel like I’m asking more questions than providing assistance (new to DCB in general), but once we get over this hurdle, hopefully I can validate findings or run tests, etc.
I haven’t done much in this sphere since 56Gbit IB was a big deal, so take my advice with a tablespoon of salt. But at first pass my gut says PFC is a big deal for optimizing performance in some cases, but it shouldn’t make or break your ability to establish the RDMA connection.
I think the big clue here is that the multichannel is failing. The ksmbd docs mention that multichannel is a requirement for RDMA, but also that both features are only partially supported. I don’t understand why that should be the case at the protocol level, but it’s possible that’s an implementation quirk of the Windows client code.
Perhaps if you can get the multichannel working, you get to the part where the RDMA fails!
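If it isn’t already on, my understanding (hedged; check the ksmbd.conf man page for your ksmbd-tools version) is that multichannel is off by default on the server and has to be enabled in the [global] section of ksmbd.conf, then ksmbd restarted:
[global]
	server multi channel support = yes
(e.g. in /etc/ksmbd/ksmbd.conf, then sudo systemctl restart ksmbd; the path and service name may differ by distro)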
Get-SmbMultichannelConnection
Server Name Selected Client IP Server IP Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
----------- -------- --------- --------- ---------------------- ---------------------- ------------------ -------------------
nas True 10.0.1.10 10.0.1.100 19 2 False False
nas True 10.0.1.117 10.0.1.100 18 2 False False
Looking further into RDMA, I’m now getting the following event logs:
Windows Events
LogName : Microsoft-Windows-SmbClient/Connectivity
Id : 30822
TimeCreated : 12/6/2021 1:34:49 PM
Level : 4
Message : Failed to establish an SMB multichannel network connection.
Error: The transport connection attempt was refused by the remote system.
Server name: nas
Server address: 10.0.1.100:445
Client address: 10.0.1.117
Instance name: \Device\LanmanRedirector
Connection type: Wsk
Guidance:
This indicates a problem with the underlying network or transport, such as with TCP/IP, and not with SMB. A firewall that blocks TCP port 445, or TCP port 5445 when
using an iWARP RDMA adapter can also cause this issue. Since the error occurred while trying to connect extra channels, it will not result in an application error.
This event is for diagnostics only.
LogName : Microsoft-Windows-SmbClient/Connectivity
Id : 30804
TimeCreated : 12/6/2021 1:34:47 PM
Level : 2
Message : A network connection was disconnected.
Instance name: \Device\LanmanRedirector
Server name: \nas
Server address: 10.0.1.100:445
Connection type: Wsk
InterfaceId: 19
Guidance:
This indicates that the client's connection to the server was disconnected.
Frequent, unexpected disconnects when using an RDMA over Converged Ethernet (RoCE) adapter may indicate a network misconfiguration. RoCE requires Priority Flow
Control (PFC) to be configured for every host, switch and router on the RoCE network. Failure to properly configure PFC will cause packet loss, frequent disconnects
and poor performance.
The server only reports that there is no IPv6; I assume that’s not a requirement?
For the log you linked, what was the workflow you captured? Approximate timing would be nice. Example: at time 0 I mounted the fs, ~10s later I read a file, ~20s after reading I wrote a file.
Also, I find it odd that Windows reports ‘Client RDMA Capable’ as false in the Get-SmbMultichannelConnection output above, but Get-NetAdapterRDMA reports it as enabled?
Get-NetAdapterRDMA
Name InterfaceDescription Enabled PFC ETS
---- -------------------- ------- --- ---
100G_1 Mellanox ConnectX-5 Ex Adapter True False False
100G_2 Mellanox ConnectX-5 Ex Adapter #2 True False False
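For completeness, I’ll also compare against what the SMB client itself reports per interface (hedging on the exact output; these are the built-in SMB PowerShell cmdlets):
Get-SmbClientNetworkInterface (per-interface RSS Capable / RDMA Capable as seen by the SMB client)
Get-SmbClientConfiguration | Select EnableMultiChannel (confirms multichannel is on client-side)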
I saw ZFS and “performance” mentioned together… figured I’d bring it up for those that aren’t aware yet: when 3.0 comes out it should have DirectIO, which can potentially resolve some bottlenecks when utilizing NVMe drives, though it’s mostly writes that benefit.
But note that’s in setting up the TCP transport. The message “ksmbd: smb_direct: init RDMA listener.” only seems to appear when RDMA on the server is satisfactorily set up. There is nothing in the log that indicates the server is the primary problem; in fact, the server is rather passive in the matter. I don’t know the protocol well, but the code suggests there are command blocks (maybe called PDUs) used to communicate between client and server. So following the smb2pdu.c smb2_read() path, there is a flag check that determines whether or not to attempt the RDMA transfer: https://elixir.bootlin.com/linux/v5.15.4/source/fs/ksmbd/smb2pdu.c#L6204
The testing I was performing was with a Windows 10 client; I’ll reboot into Arch shortly and try there as well and check the flags.
CIFS: VFS: CONFIG_CIFS_SMB_DIRECT is not enabled
Need to compile a kernel… stay tuned.
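A quick way to confirm what the running kernel was built with (hedged: assumes the kernel exposes its config via /proc/config.gz; otherwise most distros keep a copy under /boot):
zgrep CONFIG_CIFS_SMB_DIRECT /proc/config.gz
grep CONFIG_CIFS_SMB_DIRECT /boot/config-$(uname -r)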
================================
UPDATE
Son of a #$%#, it works with the Linux cifs client!!!
uname -a
Linux desktop 5.15.6-arch2-1-smbdirect #1 SMP PREEMPT Mon, 06 Dec 2021 20:59:16 +0000 x86_64 GNU/Linux
sudo mount -t cifs //server/temp temp -o vers=3.1.1,rdma (works)
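To double-check the traffic is actually going over SMB Direct and not falling back to TCP (hedged: the exact output format varies by kernel version):
cat /proc/fs/cifs/DebugData (the connection entry should include SMBDirect/rdma transport details)
Plus the htop network counters on the server stay near zero during a transfer, just like with NFS over RDMA.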
I updated the results in the earlier thread:
KSMBD (RDMA):
-- Write to Server: 39502 Mbits/sec, 1191122 packets/sec, 39% of line rate
-- Read From Server: 41601 Mbits/sec, 1255649 packets/sec, 41% of line rate
Not sure why the Windows 10 client (same hardware) is failing.
I do think your observation about Get-SmbMultichannelConnection vs Get-NetAdapterRDMA points at the issue. It would seem that Get-SmbMultichannelConnection is reporting data from the established SMB link to your ksmbd server.
I think RDMA is likely enabled for the NIC on windows but when it mounts the SMB volume it decides the link is not eligible for RDMA.
If that’s correct then I think the question is why does the Windows client make that decision while the Linux client works?
But I guess you can try the pass-through filesystem to a Windows VM? Wasn’t that one of the things @wendell suggested?
Wow. That’s pretty much line speed, right? FYI - with the right settings I used to get > line speed with some RW workloads because the link is duplexed. Not 2x, but I don’t remember if it was 10% or 50% or what.