How can I help with the new TrueNAS / 100G testing?

Missed the ksmbd-tools stuff, thanks.

Got it installed, and it seems to work on the server side:

❯sudo ksmbd.control -d all
[ksmbd.control/124284]: INFO:  auth vfs oplock ipc conn rdma

❯cat /sys/class/ksmbd-control/debug
smb auth vfs oplock ipc conn rdma

I was trying to test with the PC booted into Linux, but had no luck mounting with the RDMA option:

FAILS:  sudo mount -t cifs  //server/temp temp -o vers=3.1.1,rdma
WORKS:  sudo mount -t cifs  //server/temp temp -o vers=3.1.1
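
When the rdma mount is rejected, the client-side kernel log usually says why; a quick, generic check is:

sudo dmesg | grep -iE 'cifs|smb'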

I’ll test Windows soon using IOzone or similar. I’ll also update the previous post with stats leveraging Samba multichannel and create a similar post for Windows 10 stats (excluding NFS).

2 Likes

I booted the desktop into Windows 10 (21H1, build 19043.1348) and verified that Windows saw the adapters as RDMA-enabled:

Get-NetAdapterRDMA

Name                      InterfaceDescription                     Enabled     PFC        ETS
----                      --------------------                     -------     ---        ---
100G_1                    Mellanox ConnectX-5 Ex Adapter           True        False      False
100G_2                    Mellanox ConnectX-5 Ex Adapter #2        True        False      False

I then ran the following against KSMBD:

diskspd.exe -b8M -c20G -d60 -L -o8 -Sr -t16 -W10 -v u:\iotest.dat

which resulted in approximately 32 Gbit/s of traffic.

I then enabled all KSMBD features:

sudo ksmbd.control -s

sudo ksmbd.control -d all
[ksmbd.control/700357]: INFO: [smb] [auth] [vfs] [oplock] [ipc] [conn] [rdma]

cat /sys/class/ksmbd-control/debug
[smb] [auth] [vfs] [oplock] [ipc] [conn] [rdma]

sudo ksmbd.mountd

Same results, and I did not see RDMA used, so to make sure I’m reading the cryptic bracket [ ] system correctly, I ran it again after flipping the features:

sudo ksmbd.control -s
sudo ksmbd.control -d all
[ksmbd.control/705724]: INFO:  auth vfs oplock ipc conn rdma

cat /sys/class/ksmbd-control/debug
smb auth vfs oplock ipc conn rdma

sudo ksmbd.mountd

Once again, the same result.

I’ll test Samba and Multichannel later this weekend,
Brian

1 Like

Can you enable all the ksmbd debug prints, perform each of the following operations, record the output with ‘sudo dmesg -c > mount_rdma.txt’ after each one, and then share the three output files here? (Rough sketch after the list.)

  • mount client
  • read from file
  • write to file
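
A rough sketch of what I mean (the mount point and file names here are just placeholders):

sudo dmesg -C                                              # clear the ring buffer before starting
sudo mount -t cifs //server/temp /mnt/temp -o vers=3.1.1,rdma
sudo dmesg -c > mount_rdma.txt
cat /mnt/temp/somefile > /dev/null                         # read from a file on the share
sudo dmesg -c > read_rdma.txt
dd if=/dev/zero of=/mnt/temp/writetest bs=1M count=100     # write to a file on the share
sudo dmesg -c > write_rdma.txt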
1 Like

“did not see RDMA used” - how so? Just the speed or something else?

32 Gbit with -w10; what about -w100 and -w0, just to compare with Linux?
I’m assuming they’re pretty close to the Linux client numbers.

What’s the reasoning behind -Sr? I’m assuming that’s a don’t-care in this condition? Did you try other -S settings?

1 Like

What’s the reasoning with -Sr? What about -w100 and -w0?

I added a warmup period of 10 seconds (-W10; capital W, not lowercase w) and disabled local caching on the client (-Sr) to ensure all the reported IO traversed the network. FYI: with no -w (lowercase), the default is 100% reads.

REF: Command line and parameters · microsoft/diskspd Wiki · GitHub
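
For the -w100 / -w0 comparison, the runs would just be the same command with the write percentage changed (sketch only; I haven’t re-run these yet):

100% writes:
diskspd.exe -b8M -c20G -d60 -L -o8 -Sr -t16 -W10 -w100 -v u:\iotest.dat

Explicit 100% reads (same as leaving -w off):
diskspd.exe -b8M -c20G -d60 -L -o8 -Sr -t16 -W10 -w0 -v u:\iotest.dat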

compare with the Linux?

As for speed, the 32 Gbit is better than the Linux cifs mount (32 Gbit vs. 25.4 Gbit), but I assume the IO profile differs between bonnie++ and diskspd. I was not capturing the same level of detail as with the Linux results this time, but wanted to provide a quick update; I’ll post more detailed results when I have time to duplicate the tests.

“did not see RDMA used” - how so?

When using RDMA, the server does not reflect the network IO in htop’s network counters, as the data does not go through the normal kernel network stack.
Ex: when doing 70 Gbit+ over RDMA (NFS), the network counters show <1 Mbit of traffic (the SSH session and other misc traffic). FYI: this is also why I’m using switch statistics to measure traffic and get consistent values.
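
A quick cross-check on the Linux server is the RDMA device’s hardware counters in sysfs; as far as I can tell these do tick up with RDMA traffic even though htop shows nothing. Sketch only; mlx5_0 is a placeholder for whatever your adapter enumerates as:

cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_rcv_data     # counted in 4-octet units, per the sysfs docs
cat /sys/class/infiniband/mlx5_0/ports/1/counters/port_xmit_data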

Also, not sure if it matters since it’s supposed to be automatic, but I’m just mounting the share with a Windows “NET USE…” command (yes, I’m dating myself a little with that one, but I prefer the CLI).

As for uploading the file, apparently I’m still a ‘new user’, so see:
dmesg KSMBD - Pastebin.com

May have found something, but haven’t had a chance to mess with it yet:

Looking at the Windows event log:

Event Details
LogName     : Microsoft-Windows-SmbClient/Connectivity
Id          : 30822
TimeCreated : 12/5/2021 3:06:31 PM
Level       : 4
Message     : Failed to establish an SMB multichannel network connection.

              Error: The transport connection attempt was refused by the remote system.

              Server name: nas.domain.com
              Server address: 10.0.1.100:445
              Client address: 10.0.1.117
              Instance name: \Device\LanmanRedirector
              Connection type: Wsk

              Guidance:
              This indicates a problem with the underlying network or transport, such as with TCP/IP, and not with SMB. A firewall that blocks TCP port 445, or TCP port 5445 when
              using an iWARP RDMA adapter can also cause this issue. Since the error occurred while trying to connect extra channels, it will not result in an application error.
              This event is for diagnostics only.

Looking at the server (Linux), it appears that I need to enable PFC / ETS:

Update: wrong, not required.

Also, on the Dell OS10 switch:

  • The S5148F is a great value at ~$1200 - $1400 US for a 48x25G + 6x100G switch, but it can’t run most open network OSs like SONiC, etc. Below is what I’ve come up with so far, but still no luck with SMB RDMA… Does RDMA with SMB work over an 802.3ad LAG? (See ports 1/1/51-1/1/52 below; a rough host-side PFC counterpart is sketched after the config.)
Switch config snippets:
...
class-map type network-qos nqosmap_rdma
 match qos-group 3
!
policy-map type application policy-iscsi
!
policy-map type network-qos p_nqos_rdma
 !
 class nqosmap_rdma
  pause
  pfc-cos 3
!
system qos
 trust-map dot1p default
!
...
!
interface port-channel3
 description nas_10.0.1.100
 no shutdown
 switchport access vlan 1
 mtu 9216
 spanning-tree port type edge
!
...
!
interface ethernet1/1/49
 description Desktop
 no shutdown
 switchport access vlan 1
 mtu 9216
 flowcontrol receive off
 flowcontrol transmit off
 priority-flow-control mode on
 service-policy input type network-qos p_nqos_rdma
 spanning-tree port type edge
!
interface ethernet1/1/50
 description Desktop
 no shutdown
 switchport access vlan 1
 mtu 9216
 flowcontrol receive off
 flowcontrol transmit off
 priority-flow-control mode on
 service-policy input type network-qos p_nqos_rdma
 spanning-tree port type edge
!
interface ethernet1/1/51
 description nas_10.0.1.100
 no shutdown
 channel-group 3 mode active
 no switchport
 mtu 9216
 flowcontrol receive off
 flowcontrol transmit off
 priority-flow-control mode on
 service-policy input type network-qos p_nqos_rdma
!
interface ethernet1/1/52
 description nas_10.0.1.100
 no shutdown
 channel-group 3 mode active
 no switchport
 mtu 9216
 flowcontrol receive off
 flowcontrol transmit off
 priority-flow-control mode on
 service-policy input type network-qos p_nqos_rdma
!
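
For completeness, the host-side counterpart to the switch’s pfc-cos 3 would look roughly like this with Mellanox’s mlnx_qos tool (interface names are placeholders); as noted in the update above, this turned out not to be required to get RDMA working:

mlnx_qos -i enp65s0f0 --pfc 0,0,0,1,0,0,0,0    # enable PFC on priority 3 only, matching the switch
mlnx_qos -i enp65s0f1 --pfc 0,0,0,1,0,0,0,0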

I feel like I’m asking more questions than providing assistance (I’m new to DCB in general), but once we get over this hurdle, hopefully I can validate findings, run tests, etc.

1 Like

I haven’t done much in this sphere since 56 Gbit IB was a big deal, so take my advice with a tablespoon of salt. But at first pass, my gut says PFC is sometimes a big deal for optimizing performance, but it shouldn’t make or break your ability to establish the RDMA connection.

I think the big clue here is that multichannel is failing. The ksmbd docs mention multichannel being a requirement for RDMA, but also that both features are only partially supported. I don’t understand why that should be the case at the protocol level, but it’s possible that’s an implementation quirk of the Windows client code.

Perhaps if you can get multichannel working, you’ll get to the part where the RDMA fails!

What’s the hang-up with the MLNX drivers? I never tried them with Arch, but in years past they weren’t that big of a deal.

I got multichannel working on Windows 10:

Get-SmbMultichannelConnection

Server Name Selected Client IP  Server IP  Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
----------- -------- ---------  ---------  ---------------------- ---------------------- ------------------ -------------------
nas         True     10.0.1.10  10.0.1.100 19                     2                      False              False
nas         True     10.0.1.117 10.0.1.100 18                     2                      False              False

Looking further into RDMA, I’m now getting the following event logs:

Windows Events
LogName     : Microsoft-Windows-SmbClient/Connectivity
Id          : 30822
TimeCreated : 12/6/2021 1:34:49 PM
Level       : 4
Message     : Failed to establish an SMB multichannel network connection.

              Error: The transport connection attempt was refused by the remote system.

              Server name: nas
              Server address: 10.0.1.100:445
              Client address: 10.0.1.117
              Instance name: \Device\LanmanRedirector
              Connection type: Wsk

              Guidance:
              This indicates a problem with the underlying network or transport, such as with TCP/IP, and not with SMB. A firewall that blocks TCP port 445, or TCP port 5445 when
              using an iWARP RDMA adapter can also cause this issue. Since the error occurred while trying to connect extra channels, it will not result in an application error.
              This event is for diagnostics only.

LogName     : Microsoft-Windows-SmbClient/Connectivity
Id          : 30804
TimeCreated : 12/6/2021 1:34:47 PM
Level       : 2
Message     : A network connection was disconnected.

              Instance name: \Device\LanmanRedirector
              Server name: \nas
              Server address: 10.0.1.100:445
              Connection type: Wsk
              InterfaceId: 19

              Guidance:
              This indicates that the client's connection to the server was disconnected.

              Frequent, unexpected disconnects when using an RDMA over Converged Ethernet (RoCE) adapter may indicate a network misconfiguration. RoCE requires Priority Flow
              Control (PFC) to be configured for every host, switch and router on the RoCE network. Failure to properly configure PFC will cause packet loss, frequent disconnects
              and poor performance.

The server only reports that there is no IPv6; I assume that’s not a requirement?

[ 4569.921757] ksmbd: Can't create socket for ipv6, try ipv4: -97
[ 4569.922726] ksmbd: Can't create socket for ipv6, try ipv4: -97
...
[ 4569.928955] ksmbd: smb_direct: init RDMA listener. cm_id=000000003af95ded

The MLNX drivers aren’t a requirement, but they make setting IRQ affinity easy on NUMA systems and simplify firmware updates and option changes.

Thanks for the update.

For the log you linked, what was the workflow you captured? Approximate timing would be nice. Example: at time 0 I mounted the fs, ~10s later I read, ~20s after reading I wrote a file.

I’ll generate a new one, hold on.

Also, I find it odd that Windows reports ‘Client RDMA Capable’ as False in the Get-SmbMultichannelConnection output above, while Get-NetAdapterRDMA reports RDMA as enabled:

Get-NetAdapterRDMA

Name                      InterfaceDescription                     Enabled     PFC        ETS
----                      --------------------                     -------     ---        ---
100G_1                    Mellanox ConnectX-5 Ex Adapter           True        False      False
100G_2                    Mellanox ConnectX-5 Ex Adapter #2        True        False      False
1 Like

dmsg.txt (64.8 KB)

  • 6565.632221: Start ksmbd
  • 6589.458732: Uploaded upload.txt
  • 6591.818867: Downloaded dmsg.txt (old one)
  • 6598.285087: stop ksmbd

Dumb question: should I have port 5445 open? Because I don’t.

Update: Answer – Nope, not needed.
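
For anyone checking the same thing, something like this shows what the server is listening on and whether anything firewalls 445/5445 (generic commands; adjust for your firewall):

sudo ss -tlnp | grep -E ':(445|5445)'
sudo nft list ruleset | grep 445          # or: sudo iptables -L -n | grep 445
# note: ksmbd's SMB Direct listener is RDMA-CM based, so it may not show up in ss at all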

I saw ZFS and “performance” mentioned together… figured I’d bring this up for those who aren’t aware yet: when 3.0 comes out, it should have DirectIO, which can potentially resolve some bottlenecks with NVMe drives, though writes benefit the most.

1 Like

Okay, so the “Can’t create socket” message seems to be benign. If it were followed by a similar warning about IPv4, then you’d have a problem.
transport_tcp.c - fs/ksmbd/transport_tcp.c - Linux source code (v5.15.4) - Bootlin

But note that’s in setting up the TCP transport. The message “ksmbd: smb_direct: init RDMA listener.” only seems to happen when RDMA on the server is set up satisfactorily. There is nothing in the log that seems to indicate the server is the primary problem; in fact, the server is rather passive in the matter. I don’t know the protocol well, but the code suggests there are command blocks (maybe called PDUs) that are used to communicate between client and server. Following the smb2pdu.c smb2_read() path, there is a flag check that determines whether or not to attempt the RDMA transfer.
https://elixir.bootlin.com/linux/v5.15.4/source/fs/ksmbd/smb2pdu.c#L6204

Now, this ‘Channel’ flag doesn’t appear to be set anywhere in the ksmbd code, but it is set in the cifs client codebase.
https://elixir.bootlin.com/linux/v5.15.4/source/fs/cifs/smb2pdu.c#L3941

However, that requires the kernel to have been built with CIFS_SMB_DIRECT enabled. Can you grep this kernel’s .config for SMB_DIRECT?

Something like the following

jared@pop-os:/data/workspaces/linux/linux-torvalds$ grep SMB_DIRECT /boot/config-5.11.0-7620-generic
# CONFIG_CIFS_SMB_DIRECT is not set

or, if your kernel exposes its config in /proc:

zcat /proc/config.gz | grep SMB_DIRECT

If it’s not set… you could try rebuilding the kernel on the client side.

As an Arch user, I assume you aren’t afraid of a little kernel rebuilding? If not, I can help.
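
Roughly, it’s one config switch plus a rebuild. A minimal sketch against a vanilla source tree (the Arch PKGBUILD route differs, and CIFS_SMB_DIRECT also needs the InfiniBand/RDMA options enabled):

cd linux-5.15.6                          # placeholder: wherever your kernel source lives
zcat /proc/config.gz > .config           # start from the running kernel's config, if exposed
scripts/config --enable CONFIG_CIFS_SMB_DIRECT
make olddefconfig
make -j"$(nproc)"
sudo make modules_install install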

1 Like

The testing I was performing was with a Windows 10 client. I’ll reboot into Arch shortly, try there as well, and check the flags.

CIFS: VFS: CONFIG_CIFS_SMB_DIRECT is not enabled

Need to compile a kernel… stay tuned.

================================

UPDATE

Son of a #$%# :face_with_symbols_over_mouth:, it works with the Linux cifs client!!!

uname -a
Linux desktop 5.15.6-arch2-1-smbdirect #1 SMP PREEMPT Mon, 06 Dec 2021 20:59:16 +0000 x86_64 GNU/Linux
sudo mount -t cifs  //server/temp temp -o vers=3.1.1,rdma (works)
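
To double-check that the client actually negotiated SMB Direct rather than silently falling back, the mount options and the cifs debug data should both reflect it (if I’m reading the cifs code right):

grep cifs /proc/mounts              # the active mount options should include "rdma"
cat /proc/fs/cifs/DebugData | head  # shows per-connection transport details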

I updated the results in the earlier thread:

KSMBD (RDMA):
      -- Write to Server:   39502 Mbits/sec, 1191122 packets/sec, 39% of line rate
      -- Read From Server:  41601 Mbits/sec, 1255649 packets/sec, 41% of line rate

Not sure why the Windows 10 client (same hardware) is failing.

Thanks again,

1 Like

Fantastic!

I do think your observation about Get-SmbMultichannelConnection vs. Get-NetAdapterRDMA points at the issue. It would seem that Get-SmbMultichannelConnection is reporting data from the established SMB link to your ksmbd server.

I think RDMA is likely enabled for the NIC on Windows, but when it mounts the SMB volume it decides the link is not eligible for RDMA.

If that’s correct, then I think the question is: why does the Windows client make that decision while the Linux client works?

But I guess you can try the pass-through filesystem to a Windows VM? Wasn’t that one of the things @wendell suggested?

1 Like

Yes, another test for another day…

I may need to run the VM on the desktop instead of the server, but if we’re just worried about I/O throughput and latency, the desktop is plenty fast :smiley:

FIO 
------------------
READ:  bw=17.8GiB/s (19.1GB/s), 17.8GiB/s-17.8GiB/s (19.1GB/s-19.1GB/s), io=10.0GiB (10.7GB), run=562-562msec
WRITE: bw=9679MiB/s (10.1GB/s), 9679MiB/s-9679MiB/s (10.1GB/s-10.1GB/s), io=10.0GiB (10.7GB), run=1058-1058msec

Thanks again

1 Like

Wow. That’s pretty much line speed, right? FYI: if you get the settings right, I used to get > line speed with some RW workloads because the link is duplexed. Not 2x, but I don’t remember if it was 10% or 50% or what.

whats the cpu utilization like during this? mine was ~8%