TrueNAS Scale 24.10 - Very slow SMB performance with a Windows 11 client

This has been driving me nuts this week. My Terramaster F4-424 Max (RAID1 with NVMe cache) is 2x faster at SMB reads than my TrueNAS Scale 24.10.2 RAID1 NVMe dataset.

I get 1.15GB/s when copying from the Terramaster SMB share and only 470MB/s from TrueNAS.

My setup is:

VM TrueNAS Scale 24.10.2 in Proxmox (q35, VirtIO SCSI single)
8 cores of Epyc 7402
96GB RAM to TrueNAS
2 x NVMe PM983 PCIe passthrough
1 x Mellanox ConnectX-4 Lx 25GbE PCIe passthrough

I have a ZFS RAID1 mirror of the 2 x PM983, exported as an SMB multichannel share.

TrueNAS runs a very modern Samba 4.20, which supports SMB multichannel and io_uring.
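
You can confirm this from the TrueNAS shell; for example (the /etc/smb4.conf path for the generated config is my assumption for SCALE):

$ smbd --version
$ grep -i "vfs objects" /etc/smb4.conf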

Samba is perfectly capable of determining interface speed and RSS support on its own, but I also tried setting it explicitly via the CLI:

interfaces = "x.x.x.x;capabilities=RSS,speed=25....lots of zeros" via service smb update smb_options= with a subsequent smbd restart.
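
For reference, the full form of that hint is documented in the smb.conf man page; with my server IP it would look something like this (speed is in bits per second, so 25GbE is 25000000000):

interfaces = "192.168.1.40;capabilities=RSS,speed=25000000000"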

I also tried these (although, as I read, they are deprecated settings in modern Samba):

aio read size = 1 (also tried 16 * 1024)
use sendfile = yes

The RAID1 NVMe pool is fast; fio reports this for sequential writes:

$ fio --name=fio_test --ioengine=libaio --iodepth=16 --direct=1 --thread --rw=write --size=100M --bs=4M --numjobs=1 --time_based --runtime=60

Run status group 0 (all jobs):
WRITE: bw=6334MiB/s (6642MB/s), 6334MiB/s-6334MiB/s (6642MB/s-6642MB/s), io=371GiB (399GB), run=60001-60001msec
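
For reads, the equivalent test would be the same command with the rw mode swapped:

$ fio --name=fio_test --ioengine=libaio --iodepth=16 --direct=1 --thread --rw=read --size=100M --bs=4M --numjobs=1 --time_based --runtime=60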

Network - iperf3 does 10Gbps easily:

Accepted connection from 192.168.1.16, port 60898
[ 5] local 192.168.1.40 port 5201 connected to 192.168.1.16 port 60899
[ 8] local 192.168.1.40 port 5201 connected to 192.168.1.16 port 60900
[ 10] local 192.168.1.40 port 5201 connected to 192.168.1.16 port 60901
[ 12] local 192.168.1.40 port 5201 connected to 192.168.1.16 port 60902

[ 5] 0.00-1.00 sec 299 MBytes 2.51 Gbits/sec
[ 8] 0.00-1.00 sec 299 MBytes 2.51 Gbits/sec
[ 10] 0.00-1.00 sec 292 MBytes 2.45 Gbits/sec
[ 12] 0.00-1.00 sec 292 MBytes 2.45 Gbits/sec
[SUM] 0.00-1.00 sec 1.15 GBytes 9.91 Gbits/sec
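
(That output is from a parallel run; on the client it would be roughly this, with the four streams summing to ~10Gbps:

$ iperf3 -c 192.168.1.40 -P 4)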

It's a super basic setup, as you can see; everything is fast and should be fast for clients.

But it is not: the max I get copying files from the SMB share is 470MB/s:

[screenshot: Windows copy dialog showing ~470MB/s]

Client - Windows 11 Pro, Mellanox ConnectX-4 Lx

SMB multichannel - 100% enabled and in use (I even wasted a day looking at Wireshark negotiation packets).

PS C:\Windows\System32> Get-SmbConnection
ServerName ShareName UserName Credential Dialect NumOpens

---------- --------- -------- ---------- ------- --------
192.168.1.40 video_fast KORESH\admin KORESH\otec 3.1.1 2
PS C:\Windows\System32> Get-SmbMultichannelConnection
Server Name Selected Client IP Server IP Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
----------- -------- --------- --------- ---------------------- ---------------------- ------------------ -------------------
192.168.1.40 True 192.168.1.16 192.168.1.40 6 2 True False

That's it. I tried numerous sysctl changes - but there's no point, as iperf3 shows a perfect 10Gbps.

When I connect the same Windows 11 client to the Terramaster F4-424 Max (1 x 10Gbps RJ45, Marvell AQtion), I immediately get 1.15GB/s.

The Terramaster is RAID1 of Exos X18 SATA disks + NVMe cache (so on paper it's even worse).
It runs Samba 4.15 with the same SMB multichannel settings.

[screenshot: Windows copy dialog showing ~1.15GB/s]

I did try TrueNAS Scale version 22 and even the 25 nightly - no luck, same speed.

WHY IS TRUENAS SLOW??

What's the measured disk write speed on the Windows client?

For a storage transfer across the network to be fast, you need:

  • a fast network,
  • a fast server, and
  • a fast client

It doesn't matter in this case, because when I measure writes from TrueNAS and TerraMaster I use the same target disk, and the time between both measurements is seconds to a few minutes. It's literally: net use one share, copy, delete, net use the second share, copy, and repeat.
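
Concretely, the loop is something like this (drive letter, second share name, and file names are illustrative):

net use T: \\192.168.1.40\video_fast
copy T:\bigfile.mkv D:\bench\
net use T: /delete
net use T: \\terramaster\share
copy T:\bigfile.mkv D:\bench\
net use T: /delete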

Hardware on the Windows client is a Samsung SSD 990 Pro or a Sabrent Rocket 4.0 Plus; I tried both.

It does matter if it is read or write, since ZFS is a CoW filesystem and ext4 isn’t.

But assuming you are talking about reads and not writes, it could very well be that the issue is the TrueNAS SCALE SMB implementation. There are currently people on the TrueNAS forums experiencing performance regressions when "upgrading" from CORE to SCALE.

I would try the TrueNAS forums in your situation.
Having TrueNAS in a VM adds multiple layers of complexity when hunting down the issue.
As a first step of troubleshooting, I would recommend running it bare metal on the same hardware.

Yes sorry, I’m talking about reads only.

Just to be super precise about the setup:

  • TrueNAS runs with a RAID1 (mirror) of 2 x PM983 (no other vdevs attached)

  • TerraMaster runs Ubuntu with an LVM RAID1 of 2 x Exos X18 SATA spinning disks, with an NVMe read cache in front of them

I'm reading from both of them, one after another, from the same Windows 11 client, onto the same client NVMe drive.

The results are 100% reproducible.

Interesting - I found this because I'm seeing something similar, only considerably worse!

In my case, I think the culprit is Windows 11 itself - I'm seeing around 300MB/s for reads from TrueNAS → Windows, but often dropping to a mere 50MB/s, and writes are about 10x worse!

Reading from the same SMB share from another Linux host, I see around 800MB/s for reads, but even that is a bit disappointing, as it's a 3-wide stripe of PM9A3 2-way mirrors.

I can saturate 25GbE with NFS to that array - but I actually need to have a play with my setup and retest SMB to that array via Proxmox's built-in SMB server rather than TrueNAS…

I did more testing of Linux to TrueNAS and I get 1.15GB/s read speed easily. I posted how I measured this in the thread above on the TrueNAS forums.

I don't think the culprit is Windows 11: in my case SMB multichannel, RSS and MTU are set up correctly, and the NIC drivers are there. And I do get 1.15GB/s from Windows against the TerraMaster (Ubuntu, Samba 4.15).

I think at this stage I need to do a packet capture and see how Samba 4.20 sizes SMB reads, but I don't have time for this now.
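
(When I get to it, a capture on the TrueNAS side would be something like this, with the interface name as a placeholder, then checking the read request/response sizes in Wireshark:

$ tcpdump -i enp1s0 -s 256 -w smb_reads.pcap port 445)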

I did more testing with TrueNAS + ksmbd and this looks much better: I get a stable ~800MB/s read and 1.15GB/s writes. I posted a thread about it here.

Just wanted to provide a single data point…

I currently use TrueNAS Scale 24.10.2 (bare metal) with a Ryzen 5750GE, ASRock B550 ITX/AX, 8 x P4610 RAIDZ1, 64GB RAM, and a Chelsio T6225-CR @ 10G.

Client is Win11 Pro for Workstations with a Marvell AQC113CS.

I have no issues downloading/uploading from/to the NAS at ~1.05GB/s via SMB using the default MTU (no jumbo frames).

However, prior to the Chelsio T6225, I was using a Marvell AQC107 in M.2 form factor for Cobia and Dragonfish, and the client was Win10 Pro (not for Workstations). SMB speeds were inconsistent for both download and upload: it usually started out at 900MB/s but would drop to 450MB/s after a minute. Not sure if it was due to overheating, drivers, or Win10.

I guess the point I'm trying to make here is that I do not think there is an inherent issue in TrueNAS Scale 24.10.2 or Win11 Pro for Workstations.

I have seen this before. It's frustrating when you have all the hardware but it just does not work as planned.

A few thoughts:

  • Have you enabled the high performance power plan in Windows on the client side?
  • Does the performance change if you use robocopy? (See the example after this list.)
  • Do you have the latest PPM pack for Windows installed?
  • Do you have any different 10/25GbE cards to try that are not Marvell-based with the atlantic driver?
  • Do you have any fan/airflow on your Mellanox card?
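
On the robocopy point, a simple test invocation would be something like this (destination path is illustrative; /E copies subdirectories, /J uses unbuffered I/O, /MT:8 uses 8 copy threads):

robocopy \\192.168.1.40\video_fast D:\bench /E /J /MT:8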

I ask because in my adventures, these have all played various roles. For the longest time I was plagued by the infamous Windows copy bug where sometimes copies are fast, other times poor.

CPU power management on Windows can make a difference. One of my machines had terrible power management issues in the BIOS and Windows, making things super slow if I was not in performance mode all the time.

I found my Mellanox cards needed airflow. They might support 80 to 105C, but they do not run well that warm. Good airflow can restore some performance.

Marvell AQtion-based 10G devices are terrible. Nothing but driver issues with them on Linux. They have never been very performant at all. Cheap, sure, but flaky drivers in Windows and Linux were a big turn-off for me.

I found robocopy to be the true test of performance, as the GUI was sometimes slow for no reason. It's not really good for single-file copies, but for bulk it's pretty easy to use.

ZFS is an interesting beast, as it wants to keep data safe over everything. Big sequential copies do not go through the ARC cache. Depending on the config you will get the performance of the slowest drive in each vdev. On RAID0-style ZFS vdevs it's the aggregate of the drives, balanced by recordsize.

On my slowest SMB pool, using Mellanox ConnectX-4 cards, with a striped vdev of 2 SATA drives, I get 800 to 900MB/s on writes. I'm using Intel drives with strong sustained writes. I cannot speak much about the Samsung drives, but I can regularly robocopy 60 to 300GB at 800 to 900MB/s to that scratch volume using the tweaks above.

HTH

These are good inputs for troubleshooting by @wardtj.
Just some small corrections, so others don't get confused:

No write goes through the ARC.
ARC stands for Adaptive Replacement Cache.
It only caches reads, not writes!

This is not how it works. It does not depend on whether the write is sequential or not; it depends on sync vs. async.

Sync writes will not be "cached" in RAM. They will be written to the ZIL (which can live on the pool or on a SLOG).

Async writes will be collected in a TXG (transaction group) and then written sequentially into the pool.
So even incoming non-sequential async writes become sequential.
By default a TXG is flushed about every 5 seconds.
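
If you want to check which path your writes take, you can look at the dataset's sync property and the TXG timeout from the shell (the pool name "tank" is a placeholder; zfs_txg_timeout defaults to 5 seconds):

$ zfs get sync tank/video_fast
$ cat /sys/module/zfs/parameters/zfs_txg_timeout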

Thanks for the input, but in my case these are all irrelevant.

As I mentioned, I run the same Windows 11 Pro client against the TrueNAS SMB share and the TerraMaster SMB share, one after another.

I get ~450MB/s max against TrueNAS, and I immediately get 1.15GB/s against TerraMaster.

This is with a simple drag-and-drop single-threaded copy.

OK, I finally figured out what was causing the slow reads, and unfortunately it's a side effect of Proxmox.

The TrueNAS VM was set up with the QEMU x86-64-v2-AES CPU type.

Switching this to the host CPU type fixes the slow reads. I guess the generic model masks CPU instructions that Samba clearly uses.

I'm back to 10Gbps saturation with Samba 4.20 and TrueNAS SCALE 24.10.2.
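
For anyone else hitting this, the change is a one-liner on the Proxmox host (the VM ID 100 is a placeholder), followed by a full VM stop/start:

$ qm set 100 --cpu host

(In the GUI that's the VM's Hardware → Processors → Type.)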


Good catch.

Stuff like that is why I run storage (TrueNAS) and firewall (OPNsense) bare metal.
Today's IT is complicated and fragile enough; no need to add an extra layer of complexity, IMHO.

BTW, the reason Proxmox uses the x86-64-v2-AES CPU type by default is that most hosts support it.
That way you can set up a cluster and move VMs around.

If you don't use a cluster, I would always set the CPU type to host.
That way the VMs get all of your CPU's features.

I fully agree with you, and I learned it the hard way.

I tried to run a site-to-site Proxmox cluster where OPNsense is a VM on one side. That is next to impossible, as any change to the VM hardware requires a reboot, and hence link loss between cluster nodes. I totally understand why routers should be bare metal.

The TrueNAS VM - I'm just being cheap. To get decent I/O the box runs an Epyc 7402P, and it's too powerful to waste; the NAS only really needs 8 cores. I know Docker is an option, but LXC containers are a much nicer solution IMO.
