
Slow FreeNAS SMB read speeds (RAID-Z2)

Hello there!

At the moment I am seeing around half the expected read speeds over SMB (170-230MB/s) when taking my setup into account. I cannot for the life of me figure out where exactly my problem lies and need some help working this out. I did not post this thread in the “Networking” section of the forums due to the results of my testing so far and additional suspicions I have. Bear in mind that this is ONLY about the slow read speeds from my FreeNAS box. Write speeds over SMB are basically maxing out the WD Red RAID-Z2 pool’s performance capability at 430-450MB/s. I know not to expect 1.1GB/s unless both ends have insanely fast NVMe drives.
Alright, here goes…let me try to give you as much background as possible.

FreeNAS PC specs:
Mobo: EVGA X58 FTW3
CPU: E5645 (6 cores, 12 threads)
Memory: 24GB DDR3 ECC 1600Mhz (6x4GB)
HBA: LSI SAS 9207-8i IT Mode LSI00301
HBA Cables: 2x Mini SAS Cable SFF 8087 to 4 SATA 7pin
10G NIC: Mellanox ConnectX-3 Single Port
10G Cable: 10Gtek 10Gb/s SFP+ Cable 1m Direct Attach Twinax Passive DAC
Boot Drive: Intel 40GB Sata 2
Data Drives: 6x3TB WD Red in Raid-Z2
OS: FreeNAS 11.2-U5

Main PC specs:
Mobo: MSI B350M Mortar
CPU: Ryzen 5 1600x
Memory: 16GB DDR4 3200Mhz (2x8GB)
10G NIC: Mellanox ConnectX-3 Single Port
10G Cable: 10Gtek 10Gb/s SFP+ Cable 5m Direct Attach Twinax Passive DAC
Boot & Data Drive: Samsung 850 pro 256GB
OS: Windows 10 1903 64bit

10G Network Switch:
MikroTik CRS305-1G-4S+IN

Detailed Physical Disk Info:
6x3TB WD Red Drives:
Example drive:
image
Note that all the drives have the exact same specs as in the screenshot above, yet the device model numbers and firmware revisions differ slightly.
2x Model WDC WD30EFRX-68AX9N0, Firmware 80.00A80
2x Model WDC WD30EFRX-68N32N0, Firmware 82.00A82
2x Model WDC WD30EFRX-68EUZN0, Firmware 82.00A82

Motherboard:
PCI-E Slot 1: HBA
PCI-E Slot 2: 10G NIC
PCI-E Slot 3: GPU

According to this section from the motherboard manual I should theoretically not run into any PCI-E bandwidth limitations:

FreeNAS SMB Settings:


The Auxiliary Parameters I have set helped fix a weird bug where opening a Windows network share that has a boatload of files/folders and/or big files (such as my movies folder) in Windows Explorer would result in the folder taking ages to “load”.
The SMB service is solely bound to the IP address of the 10G NIC. The second FreeNAS NIC is used for a couple of jails and web gui access.

FreeNAS 10G NIC:
Options: mtu 9000

Main PC 10G NIC:
Jumbo Packet Value: 9014
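One quick way to confirm jumbo frames actually survive the whole path (the IP addresses below are just placeholders) is a don’t-fragment ping sized to the MTU minus the 28 bytes of IP+ICMP headers:

```shell
# 9000 MTU - 20 (IP header) - 8 (ICMP header) = 8972 payload bytes.
# -D sets the Don't Fragment bit on FreeBSD; if any hop's MTU is below
# 9000 these pings fail instead of being silently fragmented.
ping -D -s 8972 192.168.1.20    # from the FreeNAS box to the client

# Windows equivalent, from the main PC to the FreeNAS box:
#   ping -f -l 8972 192.168.1.10
```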

Absolutely no idea what the different modes found in the Mellanox ConnectX-3 Ethernet Adapter Properties are for. The descriptions for each mode just use the words they are supposed to be describing (derp).
image

Mikrotik Switch:
Not 100% sure about the difference between “Actual MTU” and “L2 MTU”, or whether my settings are correct, but the relative RX and TX speeds displayed during an iperf test lead me to believe this is not where my issue lies.


The only other settings I have changed for both SFP+ ports (one connected to the main PC, one to the FreeNAS PC) within the MikroTik web GUI are disabling auto-negotiation, forcing 10G speed, and enabling “full duplex”.

Testing (all disk speed tests were run on a dataset with compression turned OFF and where possible the file size was set to larger than 24GB due to RAM caching!):
iperf:

image
Sender (server) = FreeNAS

image
Receiver (client) = FreeNAS

Results seem solid enough to me.

Example for CPU usage during a CrystalDiskMark test:

No CPU bottlenecking, especially since drive encryption is not enabled.

DD tests:
I used block size = 4096k and count = 10000. This should result in a 40GB file being used?
image
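As a sanity check on those dd parameters (plain shell arithmetic, nothing FreeNAS-specific): bs=4096k means 4096 KiB per block, so count=10000 produces roughly 39 GiB, comfortably larger than the 24GB of RAM, which keeps the ARC from serving the whole read back from cache.

```shell
# How big is the dd test file really?
bs=$((4096 * 1024))     # dd's "4096k" = 4096 KiB, in bytes
count=10000
total=$((bs * count))
echo "$total bytes"                         # 41943040000 bytes
echo "$((total / 1024 / 1024 / 1024)) GiB"  # 39 GiB
```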

using “zdb -U /data/zfs/zpool.cache MyPoolHere | grep ashift” gave me an ashift value of “12” which should correspond with the 4k physical sector size of my drives?

Questions:
1.) Is a single drive starting to fail/slow down, which in turn causes the entire pool to slow down? All S.M.A.R.T. and scrub tests, which I have set up to run periodically, always return without any errors. I do not know of a way to test single-drive performance when it is in a pool and you do not want to destroy said pool.

2.) Do the different drive firmware revisions have any impact on performance?

3.) Is SMB tuning needed either on the FreeNAS or main PC side?

4.) Could removing the GPU completely from the system and having the HBA and 10G cards in slots 1 and 3 (→PCI-E 2.0 x16, x16) be my first troubleshooting step?

Any help would be greatly appreciated. Thanks in advance!

There are several things that could cause the issue here, not that you necessarily did anything incorrectly. I have a few questions…

  1. Is this a new setup? When did the slowdown start, or has it been like this since the initial setup?

  2. If the performance decreased after the system had been functioning properly for some time, and you’re sure this performance is not simply the result of using 5400 RPM drives (nothing wrong with 5400 RPM drives, but my own 7200 RPM drives, divided in a similar way, are pretty slow to read and get fragmented rather disappointingly quickly using software-defined storage, even without a network intermediary): what network fabric are you using, and do you have the ability to measure ASIC saturation to see if you might be I/O bound due to the network equipment?

  3. Firmware usually does have a performance impact, especially as drives age. If you are up to date, there’s probably nothing to check there unless WD has published something specific regarding the version you are using.

  4. I always recommend tuning wherever possible; you can usually make very big performance inroads when digging into this. If you haven’t already, there might be a mathematical formula that determines the desired disk “striping”, or perhaps a slightly smaller block size, for the number of drives, their size, their block-size capability, and the expected transaction volume of your use case. Early-setup tuning can have a very big impact, if configurable, within the parameters available to you. If you can, I would back up what you’ve got on this system and analyze this closely. I’m very curious what tuning items are available to you; they could be major.

  5. There certainly could be a failing drive causing the issue; it is very common for a drive to slow down before a failure. On the other hand, if this is a new setup, these numbers still don’t sound so bad that they necessarily indicate a problem. Most of these hardware combinations are unique (many of us never use the exact same configuration twice, so we can’t certify a specific set of performance characteristics), you are using 5400 RPM drives, which will slow themselves down if they aren’t cooling sufficiently, and other things could be in play that aren’t currently known.

I would check the tuning levers available to you. Striping, in software defined storage for example, is a major factor and there could easily be similar low level causal contributors in play.

I’m not having a lot of luck, so far, finding the numeric performance capability of the HBA you mentioned. It appears to support many drives, so the likelihood of an issue there seems low, but I’m not sure at this point what the index for 6 drives in RAID-Z2 is going to look like for equipment with its abilities. You may already be able to rule this out, but I haven’t yet.

I suppose I don’t see the CPU you’re using, or the drives, on the list of compatible CPUs for the HBA. However, I don’t know of any specific reason why that is a material issue, since many of the CPUs on that list don’t have the native NVMe/SAS instruction set integrated on the CPU. Besides that, nothing other than raw compute is going to matter on this interface, that I can see, and I also don’t see any notes suggesting that native NVMe support is crucial to make the most of this card.

You are using a PCIe 3.0 card in a PCIe 2.0 slot, but I think it’s highly unlikely you’re going to saturate a PCIe 2.0 interface here, as you mentioned.

Thanks for your response, hp!

1.) I have had my FreeNAS box for the past 2 years. Only recently did I upgrade to 10gig ethernet and add the HBA card. With the onboard 1gig NIC and onboard SATA ports (despite a mixture of SATA 2 and SATA 3 ports) I always achieved full gigabit SMB transfer speeds (±115MB/s). My RAID-Z2 pool was never really challenged before the upgrade.

2.) The fragmentation of my pool is currently at 4%. Shouldn’t be an issue.
Concerning what network fabric I am using I mentioned all the components in the “specs” section at the beginning of the post.
SFP+ modules with Direct Attach Copper cables, Mellanox ConnectX-3 NICs on both ends etc.

3.) Nothing to be found on the official WD site pertaining to important firmware updates. If I am not mistaken the newer firmware revisions allow you to access additional S.M.A.R.T. information and that’s it.

4.) All I am looking for is “expected” large sequential read speeds from my FreeNAS system with 6x3TB WD Red drives. I predominantly copy massive .mkv movie files between the two systems… no databases with an immense amount of tiny files.

Due to case restrictions (Fractal Design Arc Midi, maximum 8 drives) and the inability to backup everything I have somewhere else, there is absolutely no way I am going to go for striped vdevs and lose capacity. A single RAID-Z2 pool is perfect for my needs.

I constantly monitor my HDD temperatures and they only reach 36°C when beating the crap out of it with stress tests.

Btw, I have an “E5645” Xeon processor. Not an “X5645”. That was my mistake. :slight_smile:

Something I just learned on my FreeNAS system is that there is a lot of tuning you can do. After lots of digging (it’s not talked about much), the big one is setting your Record Size to match your workload.
It’s found under Storage > Pools > Edit Dataset > Advanced Options (the default is 128K). Since your dataset is large files, I would change it to 1M records; for VM workloads I set mine to 16K records. A 16K record size gave me way better IOPS, and a 1M record size increased the write speed of my large files.
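For anyone following along, the same change can be made from a shell (the dataset name here is made up). Worth noting: recordsize only applies to newly written blocks, so existing files have to be re-copied before reads benefit.

```shell
# Set 1M records on a large-sequential-file dataset, then verify.
zfs set recordsize=1M tank/movies
zfs get recordsize tank/movies
```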

If you have not yet done so, you should add some tunables to help your 10G network out; here is a good starting place.


Hmm, it sounds like tuning the HBA could be the issue, like @ronclark suggested. Alternatively, could also be a driver issue on the HBA, I suppose, from what you’ve said. I’m not sure though.

I guess there’s a chance the HBA may not be capable of operating as quickly as the system had with the prior configuration. That’s difficult for me to say. In my experience, the HBA usually speeds up the overall system when the system itself is already being used for other things, like VM management, for example. In that case, taking the workload off the CPU to manage storage helps the overall system speed up a lot. In this case, though, this system is dedicated to storage, so that’s a bit different than what I’ve worked with.

Let us know if you see any changes in performance by changing tuning parameters, even if they’re worse. I’d be curious to know, even though I don’t use one of these.

It is pretty weird to have read speeds slower than writes, especially on mechanical drives with two-drive parity.


I know right… super weird!!!
I changed the record size of my “diskspeed” test dataset to 1M but no noticeable differences.
Does anyone know how reliable “diskinfo -t /dev/” test speeds are for possibly singling out a particular drive that is slowing down the pool? I do not know how else to achieve this other than destroying the pool (which I am not going to do) and performing dd tests on each individual drive.
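For what it’s worth, diskinfo -t only issues reads against the raw device, so it should be safe to run on live pool members. A loop over all six (these device names are placeholders; camcontrol devlist shows the real ones) makes an outlier easier to spot:

```shell
# Compare sequential transfer rates across all pool members (read-only).
# Replace da0..da5 with your actual devices from `camcontrol devlist`.
for d in da0 da1 da2 da3 da4 da5; do
  echo "== /dev/$d =="
  diskinfo -t "/dev/$d" | grep -A 3 'Transfer rates'
done
```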
Here is a comparison of a WD Red drive of mine that is 5 years old


image

and one that is 2-3 years old


image

Hi, did you try

zpool iostat -v 1

Or anything to tell if one particular drive is being slow?

I like the dd test to the pool; did you by any chance test the system?
From random (or urandom) to null/ zero to null?
I don’t imagine the board itself is the bottleneck, but no harm in trying?
Apart from that, maybe a cache might help? Though with large transfers of large files, no cache does seem a good strategy
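A minimal sketch of that per-drive check (the pool name is a placeholder):

```shell
# Per-disk throughput refreshed every second; during a sequential read a
# lagging member shows consistently lower bandwidth than its siblings.
zpool iostat -v tank 1

# On FreeBSD, gstat shows %busy per physical provider; a disk pegged at
# high %busy while moving little data is the one holding the pool back.
gstat -p
```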

Hi, Trooper.
Here are 3 screenshots during a read test of a 35GB movie file from my FreeNAS to my main rig:


image
image

Results for /dev/zero and /dev/random tests:
image
image

Pretty weak random results at 86MB/s, right?

are all the drives online?

What is the cpu wait percentage while doing that?

Those two drives with zeros … while the others have info being read… concerns me.

In the first screenshot it’s drives “2” and “3”. In the following two it’s drives “1” and “6”. Maybe I should have added more screenshots :grin:
Every one of the drives intermittently dropped down close to 0 (seemingly always 2 drives simultaneously) at some point during the read test.
All of my drives are online. I have gone through the scrub results for the pool and the S.M.A.R.T. results for each drive. Nothing to worry about other than 3 of the drives have over 50k power-on hours. But I have never encountered any errors.
My pool is at 85% used capacity right now. Should that be of any concern, or is the above-80% thing a rule of thumb that has since become outdated?

I wouldn’t worry about that myself; 2 drives should be idle. RAID-Z2 means a stripe will have data in 4 locations and parity on 2 others, unless I’m reading it wrong?

I’m glad to hear it rotates among the drives, and I presume if the block size was smaller than a whole stripe, you might find more parity reads than data reads.
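Back-of-envelope for what a healthy 6-wide RAID-Z2 should stream: parity isn’t read, so there are 4 data disks’ worth of sequential bandwidth. The 150MB/s per-disk figure below is just an assumption for a 5400 RPM WD Red on its outer tracks:

```shell
# Theoretical sequential-read ceiling for a 6-wide RAID-Z2 pool.
awk 'BEGIN {
  per_disk   = 150        # MB/s, assumed outer-track rate per drive
  data_disks = 6 - 2      # RAID-Z2: two drives hold parity per stripe
  printf "%d MB/s theoretical streaming read\n", per_disk * data_disks
}'
```

Real pools land well under that ceiling, but 170-230MB/s is still a long way off it.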

Alright. Good to know. :slight_smile:
I am still racking my brain as to what is causing this read weirdness when something is not cached in RAM. Yesterday I copied a 5GB movie file back and forth about 4 times between the server and client and got 800MB/s-1GB/s read speeds, as is to be expected with super fast RAM caching that circumvents the usual bottleneck: the spinning disks. Unfortunately I cannot do a true one-to-one test of the HBA vs. onboard SATA because my X58 mobo only has 2 SATA3 6Gbps ports, which I have disabled in the BIOS (in addition to the audio and USB 3.0 ports) to free up as much bandwidth as possible for the 10G NIC and HBA PCI-E cards. I wanted to rule out chipset PCI-E bandwidth constraints.
I guess it has to do with one of the drives (or more) and/or the LSI HBA card.
Has anyone taken a look at the “diskinfo -t /dev/driveid” results I posted of an older vs. newer drive? Can one glean any useful information at all from those tests?
Thanks for your input, guys!

Oh what’s your qpi or whatever it was called on 1366 set to?
In bios

QPI is what’s used for connecting the CPU to the Uncore and PCI-e lanes. There’s another link for CPUs to talk to each other in a dual-CPU config, but that’s not what’s important here.

I can’t remember if you can force it to run at a rate higher than stock without changing your FSB; I know you can lower the multiplier.

You can lower your core multiplier and memory divider if you don’t want to overclock/overvolt those.
With the RAM divider set to 1333, you can set the FSB to 160; that will give you 1600 RAM and bring your QPI from 5.86 to 7ish GT/s.
The core multiplier can be either 15 or 17.5 depending on whether you want base or turbo speed.

Or, a simpler solution: buy a $9 X5650 that has 6.4 GT/s QPI stock.



If my math is correct you currently have 23.44GB/s of QPI bandwidth, whereas a CPU with 6.4GT/s (3.2GHz) will have 25.6GB/s.

You might still want to OC the QPI a little; I’m still a little fuzzy on how much bandwidth you’ll exactly need.
I think you’re using 32 lanes with expansion cards, so that’s 16GB/s.
I think SATA, USB and some other stuff share that bandwidth as well.
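The QPI numbers above follow from the usual rule of thumb (2 bytes per transfer per direction, counting both directions), which I’m assuming applies here:

```shell
# QPI bandwidth = transfer rate (GT/s) x 2 bytes wide x 2 directions.
awk 'BEGIN {
  printf "5.86 GT/s -> %.2f GB/s\n", 5.86 * 2 * 2
  printf "6.40 GT/s -> %.2f GB/s\n", 6.40 * 2 * 2
}'
```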

make sure atime tracking is OFF for performance, if you haven’t already.

otherwise every read or write will generate another write…
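In case it helps anyone later, the shell equivalent (pool name is a placeholder); child datasets inherit the setting unless they override it:

```shell
# Disable access-time updates pool-wide, then verify recursively.
zfs set atime=off tank
zfs get -r atime tank
```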


Atime is off for the pool and all datasets in it.
I tried going into the BIOS and setting the QPI speed to 6.4GT/s. Thereafter I could not boot and got a “keyboard error”; I had to reset the CMOS every time I tried it. Thank goodness there is a handy little button on the motherboard AND on the rear I/O. Gotta love workstation/server mobos.
Will post multiple screenshots of the BIOS settings available to me concerning CPU/memory speeds and QPI related stuff (PCI-E maximum payload size & PCI-E frequency as well) later when I have time.
Found this handy layout of the x58 chipset:


Btw, my CPU never runs hotter than 35°C. Although I did have it overclocked to 3GHz in the past, netdata never showed the frequency going higher than 2.4GHz… ever!


Everything I have disabled



Default HPET Mode is 32-bit. I changed it to 64-bit.
image

Possible performance tweak settings?





Yeah, I don’t think it’ll work just by upping the multiplier for QPI; you might have to OC the FSB and adjust all the other multipliers, dividers, and voltages. There’s a voltage for the QPI and the uncore you might have to bump up slightly.

It looks like your RAM isn’t at 1600, but 1066.

Oki doki.
I don’t think RAM speed has an impact on uncached read speeds.

I have a backup FreeNAS server, a Dell R410 with an X5650 @ 2.67GHz, 20GB RAM, and an LSI SAS 9201-16e; way lower spec than your system. I don’t think you have a CPU bottleneck.
I have two RAID-Z1 pools. I know they’re not RAID-Z2, but for comparison here is what I get.
Compression turned off

RAID-Z1, 5 wide, 4TB 7200 RPM drives, 78% full

dd if=/dev/zero of=/mnt/rfj/test/ddfile bs=2048k count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 113.375472 secs (184974048 bytes/sec)

dd of=/dev/null if=/mnt/rfj/test/ddfile bs=2048k count=10000
1860+0 records in
1860+0 records out
3900702720 bytes transferred in 2.988624 secs (1305183461 bytes/sec)

RAID-Z1, 8 wide, 1TB 7200 RPM drives

dd if=/dev/zero of=/mnt/rc/test/ddfile bs=2048k count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 133.006761 secs (157672586 bytes/sec)

dd of=/dev/null if=/mnt/rc/test/ddfile bs=2048k count=10000
2689+0 records in
2689+0 records out
5639241728 bytes transferred in 13.042901 secs (432361007 bytes/sec)

I would see if dmesg has anything.

Do you know if your HBA is running the right firmware version for FreeNAS?