Painfully slow FreeNAS Performance via 10GbE

I have 2x 10GbE FreeNAS machines which perform poorly if the speeds others get are accurate
(and I assume they are).

To test transfer speeds on the unit which doesn’t yet have data,
I striped 7x 6TB 7200 rpm drives (each rated around 100MB/s) in System 1 below (the MacPro3,1).

I have, however, transferred data between my Windows machine and my MacBook Pro (ATTO 10GbE to Thunderbolt – ThunderLink) and was able to obtain over 400MB/s transferring to and from NVMe drives.

I just can NOT figure out how to make FreeNAS generate much throughput.

I’m considering just booting the MacPro into OS X and setting the drives up in a striped array to test the performance via OS X … to troubleshoot the hardware or software that way…

Any help would be appreciated. If you’d like me to provide info, can you please make it easy for me to understand which information you’d like…? Ideally, request a screenshot and define exactly where I’d go to provide said screenshot.

CONFIGURATION:

NETWORKING:
• SWITCH 1: D-Link DXS-1210-12SC 10GbE SFP+ Switch
• SWITCH 2: Airport Extreme + AP Express extender

WiFi via Airport + Extender
SMB sharing
10GbE via SFP+ Fiber

  1. MacPro3,1 Dual Xeon CPU
    • 2x 2.8 GHz - Quad Xeon (X5482)
    • 16GB ECC RAM
    • 7x HGST Ultrastar 6TB SAS 7200 rpm
    • LSI 9211-8i SAS controller
    • 10GbE SFP+ Network card (shows connected in FreeNAS)
    • TrueNAS core 12.0 Beta
    LESS THAN 100 MB/s … even with 7x striped drives with no data.
  2. Dell PowerEdge T320 (single CPU)
    • FreeNAS Version: 11.3-U2.1
    • E5-2403 0 @ 1.80GHz
    • 48GB DDR3 ECC RAM
    • 10 GbE SFP+
    • SMB Sharing
    • LSI 9211-8i SAS controller
    • 8x 10TB HGST IBM Ultrastar SAS 7200 rpm
    • RAIDZ 2
    • Fastest transfer speeds up to 180 MB/s

Does this sound correct…?

IN ORDER TO PROVIDE ANY REQUESTED INFO, PLEASE STATE SCREENSHOTS TO ATTACH

THANK YOU!!!

Out of curiosity, have you tested moving data to the array from within the same machine to test the array’s performance? Also, what type of data are you working with? Massive files like video, or many small files? NAND is great for small files thanks to its massive read/write IOPS, whereas spinning disks are not.

Things I would look into

  • Try massive files
  • Can the array read/write faster internally
  • If so, test the network with iperf to ensure you have a proper 10g link
  • If that all fails, maybe the limiting factor is SMB or the single-core speed of one of the CPUs, which is a hidden factor that is easy to forget.

EDIT: I just came from another thread that I started with similar problems. The “solution” was that I was linked this by the wonderful @anon75264233 https://jrs-s.net/2018/08/17/zfs-tuning-cheat-sheet/ and it may be of some use to you.


Unfortunately I’m aware of that … and was transferring 1GB - 2GB video files … (If it were emails or messages 100MB/s would actually be fast even for NVMe drives)… very very good point; I should have thought to point that out already.

Believe me, I truly appreciate your suggestions … is that an SMB thing…? Or CPU speed…? If so, I could run Windows 10 on this system, use SMB to share a couple of drives, and check the transfer rate to see if it’s an OS thing…??

IF SMB doesn’t play a large role in regulating the speed of sharing, then the below would be my logical way of evaluating what I know of the hardware … but it IS conditioned on the role SMB plays, and is written without regard for the SMB variable:

The … single-core speed … interesting, okay. But still … I could see a Core Duo from 2005 being limited to 250MB/s … but a Xeon chip that’s about 100 watts…? I’ve definitely tested that model MacPro with an NVMe drive and gotten over 1GB/s … but still, I do appreciate the point. That said, even a 2007 Core 2 Duo MacBook can do 250MB/s with a single SATA solid-state drive … and that’s a 2GHz CPU with a 30-watt TDP … whereas this CPU is 2.8GHz and, I’m guessing, the Xeon version of the same process node … but unless there’s something specific to FreeNAS which makes the single-thread CPU limit lower, I’m (politely) guessing from my experience that it’s not remotely the case for this model.

As far as the E5 Xeon CPU which can’t even break 170 MB/s … you know… that thing should be WAY faster. I get that it’d be preferable to have a faster clock speed on the CPU … but it just can’t be the issue. I’ve used this computer in Windows 10 and got 200MB/s per drive as I cloned 4 drives to 4 targets (meaning basically 1.4 GB/s sustained for a day, as I cloned 4x 4TB drives to 4 targets via ddrescue on this system). And I really think that’s a massively low threshold for the power of a CPU to manage a hard drive’s menial transfer rate… no…?

I just do not get it though! I am SO confused by this…

I’m going to read that link you provided … I’m very grateful for all your thoughtfulness and help. I don’t mean to shoot your ideas down – in fact, please know I’m not even meaning to do that. I’m really asking you to argue with my logic and tell me I’m wrong … because you’re RIGHT that adding SMB into the mix is a factor I have absolutely ZERO “common sense” to evaluate, and I will rely on the wisdom of people such as yourself.

Thank you again,

Truman

LZ4 compression is faster than storage. Yes, really. Even if you have a $50 tinkertoy CPU and a blazing-fast SSD. Yes, really. I’ve tested it. It’s a win.

For one, that to me would suggest that if the CPU can compress data faster than the drive could otherwise write it … then I’m likewise assuming the CPU can easily keep up with the SMB speeds … but still, maybe I should disable LZ4 compression …?

Comparing apples to oranges, no pun intended. When you get down to networking, most protocols use a single thread to do that work, per file or per connection, so if a single thread (therefore core) can’t keep up, that’s where you see speed limits. SMB is pretty heavy in that regard: even though you would see native speed much higher, you are paying the additional network overhead to make the data “network compatible”, be that SMB, NFS, etc. That is an older Xeon which is still plenty capable, but SMB on a single thread is also quite the task. I have a friend that could only push 3 gigabits per second on his laptop with a 10G NIC over SSH, for example, as SSH was the limiter and also only uses 1 core.

For testing the “best case” you may want to move massive files that are several GB (more than your ARC size) just to ensure you are getting the most accurate results that you can. I haven’t used Windows in ages, but under BSD you can check top (or htop) or whatever CPU usage program and see if the thread is using 100%. This is not to be confused with more than 100%: if a process shows 400%, that means 4 cores are saturated, as this measurement is “per core”. Windows, from what I know, hides this from you and only shows overall usage, which is not close to correct. Maybe someone else can give you a better answer on how to see per-thread usage on Windows.
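On FreeBSD (which FreeNAS is built on), a quick way to check this while a transfer is running is something like the following sketch; the exact display varies a bit by version:

```shell
# Per-CPU load: each CPU gets its own stats line at the top of the display,
# so one core pegged at ~100% is immediately visible even if overall usage is low
top -P

# One line per thread instead of per process; during a copy, watch for a
# single smbd thread sitting near 100% WCPU while the other cores idle
top -H
```

If one smbd thread is saturated while total usage reads ~15-25%, that is the single-thread bottleneck being described above.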

Side note, maybe also test NFS. It’s generally faster than SMB in my experience, and Windows also supports it these days. It’s not native fast, but I can saturate gigabit with no tuning on large files, and don’t have 10g yet.
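For what it’s worth, the NFS comparison can be done from a Windows client too, after enabling the NFS service and adding a share in the FreeNAS UI. A sketch, with a placeholder server address and paths:

```shell
# Windows: requires the "Services for NFS" client feature to be installed.
# Mounts the FreeNAS export at \\192.168.0.10\mnt\tank\share as drive Z:
mount -o nolock \\192.168.0.10\mnt\tank\share Z:
```

Then repeat the same large-file copy to Z: and compare against the SMB numbers.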

I would recommend enabling that. Remember that you have more than 1 thread on the CPU, so you have spare cycles, just not ones that SMB can use. Threading is a hard problem to solve, but that means multi cores can be used for many things at the same time at least.


Enabling or disabling compression only affects data written after the change, but it’s worth a shot. Video files likely won’t improve, though it’s doubtful you’d even notice a slowdown with it enabled, as LZ4 only tries a few snippets of a file before giving up and storing the data uncompressed.
The single-threadedness of SMB that @kdb424 mentioned is less about the actual CPU’s capability, and more about the protocol failing to saturate a link even with beefy single-threaded performance.
As he said, try a large local copy, like to a different drive, to ensure the local read/write speed is good, then perhaps try iperf or similar to check the connections between machines, just in case one port is somehow hobbled down to 10/100 speeds by a bad strand in an Ethernet cable.

Dumb question based on this statement, but do you have more than one NIC in the machine? If so, is the other NIC 1GbE? Have you set the SMB connection to use the 10GbE card?
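One way to check from the FreeNAS shell is to see which addresses smbd is actually listening on and compare them to the address on the 10GbE interface (interface name below is a placeholder):

```shell
# Which IPv4 addresses/ports is Samba listening on?
sockstat -4 -l | grep smbd

# Which address belongs to which interface? (e.g. mlxen0 for a Mellanox NIC)
ifconfig -a | grep -E 'flags|inet '
```

If smbd only shows the 1GbE (or 100Mb) address, that alone would explain the ceiling you’re seeing.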

You don’t mention what PCIe slot the 10GbE controller is plugged into. Is the slot sharing bandwidth with something else in the system and only running at x2 speed?
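If it helps, on FreeBSD/FreeNAS you can read the negotiated PCIe link width straight from the device list (a sketch; your NIC’s driver name will differ):

```shell
# Verbose capability listing; find the 10GbE NIC's entry and look for a line like
#   cap 10[60] = PCI-Express 2 endpoint ... link x8(x8)
# The first x-value is the negotiated width, the parenthesized one is the maximum.
pciconf -lvc
```

A x2(x8) there would confirm a bandwidth-starved slot without opening the case.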

Just to clarify: in my testing, SMB was always superior to NFS or iSCSI; especially on Windows, SMB is flying.

I am running 6x 3TB 7200 rpm drives and they go to 800 MB/s in CrystalDiskMark and around 500 MB/s in actual transfers over iSCSI on Windows.
So no, your config should be much faster. Especially for the first few GB, where it’s cached in RAM.

Try a direct SFP+ link, PC to server. So you take out the switch and disable the Ethernet interface in FreeNAS. Then try again.

EDIT: There is also SFP+ tuning in Windows, like enabling jumbo frames and whatnot, but still, you should get higher speeds than this.
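One caution on jumbo frames: the MTU has to match end to end (both NICs plus every switch port in the path), or things get worse, not better. On the FreeBSD/macOS side the interface-level change looks roughly like this (interface name is a placeholder; check `ifconfig -a` for yours):

```shell
# Set a 9000-byte MTU on the 10GbE interface (e.g. mlxen0 on FreeBSD for a
# Mellanox card, or enX on macOS for a Thunderbolt 10GbE adapter)
ifconfig mlxen0 mtu 9000

# Verify it took effect
ifconfig mlxen0 | grep mtu
```

The switch will also need jumbo frames enabled in its own management UI.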


I like those questions (and many of the others, also). I’m pretty hip on the architecture of this board … and I’m sure it’s in an x8 slot.

Also, re: CPU … I’ve never seen the CPU at over 15% utilization under FreeNAS (to the extent that the System Information overview of FreeNAS shows much of any use)… though, I will definitely check in the methods described above.

The PCIe Slots on the PowerEdge T320 are PCIe 3.0 x8 (there is a slower slot but it’s not in use)… and the CPU isn’t a bottleneck for PCIe lanes, either, as there’s only the HBA and 10GbE cards in the machine aside from the built-in devices.

(I can upload the manual if anyone would like) …?

I’m not really clear on how to set up a direct peer-to-peer link via SFP+ … is there any chance someone here would be open to charging me something affordable to assist with some of these steps remotely…? I’m not a sysadmin … I do hardware troubleshooting and Mac repair. I get ‘out of my depth’ very very quickly in this domain… :frowning:

How do I ensure that SMB is linked to the 10GbE …?

Can you send me some screenshots by chance…?

I did exceed 125MB/s doing some transfers … but still, I’d really like to do those steps. Thank you!

There is your bottleneck. It will never get to use the “10 GbE SFP+” bandwidth.


That’s 15% of the total CPU usage as a heads up. It’s possible that one core is fully loaded. Example would be a 2 core CPU showing 50% usage would mean one core is at max, but another process could use the other core. Not saying that IS your limit but want you to be aware of how the 15% can be misleading.

You can test this once again with iperf3, and we can’t really offer more help without this as we don’t know what we are debugging. Here is some random help I found with google for your platform, though I don’t know freenas specifically https://www.ixsystems.com/documentation/freenas/11.1/cli.html

I assume that you are asking about jumbo frames https://www.techjunkie.com/jumbo-frames/ That should answer those questions.

Do you have a 1GB ethernet connection on these machines, or only SFP+ connections? If both, why? The reason I ask is because your systems may be confused about the path to take and may choose the slower one, and this is also why we are asking for the iperf test, to ensure that the 10G route is taken, and if so, then we know it’s a protocol/storage problem. For all we know now, you aren’t using the 10g link.

They stated that they tried a striped volume as well to no avail, so an iperf test may show us that they aren’t even using 10 gig in the route. I totally missed that while distracted with networking though. Good catch.


I will reply to every question asked or try to test every suggestion … this reply is just to respond to those things I can answer now; the omissions are only temporary and I will circle back to address them later unless they become irrelevant for other reasons.

**Re: The path (1GbE or 10GbE) to take… there’re integrated 1GbE and 100Mb (? WTF!??) ports on the MB.**

To disambiguate: I disconnect 1GbE cables & leave only the 10GbE when testing…

Re: 15% being one core… it’s only a quad CPU. :slight_smile: … I considered that. And that is only for a second or two that it hits that number, usually it’s at about 2% when performing transfers on the Dell. I will double check the MacPro but expect it’ll be about 5%.

As a simple method of testing everything except the LSI 9211-8i and the Mellanox 10GbE NIC … I’ll remove this array and throw in a single NVMe SSD and test transfers that way.

Testing whether the CPU can truly support 200MB/s over SMB seems a rather low bar for me to have to learn how to test and rule out, but I will. I’d love to hear of an actual example of a CPU which is inadequate to this task… This IS a 2013 Xeon sold in a Dell PowerEdge which, from the OEM, holds 8x LFF drives and was sold WITH an SFP+ NIC and, optionally, Red Hat Linux, for the literal sake of being a server.

No one would be shocked if it couldn’t saturate 10GbE due to the CPU …? Or the PCIe …?

I ask those questions with deference and gratitude for the help … but how much time within this troubleshooting process should I devote to learning how to test things which have a .0000001 % chance of being the actual cause of the problem…? You know?

Glad you got it. I know it’s a concept that many can easily misunderstand :slight_smile:

That’s a great first step

Glad to hear that the correct path is being taken. I still advise using iperf to get us an accurate measurement of what the 10G cards are doing in raw performance. It uses ram on both sides to remove all bottlenecks and give you the raw performance of the NIC. If there’s an issue with a NIC or the switch, or overheating somewhere, or a cable issue, this is where you are going to find the problem. Only with iperf.

After it’s installed on both machines, pick one and run

iperf3 -s

and on the other you’ll run

iperf3 -c <server IP>

And it will give you a result like this

-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.0.51, port 45340
[  5] local 192.168.0.21 port 5201 connected to 192.168.0.51 port 45342
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   112 MBytes   940 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
[  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec
[  5]   5.00-6.00   sec   112 MBytes   941 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   941 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   942 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec
[  5]  10.00-10.00  sec   250 KBytes   895 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

I can’t reach that speed with NFS on my array, but iperf is raw ram to ram, so my network backbone/cables/nics are tested, not everything else. I was able to troubleshoot an incorrect negotiation to 100mbps on one nic before like this. You may have a negotiation issue that forces you into gigabit. I can keep asking questions, but these 2 commands can help all of us know more about your problem, and give you real suggestions. Until then, we are only guessing.
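Once the basic run works, a couple of extra iperf3 flags are worth trying, since a single TCP stream sometimes can’t fill a 10G pipe on its own (the server address below is a placeholder):

```shell
# Reverse the direction: the server sends, the client receives,
# which catches asymmetric problems (bad transceiver, duplex issues)
iperf3 -c 192.168.0.21 -R

# Run four parallel streams; if 4 streams saturate the link but 1 doesn't,
# the limit is per-connection (CPU / TCP tuning), not the physical link
iperf3 -c 192.168.0.21 -P 4
```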

Quick observation from doing research … I’ve found other threads with people who have FreeNAS systems that can saturate 10GbE (sometimes they’re troubleshooting the performance dropping off after 5-10 seconds for another reason) … which also use the same E5-2403 CPU as is in this Dell PowerEdge … which I’d like to proffer as ruling out the CPU as the bottleneck. :slight_smile:

I did figure out how to change the MTU on my Mac in which I’m using the 10GbE over Thunderbolt interface to 9000 … though, I’ve read dissertations in which someone described becoming an absolute expert in the pursuit of that variable’s role in performance … and concluded after 3 years of fiddling as a sysadmin, that it was just not. worth. the effort, and even leaving it at 1500 was a negligible difference.

FWIW: I went with SMB because, in another troubleshooting post, a FreeNAS developer from iXsystems replied saying that NFS performance is poor and that I should do what’s required to get SMB working.

I had to reinstall the OS on the workstation and struggled to get SMB working on FreeNAS … ultimately requiring that I WAITED FOR A NEWER VERSION OF FREENAS!! lol. Because they’d released versions which were literally MISSING SMB, or it was “broken”!!! You’d start the service and only get errors about why it failed to start… I’d dig through the file system and find that some of the resources were either ‘pathological’ or missing content entirely. It was a huge PITA.

Oh! I did have a pucker-factor where I worried for a second that the HDs I have were SMR crap … fortunately (and as I strongly expected) it wasn’t the case; these are HGST Ultrastars and they use PMR …

Speaking of which, these drives have a read/write performance of:

He10 HGST SAS HDs: 229MB/s Read / Write

I know better than to think that you just ‘multiply the speed by 8’ … which is NUTS that people think that. Nonetheless, these ARE high-performance drives… and as I mentioned … even with the

7x 6TB HGST Ultrastar 7200 rpm drives … STRIPED!!!

The unit got a whopping 100MB/s!! over 10GbE

This is a unit which I’ve seen do 900MB/s with an Areca RAID controller… it just does NOT make sense to me.

Also, to rule out the switch as being a POS …

I did do a test between peers (Windows and OS X), both with NVMe SSDs & sustained 300-450MB/s (limited by SSD performance).

I’ll load Windows or Ubuntu on the Dell T320 & throw an NVMe in it & test it out with SMB … and will do virtually the same with the MacPro only with OS X 10.13 + an NVMe SSD & controller.

I’d be elated to spend $100 if it provided exact answers for each system.
Seems like an expert could troubleshoot via remote access in under an hour.

I’d LOVE to find such a tech… :slight_smile:

Did you make sure that your ashift values were set correctly? Have you tested transfers from the NVME drive to the array while they are in the same machine? It’s getting quite hard to want to continue offering help as you aren’t willing to test what we need to help you.

Assuming you’re talking about iperf3 … you may be mistaking ‘unable’ for ‘unwilling’ …

I’ve installed it on my laptop, have been reading about it and found a tutorial to set it up on FreeNAS in between working.

I can’t test things as quickly as they can be written.

I’ve been thinking about how I’d do a local transfer between drives within the FreeNAS machine … and frankly I just don’t know how to do it without it potentially hopping through my workstation … unless you mean just using shell commands from the terminal…?

You can use SSH and a terminal to get to that machine and use the cp (copy) command to copy from a fast drive like NVMe to the array, yes.
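A minimal sketch of that local test, assuming hypothetical mount points (/mnt/nvme for the fast drive, /mnt/tank for the pool); run it over SSH so the traffic stays entirely inside the NAS:

```shell
# Create a ~1 GB source file on the fast drive, then time the copy onto
# the pool. Throughput ≈ file size / elapsed seconds from `time`.
dd if=/dev/zero of=/mnt/nvme/test.bin bs=2048k count=500
time cp /mnt/nvme/test.bin /mnt/tank/test.bin

# Clean up
rm /mnt/nvme/test.bin /mnt/tank/test.bin
```

Note that with LZ4 compression on, a file of zeros will give inflated numbers; use a real multi-GB file (or a compression-off dataset) for an honest reading.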

I’ve seen a lot of ideas thrown around; let’s troubleshoot the FreeNAS hardware.

I am running FreeNAS 11.3 on a Dell R710 with a ConnectX-3 40G card; I can easily push 250 to 300 MB/s on old 1TB hard drives.
What firmware are you running on the LSI 9211? Running an old firmware version can really screw with FreeNAS.
I would also update the ConnectX card firmware if you have not.
Everyone seems to be focused on the network; you should run some system throughput tests first, then move on to network troubleshooting.

Here is a simple dd test on your ZFS pool to see its basic speed

1) Create a dataset which has compression turned off. This is important because compression will give you a false reading.
2) Open up a shell window.
3) Type "dd if=/dev/zero of=/mnt/pool/dataset/test.dat bs=2048k count=10000"
4) Note the results.
5) Type "dd of=/dev/null if=/mnt/pool/dataset/test.dat bs=2048k count=10000"
6) Note the results.
7) Lastly cleanup your mess and "rm /mnt/pool/dataset/test.dat" to delete the file you just created.

Note: /mnt/pool/dataset will depend on your specific pool name and dataset name.
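Step 1 above, as shell commands, assuming a hypothetical pool named “tank” (substitute your own pool and dataset names):

```shell
# Create a scratch dataset with compression disabled so dd reads/writes
# real bytes instead of LZ4-compressed zeros
zfs create -o compression=off tank/speedtest

# Confirm the property actually took effect before running the dd test
zfs get compression tank/speedtest
```

Afterwards, `zfs destroy tank/speedtest` removes the scratch dataset along with the test file.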