Mixed bag of performance on new-to-me FreeNAS server

FakeGamerGuy · August 12, 2020, 9:05pm

Backstory
Recently I picked up an old Intel SR2600URLXR server from my local computer recycler. It has 5 hot-swap bays in the front connected to an LSI LSISAS1078 controller. I configured it with 5 virtual drives, each containing one physical drive, so FreeNAS could run the drives in Z2. My initial experience was pretty rough because during the install I kept seeing these mfi0: xxxx .... Unexpected sense: Encl PD 00 Path .... errors. Eventually I managed to get it installed, but I still see these errors scroll on my monitor 5 at a time every few minutes.

The initial server config was 2x2GB of RAM, a single 4-core Xeon without hyperthreading, and three gigabit NICs (2 onboard and 1 add-in). I was getting good performance at first, with read/write speeds across the network at 112MB/s, which is exactly what I expected from gigabit. I installed an SFP+ card with an Intel chipset and a copper 10G transceiver, and speeds improved to an initial peak of 600MB/s, which quickly fell to 230MB/s. That seemed reasonable for mechanical drives and virtually no RAM.

I picked up 96GB of DDR3 ECC and installed it along with a pair of X5677 Xeons with 4 cores/8 threads each at ~3.5 GHz. That bumped performance up to 800MB/s-1GB/s sustained for a bit, after which it would fall to 380MB/s, which seems totally reasonable for 5 drives in Z2. That said, it seemed like it should have been able to sustain the higher transfer speed, since the largest thing I’m transferring during my testing is a single 32GB file, so my RAM should be able to swallow the whole thing.

In my research I saw that SSD caching drives aren’t recommended, but I was kind of desperate to get this thing working at its full potential, so I installed a 500GB NVMe drive with a PCIe adapter, added it as a cache drive to my existing pool, and saw exactly zero difference (shocking). I did some testing using iperf3 between the server and another machine on the same 10G network, attached to the same switch, and saw an average of 9.35Gbps using -t 60 -b 0, so it seems like network performance is not the issue.

After that I removed the SSD from the pool and set it up as its own pool, set up a new SMB share on it, and did the same file transfer tests. Writing to the drive I saw a sustained 1GB/s for the entire transfer (first time I’ve seen that), but reading back from the drive I saw the same dip in performance, though it took about twice as long before the drop.
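To put the network numbers above on the same scale as the file-copy numbers, here is a minimal sketch of the unit conversion. It assumes decimal units (1 Gbps = 1000 Mbit/s, 1 MB = 10^6 bytes). The function names are made up for illustration.

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert a line rate in Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return gbps * 1000 / 8


def fraction_of_line_rate(observed_mb_s: float, link_gbps: float) -> float:
    """How much of the raw line rate an observed copy speed represents."""
    return observed_mb_s / gbps_to_mb_per_s(link_gbps)


if __name__ == "__main__":
    # Gigabit: 125 MB/s raw, so 112 MB/s is roughly 90% of line rate.
    print(gbps_to_mb_per_s(1.0))
    print(round(fraction_of_line_rate(112, 1.0), 2))
    # The iperf3 average of 9.35 Gbps works out to a bit under 1.2 GB/s.
    print(round(gbps_to_mb_per_s(9.35)))
```

By this arithmetic the 1GB/s peaks are close to what the measured 10G link can carry, which fits the conclusion that the network is not the bottleneck.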

The TL;DR
New FreeNAS-11.3-U4.1 install
Intel SR2600URLXR Server w/ LSI controller
96GB DDR3 ECC
5x2 TB Hitachi mechanical drives (2 SAS, 3 SATA) in RAID Z2
1x 500GB NVMe PCIe

Good peak read/write, sub-optimal speed after several seconds. SSD has perfect write performance, but similar fall-off in read performance. Client system I’m testing with is connected to the same 10Gbps switch with NVMe storage. I understand I shouldn’t expect to saturate 10gig with a handful of mechanical drives, but seeing similar performance dips on the SSD is making me think there’s a config somewhere that’s borked.

Edit: Now that I’ve thought about it a bit longer, it seems like the performance crash while reading from the NAS is probably due to the cache on the SSD getting overwhelmed transferring a 32 gig file? That might explain why it’s fine sending the same file to the NAS.

Instead of a single mammoth file, I copied a folder with multiple large files that totaled 24 gigs from the NAS onto the client and saw a steady 350MB/s (cool). Deleted the folder on the client and transferred it again and saw a steady 1GB/s right until about 80%, where it dipped down to around 130MB/s, then eventually started climbing until the transfer finished. Repeated the test again and saw 1GB/s the entire time. Am I just being impatient?

freqlabs · August 13, 2020, 12:10am

Cache on the SSD in the Windows machine?

FakeGamerGuy · August 13, 2020, 7:13pm

Cache on the Windows machine’s SSD. I’m under the assumption that there’s a small cache on the drive for high-speed writes (and maybe reads?) that can then be written to flash at its slower native speed. I know not all SSDs have one, but I’m pretty sure this Intel 660p does.

FakeGamerGuy · August 15, 2020, 11:19pm

Through constant tinkering and rebuilding my pools in different ways, I’ve got what feels like stable performance. I can read any files I drop on the NAS at 1.1GB/s (I’m only transferring 64 gigs of test data so it can all live in RAM), and my writes are much more stable with a sustained 250MB/s.

I’m assuming the initial peak throughput is because I’m writing to the RAM write cache, then once that’s full, I’m bottlenecked by the mechanical drives and the Z2 overhead. From what I’ve read so far, the write cache only stores 5 seconds’ worth of writes, so it seems like this is the best performance I can expect without going to SSDs or adding more drives via a disk shelf or similar.
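The "fast burst, then a drop" pattern described above can be sketched as a buffer that fills faster than it drains. This is a toy model, not how ZFS actually schedules writes. The 4 GB buffer size is a placeholder assumption, not a measured or documented value for this system. The 1000MB/s and 250MB/s rates are the peak and sustained figures from the post.

```python
import math


def seconds_until_buffer_full(buffer_gb: float,
                              ingest_mb_s: float,
                              drain_mb_s: float) -> float:
    """Time a write burst can run at the ingest rate before the buffer is full.

    The buffer gains (ingest - drain) MB every second. If the disks drain
    at least as fast as data arrives, the buffer never fills.
    """
    if ingest_mb_s <= drain_mb_s:
        return math.inf
    return buffer_gb * 1000 / (ingest_mb_s - drain_mb_s)


if __name__ == "__main__":
    # Assumed 4 GB of RAM buffering, 1000 MB/s arriving, 250 MB/s to disk.
    burst = seconds_until_buffer_full(4, 1000, 250)
    print(f"burst lasts about {burst:.1f} s before dropping to disk speed")
```

Under those assumptions the burst lasts only a few seconds, after which the transfer runs at whatever the pool can sustain. That is at least consistent with the behavior seen here.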

Edit: I set up my pool in a striped configuration just to see what the performance would be like without the Z1 or Z2 overhead. Not too bad for 5 drives. I think the main thing that changed the performance was how the drives were configured in the LSI controller: I changed the Write Policy setting from Write Through to… something else that I’m forgetting because I didn’t take a picture of the screen, but the config warned that Write Through would use the controller’s own cache and battery backup, which was safer but also decreased performance.
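For a rough sense of what "without the Z1 or Z2 overhead" might mean for large sequential transfers, here is a back-of-the-envelope sketch. It uses the common rule of thumb that streaming throughput scales with the number of data (non-parity) disks. The 130MB/s per-disk figure is an assumption, not a measurement of these Hitachi drives, and the model ignores the controller, record size, fragmentation, and caching.

```python
def streaming_estimate_mb_s(n_disks: int,
                            parity_disks: int,
                            per_disk_mb_s: float) -> float:
    """Rule-of-thumb sequential throughput: data disks times per-disk speed."""
    data_disks = n_disks - parity_disks
    if data_disks <= 0:
        raise ValueError("need at least one data disk")
    return data_disks * per_disk_mb_s


if __name__ == "__main__":
    per_disk = 130  # assumed MB/s for one mechanical drive
    for name, parity in (("stripe", 0), ("Z1", 1), ("Z2", 2)):
        estimate = streaming_estimate_mb_s(5, parity, per_disk)
        print(f"5 disks, {name}: ~{estimate:.0f} MB/s")
```

With that assumed per-disk speed, the Z2 estimate lands near the 380MB/s seen earlier in the thread. Treat it as a sanity check rather than a prediction.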