Hello dear forum members,
I’ve got a homelab server that’s also my NAS, and I’m moving from a simple OpenMediaVault configuration to a TrueNAS configuration in a second box.
These are the current machines:
Box 1 - Proxmox 7 + OpenMediaVault + 2 x 1 TB SSDs in a mirror, on a Xeon 2673 v3 with a 1 Gigabit Ethernet port;
Box 2 - Proxmox 8 + TrueNAS Scale + “LSI” SAS2008 HBA + 6 x 2 TB SSDs, on a Ryzen 5900X, also with a 1 Gigabit Ethernet port. This machine also has a secondary 4 TB spinning-rust drive meant to store disposable media, such as temporary movies or recordings;
Both boxes have their own boot drives, run a small set of services such as Pi-hole, and have another 1 TB SSD to store VMs for small projects. That’s not the relevant part of the question, I’m just adding it here for context.
The focus of the problem is the new box with TrueNAS Scale. Right now it’s configured with two different datasets:
SSD Dataset = 1 pool of 6 SSDs in RAIDZ2;
Spinning Rust Dataset = 1 pool with a single 4 TB Seagate HDD;
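In case the exact layout matters, this is how I can dump it from the TrueNAS shell, so I can paste the real output here if anyone needs it (nothing below is specific to my setup):

    # Vdev layout and health of every pool
    zpool status -v

    # Capacity, fragmentation and free space per pool and per vdev
    zpool list -v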
When transferring large sequential data to both datasets, I see an Ethernet throughput pattern like this on the SSD dataset:
The transfer sometimes reaches 115 MB/s and sometimes dips to 22 MB/s. I understand that 115 MB/s is the Gigabit Ethernet ceiling, but I don’t understand the dips.
When doing the same transfer to the Spinning Rust dataset, I get a sustained 115 MB/s write without any dips. These are large transfers, around 120 GB of data at once.
The HBA used in the new system is exactly this one from Aliexpress:
And it’s passed through via PCIe to the virtualized TrueNAS box in Proxmox as a Raw device:
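For completeness, the passthrough itself is just the standard Proxmox PCI passthrough; from the host shell it would look roughly like this (the PCI address 0000:03:00.0 and the VM ID 100 are placeholders, not my actual values):

    # Find the HBA's PCI address on the Proxmox host
    lspci -nn | grep -i SAS2008

    # Hand the whole controller to the TrueNAS VM
    qm set 100 --hostpci0 0000:03:00.0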
I’d like some help to try to pinpoint why the SSD Dataset is having these dips.
I can destroy the pool and recreate it as many times as needed for any tests. All the SSDs went through 4 days of testing with badblocks and all passed with flying colors.
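In case it matters, the burn-in was the usual destructive write-mode run, roughly like this on each SSD before the pool existed (/dev/sdX is a placeholder, and this wipes the drive):

    # Four write+verify passes over the whole drive, destructive
    badblocks -wsv -b 4096 /dev/sdX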
I’ve got a few possibilities, but I don’t know how to test them properly to pinpoint the exact cause:
1 - Terrible-quality SSDs that cannot sustain writes for a long period (see the fio sketch after this list);
2 - Overheating HBA card (I’ve placed a 40mm fan on top of that heatsink);
3 - Bad pool/vdev configuration (it’s my first one, so I might be doing something wrong);
4 - ??? (any suggestions are well appreciated)
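For hypothesis 1, my idea so far is to take the network out of the picture and run a long sequential write directly on the pool from the TrueNAS shell, something like this (/mnt/ssdpool is a placeholder for wherever the SSD dataset is mounted):

    # ~120 GB sequential write straight to the SSD pool, bypassing Ethernet/SMB
    fio --name=seqwrite --filename=/mnt/ssdpool/fio-testfile \
        --rw=write --bs=1M --size=120G --end_fsync=1

If the throughput falls off locally as well after a few tens of GB, that would point at the SSDs themselves rather than the network or the share.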
The 4 TB HDD is hooked up to the same HBA card as the SSDs and writes at full speed, so I’m not sure the HBA is really the problem here.
I’m OK with not having a blazing-fast NAS, as long as it’s reliable long-term and I can eventually upgrade to better SSDs.
Acquiring hardware is somewhat difficult in my case due to market conditions (things in Brazil are either too expensive or hard to find), so these boxes were built mostly with AliExpress components, which might not be the best quality. A single 12 TB HDD here costs more than all 6 SSDs combined, and would be a single point of failure.
So, how can I isolate the offending component, or even figure out whether I have more than one offending component?
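My own next step, unless someone has a better idea, is to watch the per-disk behaviour while one of these transfers is running, with something along these lines (ssdpool and /dev/sdX are placeholders):

    # Per-vdev / per-disk throughput and latency, refreshed every second;
    # a single misbehaving SSD should stand out during the dips
    zpool iostat -v -l ssdpool 1

    # Afterwards, temperature and error counters of each SSD
    smartctl -a /dev/sdX

But I’d appreciate any pointers on whether that’s the right way to narrow it down.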