I’m making this thread in response to the various requests I’ve seen in my own journey for wanting to know the configuration specifics of the Level1 172TB ZFS SAN. I’ve seen scattered threads whose replies contain small bits and pieces of detail, but nothing concise or informative.
In the video @wendell mentions/shows a bunch of stuff. There’s a lot of guessing about much of it, but it would be really nice to have specifics so we have the option to recreate it on a similar/larger/smaller scale, or even pivot from that reference point.
My questions include, but are not limited to:
What disk shelves were used? Wendell mentions rebranded LSI shelves but they look like DS4243/DS4246 shelves. The difference between these lies in the SAS modules they ship with (3Gbps per channel as opposed to 6Gbps per channel).
What HBA/SAS controller was used? Again, LSI was mentioned, but the specific model, firmware, and configuration would be amazing to know.
OS configuration. Wendell mentions in another thread that the disk shelves are configured for active/active. I assume this is with something like multipath? Or is this a disk-shelf-specific thing? Would be keen to know how this was done either way.
@wendell I know you’re a busy man with a lot of M&Ms to get through, but I, and I’m sure a bunch of others, would be eternally grateful if you took the time to list this stuff out. I personally have a DS4243 hooked up to an H200e SAS HBA and I get nowhere near the performance and speeds you do locally.
It’s on Fedora right now, and the controller is currently a 9216. The controller in the Gamers Nexus build thread (the 9405W tri-mode HBA) is probably what I’ll move to next.
Each shelf has the SAS6 controller. The DS4243 shelves can be upgraded to SAS6, while the DS4246 is SAS6 out of the box.
Each shelf has 4 vdevs currently … more vdevs generally help with performance. To save power I’ve brought each shelf online as we needed space, so we are only up to two shelves so far lol. And it’s still plenty fast enough to saturate 10 gig.
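For anyone wanting to recreate something like this, here’s a purely hypothetical sketch of what “4 vdevs per 24-bay shelf” could look like. The actual vdev type and width haven’t been confirmed; the RAIDZ2 layout, pool name, and device names below are all made up for illustration.

```shell
# Hypothetical: one 24-bay shelf split into 4x 6-disk RAIDZ2 vdevs.
# Device names (sdb..sdy) are placeholders; use /dev/disk/by-id in practice
# so the pool survives device renumbering.
zpool create tank \
  raidz2 sdb sdc sdd sde sdf sdg \
  raidz2 sdh sdi sdj sdk sdl sdm \
  raidz2 sdn sdo sdp sdq sdr sds \
  raidz2 sdt sdu sdv sdw sdx sdy

# Bringing another shelf online later just extends the pool with more vdevs:
# zpool add tank raidz2 <six more disks>
```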
What sort of performance are you getting? Even the Gamers Nexus setup was barely clearing 1 gbyte/sec read with 3 vdevs and only18 drives.
Multipath is helpful to avoid bottlenecks for sure.
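For reference, active/active across both SAS modules on a shelf like this is normally handled by dm-multipath on Linux. This is a sketch of the usual Fedora/RHEL setup, not the confirmed Level1 config; the defaults are generally fine for plain SAS paths.

```shell
# Enable dm-multipath so both SAS paths to each disk get used (active/active).
# Assumes the device-mapper-multipath package is installed.
mpathconf --enable            # writes a basic /etc/multipath.conf
systemctl enable --now multipathd

# Verify: each disk should show two paths, both "active ready".
multipath -ll
```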
In terms of datasets, we have many ZFS datasets. The SMB shares generally have case sensitivity turned off. The ones storing videos have compression turned off. The ones storing Docker VMs have compression and case sensitivity enabled. I think I turned off synchronous writes since we have a BBU, which is still somewhat dangerous. I tuned the thing that defaults to allowing 5 seconds of in-flight data to allow 30 seconds of in-flight data.
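My guess at the knobs being described, as a sketch. The dataset names are made up; the “5 seconds of in-flight data” setting is presumably the `zfs_txg_timeout` module parameter (default 5). Note `casesensitivity` can only be set at dataset creation time.

```shell
# Hypothetical dataset names; properties mirror the description above.
zfs create -o casesensitivity=insensitive -o compression=off tank/videos
zfs create -o casesensitivity=sensitive   -o compression=lz4 tank/docker

# Disable synchronous writes (risky without battery/UPS protection, as noted):
zfs set sync=disabled tank/docker

# Presumably zfs_txg_timeout: raise the transaction group timeout from 5s to 30s.
echo 30 > /sys/module/zfs/parameters/zfs_txg_timeout
```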
Maybe it’s how I’m testing. I don’t have a 10G network to test with, so I’m doing dd tests locally, as well as copying large amounts of data for prolonged benchmarking. I never really get above 150MB/s.
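One testing caveat worth flagging: `dd` from `/dev/zero` onto a compressed ZFS dataset measures almost nothing, because the zeros compress away, while reads can be served entirely from ARC. A rough local test sketch (paths and sizes are examples, and the file should exceed RAM for the read number to mean anything):

```shell
# Write incompressible data and force it to disk before reporting a number.
# /dev/urandom itself caps out at a few hundred MB/s, so treat the write
# figure as a floor, not a ceiling.
dd if=/dev/urandom of=/tank/scratch/rand.bin bs=1M count=1024 conv=fdatasync

# Read it back; use a file larger than RAM so ARC caching doesn't dominate.
dd if=/tank/scratch/rand.bin of=/dev/null bs=1M
```

Something like fio gives much more representative numbers than dd if you can install it.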
One of the things I haven’t had a chance to test yet is changing the PSU configuration. My shelf came with 4 PSUs but I only have one plugged in. Maybe it thinks it’s running on reduced capacity/redundancy and is throttling stuff? IDK.
With 7200 RPM disks one is fine, but I’ve seen some pretty bad stuff happen with one PSU trying to keep a full shelf of 15K drives going…
Use all 4, why not? Also, the drives will overheat unless you have the blanks in for the empty bays.
OK, I think adding the second PSU has solved the issue. One more thing though: is it normal for the top of the disk shelf to be actually hot to the touch? Like nearly to the point where I can’t hold my hand on it. I’m currently moving data to it so maybe it’s that, but thought it’s worth the question all the same.
@CandyBit if you got 4 PSU with the shelf, why not plug them in? The extra PSUs can only provide failsafe if present when the failure occurs, they do nothing if left out.
Do you mean they are physically installed but not plugged into power? In which case they are merely providing some ventilation rather than actual cooling.
I’m fairly sure that’s not the case. I started copying a large dataset to the disk shelf with 2 PSUs plugged in and things got very hot. Once I added the other 2, all the fans on the PSUs kicked on and temps dropped.
Hard drives will work hot, but die quicker. I think they recommend a maximum operating temperature of 70°C, so you might want to look into fan solutions if you’re over that?
Bursts above that are fine, but hours on end might lower the life.
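If you want actual numbers rather than the hand test, the drives report their own temperatures over SMART. A quick sketch assuming smartmontools is installed (device paths are examples):

```shell
# Per-drive temperature via smartmontools (works for SAS and SATA drives,
# though the output line differs slightly between the two).
smartctl -A /dev/sdb | grep -i temperature

# Or loop over every disk in the shelf:
for d in /dev/sd{b..y}; do
  printf '%s: ' "$d"
  smartctl -A "$d" | grep -i -m1 temperature
done
```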
There is a reason enterprises use the really loud fans…