Dual actuator HDDs - are they here to stay?

Nonsense. I have plenty of old HDDs. Just keep them away from temperature and moisture swings, ESD, shocks, vibrations, aggressive chemicals, etc. and they are fine.

And if one happens to fail on rare occasion, one can always have RAID in storage.
But even those failures are usually repairable. A matter of contact oxide cleaning, etc.

Typically, non-energy-assisted HDDs are designed with normalized energy barriers on their media of greater than 40, which yields retention times in the decades to centuries. Energy-assisted HDDs have a much higher number than 40 at room temperature, but they have more delamination failure modes.
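
For scale, that barrier maps to retention through the Néel-Arrhenius law, τ = τ₀·exp(Δ), where Δ = KᵤV/kBT is the normalized energy barrier. A back-of-envelope sketch (the ~1 ns attempt time τ₀ is a textbook assumption, not a number from this thread):

```python
import math

TAU_0 = 1e-9              # inverse attempt frequency ~1 ns (textbook assumption)
SECONDS_PER_YEAR = 3.156e7

def retention_years(delta):
    """Néel-Arrhenius: tau = tau_0 * exp(delta), delta = K_u*V / (k_B*T)."""
    return TAU_0 * math.exp(delta) / SECONDS_PER_YEAR

for delta in (40, 45, 50):
    print(f"delta = {delta}: ~{retention_years(delta):.3g} years")
# delta = 40: ~7.46 years; delta = 45: ~1.11e3 years; delta = 50: ~1.64e5 years
```

The exponential is why "greater than 40" is the dividing line: a few extra units of barrier height buys orders of magnitude of retention.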

From an old Materials Today publication:

A brand-new flash-based SSD will have retention times of a couple dozen months; a moderately worn SSD might have a retention time of ~3-6 months. Most SSDs don't have a mechanism to preemptively refresh decayed cells and will let the data corrupt at rest.


But that's the thing. All things being equal WRT data consistency and integrity protection, Linux dm SW RAID (much less the various RAID contraptions inside ZFS, BTRFS, etc.) can NEVER come close to matching HW RAID with battery-backed RAM. Even with an infinitely fast CPU, etc.

Why?
If nothing else, because HW RAID can retain all of the "in-flight" writes in RAM and replay them flawlessly after power-up.
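
A minimal sketch of that replay guarantee, with all names illustrative rather than any vendor's actual firmware:

```python
# A write is retired from battery-backed RAM only after the full stripe,
# parity included, is acknowledged by the disks. Anything still journaled
# at power-up was in flight when power died and simply gets replayed.

def disks_write_stripe(lba, data):
    """Stand-in for pushing data + recomputed parity to the member disks."""
    pass

class BatteryBackedCache:
    def __init__(self):
        self.journal = {}    # battery/flash backed: survives power loss
        self.next_id = 0

    def write(self, lba, data):
        wid = self.next_id
        self.next_id += 1
        self.journal[wid] = (lba, data)   # 1. land the write in NVRAM first
        self._commit(wid)                 # 2. then push stripe + parity out

    def _commit(self, wid):
        lba, data = self.journal[wid]
        disks_write_stripe(lba, data)
        del self.journal[wid]             # 3. retire only after disks ack

    def power_up_recovery(self):
        for wid in sorted(self.journal):  # replay whatever was in flight
            self._commit(wid)
```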

NO purely SW contraption can do that. The best that linux-dm can do is to keep a small bitmap region, usually in unused space on the drive itself.

It basically maps the drive into, say, a couple thousand regions, keeps a counter for every region, and whenever something gets written to a file in region X, the corresponding counter X gets incremented.

The kernel then fires a tasklet to recompute the parity of the stripe within region X that the write has hit, and when that parity stripe gets written, counter X is decremented.

If some writes get cut off by a power loss, some counters will be non-zero.
SW RAID will then recompute parity for the affected regions as a whole, reset their counters, and possibly leave a warning in the kernel log.
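
A rough sketch of that bookkeeping, modeled loosely on md's write-intent bitmap (the region count and names are illustrative):

```python
REGIONS = 2048                  # drive split into ~a couple thousand regions
counters = [0] * REGIONS        # persisted in the on-disk bitmap area

def region_of(lba, drive_sectors):
    return lba * REGIONS // drive_sectors

def on_write_submitted(lba, drive_sectors):
    counters[region_of(lba, drive_sectors)] += 1   # mark region dirty first

def on_parity_committed(lba, drive_sectors):
    counters[region_of(lba, drive_sectors)] -= 1   # stripe + parity on disk

def regions_to_resync():
    # After a crash, only regions with nonzero counters need their parity
    # recomputed: ~1/REGIONS of the array instead of a full rebuild.
    return [r for r, c in enumerate(counters) if c != 0]
```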

But that is still FAR worse than HW RAID, because:

  • it is still a shitty patch. Data is still lost, possibly catastrophically (partial writes etc.); the driver just patches the parity syndrome of a bunch of stripes. The only gain is that the whole RAID doesn't have to be rebuilt, only maybe 1/1000 of it, so rebuild time goes from hours or a day down to a few tens of seconds.
  • in order for even this to work, drives need to update the bitmap around every write, which means your seek time goes to shit, as half the seeks go to the bitmap and back.

This goes for any kind of SW RAID solution. So the only way for SW RAID to come close to a HW solution is to forgo all of that (= no bitmap), which is total crap, as after any kind of crash you are almost guaranteed massive downtime for a full RAID rebuild. Well, theoretically the RAID "works" during the rebuild, but barely.

Essentially no SW solution can fix this.

Hallelujah. But good luck telling that to muggles.

There’s no such thing as “all of the writes” when power is cut…

When power is lost during writes, ways to prevent corruption at the file system level do exist, battery or not. You seem to be stuck on the false presupposition that holding writes in battery-backed memory is the one and only way to ensure file system integrity… If you take a step back for a moment I'm sure you'll agree that it isn't.

If you can't trust your HW stack and file system driver, how can a hardware RAID controller possibly help? You're screwed either way.

Applications interrupted mid-write are always susceptible to corrupting files, despite their last write operation completing successfully before being unceremoniously terminated. Hardware RAID can't fix this either.

HW RAID with battery-backed RAM can at least guarantee to write a complete set of RAID stripes with parity, whether that concludes the file write or not.

It can also signal to the OS what was in-flight, so the SW can figure out what files were involved and possibly flag them.

SW RAID can’t do that, at least not if it wants to retain maximal performance.

Right, same as ZFS…

RLY? How? Since there is no battery-backed in-flight buffer, if a stripe set is partially written at the power-off point, ZFS can do… what, exactly?

Check its internal journal and nuke the partial write?
Wow. That's an improvement. One has to wonder why he is throwing heaps of RAM at the ARC cache and a shitload of CPU cycles etc. into ZFS's internal management. :joy::clown_face:

But the kicker with ZFS is that it can make things WORSE if used on a machine without integrity guarantees along the whole datapath, i.e. if one is running an off-the-shelf desktop machine without ECC in the CPU, ECC RAM, and internal ECC checks on the I/O paths to the drive and in the drive itself.
If something goes wrong when ZFS rechecks itself (let's say some RAM region somewhere is corrupt and gets allocated as a buffer during the recheck), a WRONG hash will be computed, and if that part is in an internal RAID structure (RAID-Z1/2/3), ZFS might very well "correct" it and thus start silently ruining the data.
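
A toy illustration of that failure mode, assuming a single bit flip in a non-ECC buffer during a scrub pass (this is not ZFS's actual code path):

```python
import hashlib

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

def scrub_block(disk_block: bytes, stored_sum: bytes, reconstruct):
    buf = bytearray(disk_block)   # block read from disk into RAM
    buf[0] ^= 0x01                # simulate a bit flip in a non-ECC buffer
    if checksum(bytes(buf)) != stored_sum:
        # The on-disk block was fine, but seen through the corrupt buffer it
        # fails verification, so the "repair" path overwrites good data with
        # whatever parity reconstruction yields via the same bad RAM.
        return reconstruct()
    return bytes(buf)

# Demo: a perfectly good block still gets "repaired".
good = b"good data block"
print(scrub_block(good, checksum(good), lambda: b"reconstructed (maybe wrong)"))
```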

There is a LOT of fine print with these things that muggles never take time to read…

From the people who actually, you know, made the original study about SSD retention:

If you are talking ideal conditions, SSDs can last for up to 3 decades even in a worst-case scenario:

[image: SSD retention vs. temperature table]

I would love to see more data here, but at the very least it appears that the initial assumptions are vastly blown out of proportion. But we are way out on a tangent now.

Just because a lie is repeated 10 000 times does not make it true.

From the people that, you know, got "famous" just a month or so ago for having their USED drives sold as NEW.
Yeah, that's a chain of trust right there. Actually, its strongest link. :roll_eyes:

A worst-case scenario under ideal conditions. Wow.
That’s pure military grade stuff.
Actually, make that Space Force. :rofl:

Interesting. Instead of reading the actual debunking article you resort to name calling and ridicule.

So in the table above, the "1 year retention" is about SSDs that have reached end of life, with most sectors written out of spec… And even then, you can get up to 3 decades with optimal temperature management.
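
For reference, that temperature scaling is an Arrhenius acceleration factor. A back-of-envelope sketch, assuming a ~1.1 eV activation energy for charge loss (a commonly cited figure, not a number from this thread):

```python
import math

K_B = 8.617e-5   # Boltzmann constant, eV/K
EA  = 1.1        # eV, assumed activation energy for charge loss

def acceleration(t_hot_c, t_cold_c):
    """Arrhenius factor: how much longer retention lasts at the colder temp."""
    t_hot = t_hot_c + 273.15
    t_cold = t_cold_c + 273.15
    return math.exp(EA / K_B * (1.0 / t_cold - 1.0 / t_hot))

# If retention is ~1 year when stored at 55 C, resting at 25 C instead
# stretches it by roughly 50x under these assumptions:
print(f"{acceleration(55, 25):.0f}x")
```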

But it’s just so much easier to parrot the cargo cult wisdom :upside_down_face:

No, no, and not really. Dual actuator drives are as named: a single drive with one actuator accessing the upper half of the platters and a second for the lower half. So outer-edge read rates of ~275 MB/s can be doubled within SATA III's ~550 MB/s limit. That said, the Ultrastar HS760 seems to be dead and I've never seen a review or test.

Exos 2X14 and 2X18 are well covered, though, and I get the same as everybody else with the 2X18: ~270 MB/s max from either actuator, ~465 MB/s from both concurrently. Seagate claims 545 MB/s SATA and 554 MB/s SAS, but those appear quite difficult to achieve. Might be a bogus, purely theoretical spec.
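
A quick sanity check on those measured numbers:

```python
single  = 270    # MB/s, one actuator at the outer edge (measured above)
both    = 465    # MB/s, both actuators concurrently (measured above)
claimed = 545    # MB/s, Seagate's SATA spec

print(f"scaling efficiency: {both / (2 * single):.0%}")  # ~86% of perfect 2x
print(f"vs claimed spec:    {both / claimed:.0%}")       # ~85% of the spec
```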

Windows does have a limitation of not being able to RAID two partitions on what it sees as the same drive, meaning it JBODs a SATA Exos Mach.2's two halves, while workload data striping is required to make effective use of both actuators. The SAS versions' two LUNs provide a workaround for Microsoft's support fail. Not an issue with mdadm, btrfs, zfs, and so on.
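
A rough sketch of why the JBOD/concat layout leaves one actuator idle on sequential work while striping engages both (the half size and chunk size here are illustrative):

```python
HALF = 9_000_000_000_000      # bytes per actuator half (~9 TB on a 2X18)
CHUNK = 1 << 20               # 1 MiB stripe chunk

def concat_map(offset):
    # JBOD: fill the first half before ever touching the second actuator.
    return (0, offset) if offset < HALF else (1, offset - HALF)

def stripe_map(offset):
    # RAID0: alternate chunks between halves, so any long sequential
    # read/write keeps both actuators busy.
    chunk, within = divmod(offset, CHUNK)
    return (chunk % 2, (chunk // 2) * CHUNK + within)

for off in range(0, 4 * CHUNK, CHUNK):
    # concat stays on actuator 0; striping alternates 0, 1, 0, 1, ...
    print(off // CHUNK, concat_map(off)[0], stripe_map(off)[0])
```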

And, er, this post is about dual actuators in a dual actuator thread. Oops, my bad. ¯\_(ツ)_/¯

Is your 2x18 a SATA drive? I was under the impression that the SATA drives performed significantly worse than the SAS drives because the command queue stalled when bouncing across large LBA differences so often.


All the points Wendell hammers through are true, and the entire storage industry acts in accordance. Modular hardware RAID for high-speed storage is a dead end for all the reasons stated above.

The same crowd that now preaches ZFS made me try the "glorious" BTRFS.
It was supposed to be a miracle. A miracle that everyone needs because… bitrot?

You didn't do your research first? FFS, there are no miracle technologies, and btrfs is not a stable solution, something widely known since 2010 even.
It might have become the next ZFS, but the Sun acquisition and Oracle shenanigans led to development de-funding, and the project pretty much died.

HE still hasn't explained how ZFS is supposed to do data integrity checking (T10-PI).

He did, in detail. Review the videos and pay attention to the details.

This isn't about trust, this is about capability. New high-performance hardware RAID is not being developed, as there is zero industry use for it. Older ones are absolutely incapable of handling even a single older enterprise NVMe, much less whole arrays of them.

Attempts like the GRAID solution, while faster, are strictly worse than the old enterprise solutions and do not offer any additional capability vs. a pure software approach. They are actually riskier.

So we don't need it; it would strictly lower performance vs. what is done today: software handles parity and consistency, and accesses NVMe directly or over switched fabrics for minimal performance loss.

That doesn't mean hardware RAID for high performance is impossible, it's just going to be hellishly complex and expensive due to the speeds and latencies involved. The silicon would have to be monstrous and might not even be feasible at an acceptable latency cost.

Modern NVMe performance is so high that any intermediary processing element can and does incur major performance penalties.

And since there isn't any pressing need for it, we are back to square one. Hardware RAID is dead outside legacy or low-cost applications. And low-cost applications will never have the budget for a good hardware RAID implementation.

If there were a technical and business case, Broadcom would already have a product on the market. Instead, everyone is going DPU and direct access.

QED.

Also on that note, do read modern HDDs' technical specs; it's not very fun reading, though. Drives have DWPD limits too, and they ain't high.

The 20 TB CMR Toshiba MG10 series, for example, is only rated for 550 TB of writes per year, i.e. ~1.5 TB/day ≈ DWPD 0.075.
And that's a high-end enterprise-grade drive.
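
The spec-sheet arithmetic checks out:

```python
capacity_tb = 20          # Toshiba MG10, CMR
rated_tb_per_year = 550   # workload rating from the spec sheet

tb_per_day = rated_tb_per_year / 365
dwpd = tb_per_day / capacity_tb
print(f"{tb_per_day:.2f} TB/day -> DWPD {dwpd:.3f}")   # 1.51 TB/day, DWPD 0.075
```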

What effects will HAMR and MAMR have on effective drive lifespan?

Recent article on the matter:

I think HDDs are here to stay for the time being; cloud platforms are not going to store all our data on the most expensive option when the majority are using an ad- or data-supported freemium tier. Yes, in high-performance scenarios SSDs may be it, but they just aren't for the everyday yet. Sorry, but the majority of our funny cat memes are not going so viral that they need to be stored on SSDs.

It's been almost 2 decades and SSDs are still over 10x the price per TB of HDDs. The software exists to make HDDs work together in ways that make them viable in a lot of scenarios. While this remains true, HDDs will continue to hold a large part of our storage in and out of the cloud.

In the cloud, no one is thinking in terms of one drive vs. another drive, but rather a pool of hundreds of thousands of drives vs. a pool of hundreds of thousands of drives.

The average consumer SSD sucks for endurance workloads compared to an HDD. However, most consumers don't need endurance; they just need IOPS for Windows and launching applications.

But all you have to do is use a RAID card with some battery-backed memory and everything is sunshine and roses.

Right… you’re clearly the only one gifted with the ability to read.

Best of luck, pureblood.

Very interesting article! I do exactly the same thing with my own data, and it definitely is quite the balancing act.
No matter how you look at it, spinning disks are massively cheaper than solid-state ones. Good spinning disks can outlast solid-state ones too, especially on writes.