This is true, 100%. However, for that exact Areca controller, I’ve found cases where a single 4 KiB sector write was apparently missed and the drive consistency check later made it worse. It could have been a connection issue or a cable issue, but my hunch is a firmware bug.
Based on my Areca testing it does not seem to do read-verify, but a later patrol read may catch the error. You’d need RAID6 to recover from that situation, and it is unclear to me whether the driver is written in such a way as to actually do that. The controller handles the straightforward cases well: a drive reports an error, a drive times out, a drive goes missing entirely. It’s the edge cases I’ve had to deal with… you have to be really unlucky for the “wrong” 4 KiB sector corruption to lead to a cascade of failures.
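To put the “you’d need RAID6” part in concrete terms, here’s a toy sketch (my own illustration of the textbook P/Q math, not anything the Areca firmware is known to do): with a single XOR parity you can only tell that a stripe is inconsistent, but with the second Reed-Solomon parity you can actually compute which member returned garbage, as long as exactly one of them did.

```python
import os

POLY = 0x11d   # the RAID6 generator polynomial over GF(2^8)
GEN  = 2       # generator g; data drive i gets coefficient g**i in Q

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reduced mod POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def make_pq(stripes):
    """P = XOR of all data blocks; Q = sum over i of g**i * D_i, per byte."""
    p = bytearray(len(stripes[0]))
    q = bytearray(len(stripes[0]))
    for i, blk in enumerate(stripes):
        coeff = gf_pow(GEN, i)
        for j, byte in enumerate(blk):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def locate_bad_drive(stripes, p, q):
    """Assuming at most one data member is silently corrupt, find which one."""
    p2, q2 = make_pq(stripes)
    dp = bytes(a ^ b for a, b in zip(p, p2))   # per byte: the error e
    dq = bytes(a ^ b for a, b in zip(q, q2))   # per byte: g**z * e
    if not any(dp) and not any(dq):
        return None                            # stripe is consistent
    for e, ge in zip(dp, dq):
        if e:
            # g**z = dq/dp; a**254 is the multiplicative inverse in GF(2^8)
            ratio = gf_mul(ge, gf_pow(e, 254))
            z = 0
            while gf_pow(GEN, z) != ratio:
                z += 1
            return z
    return None  # P consistent but Q not: Q itself (or worse) is suspect

# toy stripe: three 16-byte "drives"
data = [os.urandom(16) for _ in range(3)]
p, q = make_pq(data)

# silently flip one bit on drive 1, the way a marginal cable might
bad = list(data)
bad[1] = bytes(b ^ 0x40 if j == 5 else b for j, b in enumerate(bad[1]))

print(locate_bad_drive(bad, p, q))   # -> 1
# With P alone (RAID5-style) you only learn *that* the stripe is inconsistent,
# not which member lied -- hence "you'd need raid6" to do better than guessing.
```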
Just the other day, in fact:
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 132K in 05:52:47 with 0 errors on Sat Apr 20 20:01:45 2024
config:

        NAME                                   STATE     READ WRITE CKSUM
        ssdpool                                ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            nvme-SAMSUNG_MZQLB7T6HMLA-1-part3  ONLINE       2     0     0
            nvme-SAMSUNG_MZQLB7T6HMLA-2-part3  ONLINE       0     0     1
            nvme-SAMSUNG_MZQLB7T6HMLA-3-part3  ONLINE       0     0     0

errors: No known data errors
This is 100% caused by the NVMe backplane and/or cables being the slightest bit sketchy, and it has slowly been getting worse over the last few months. We have already rotated drives and picked up a hot spare. This kind of “well, it’s probably the backplane” failure is rarely, if ever, well-tested by hardware RAID controllers. Patrol read is supposed to catch this type of error the way ZFS does; HOWEVER, the reality is that it doesn’t really correct the error, it corrects the inconsistency, perhaps clobbering the data in the process. Linux MD makes the same assumption, fwiw: the data is correct and the parity is not. ZFS, by contrast, knows exactly which piece is incorrect, and fixes that one.
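A toy illustration of that difference, with made-up data (not ZFS’s actual on-disk logic): an MD-style resync just recomputes parity from whatever data it read, sealing the corruption in, while a separately stored checksum lets you try each possible reconstruction and keep the one that actually verifies.

```python
import hashlib

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# a 2-data + 1-parity stripe; the checksum of the *logical* data is kept
# separately, roughly the way ZFS keeps block checksums in the parent block
d0, d1  = b"A" * 16, b"B" * 16
parity  = xor(d0, d1)
chksum  = hashlib.sha256(d0 + d1).digest()

# a flaky cable silently mangles d1 somewhere along the way
d1_bad = bytes([d1[0] ^ 0xFF]) + d1[1:]

# MD-style "repair": trust the data, rewrite parity to match it.
# The stripe is now perfectly consistent -- and permanently wrong.
parity_resynced = xor(d0, d1_bad)
assert hashlib.sha256(d0 + d1_bad).digest() != chksum

# checksum-guided repair: try each possible reconstruction from parity and
# keep whichever combination actually verifies against the stored checksum
candidates = {
    "rebuild d0 from parity": (xor(parity, d1_bad), d1_bad),
    "rebuild d1 from parity": (d0, xor(parity, d0)),
    "take it as read":        (d0, d1_bad),
}
for name, (c0, c1) in candidates.items():
    if hashlib.sha256(c0 + c1).digest() == chksum:
        print("verified:", name)   # -> "rebuild d1 from parity": d1 was the liar
```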
If you happen to be using 520-byte sectors instead of 512-byte ones, your RAID card also knows where the inconsistency is, very much like ZFS does, and can fix it that way. But with normal 4K or 512-byte sectors? You rely on the drive itself to know, 100% of the time.
And if you hit a software bug, an inconsistency, or, as in the above, just bad cables? Sorry about your data.
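Rough sketch of what the 520-byte format buys you (note: real T10 protection information uses a CRC-16 guard plus application/reference tags; the CRC-32 here is just a stand-in for illustration): each sector carries its own integrity field, so the controller can check every copy independently and point at the member that lied, with no reconstruction guessing needed.

```python
import zlib

SECTOR, EXTRA = 512, 8   # 520-byte formatted sector: data plus integrity field

def format_520(data512):
    """Stash a CRC of the sector data in the 8 spare bytes."""
    guard = zlib.crc32(data512).to_bytes(4, "big")
    return data512 + guard + b"\x00" * (EXTRA - len(guard))

def verify_520(sector520):
    data, guard = sector520[:SECTOR], sector520[SECTOR:SECTOR + 4]
    return zlib.crc32(data).to_bytes(4, "big") == guard

# two copies of the same sector from a mirror; one comes back silently mangled
good = format_520(b"x" * SECTOR)
bad  = good[:100] + b"\x00" + good[101:]   # one clobbered byte inside the data

for name, copy in (("drive A", good), ("drive B", bad)):
    print(name, "ok" if verify_520(copy) else "guard mismatch -- this copy lied")
```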
Testing
I would suggest zeroing the whole drive/array, creating a 1 MiB file filled with a test pattern, and then scanning each drive stand-alone to see how the controller broke the test pattern up across the drives. Then corrupt some part of it, no more than 4 KiB; even just zeroing ONE 4 KiB block on ONE drive will do. Then see what happens when you read that 1 MiB file back.
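Something like the following, as a rough outline. The device path, mount point, and pattern are placeholders for whatever your setup uses; scan and corrupt the member drive while it is detached from the controller (or otherwise out from under it), and only do any of this on a scratch array you are happy to destroy.

```python
import os

ARRAY_FILE = "/mnt/testarray/pattern.bin"   # the 1 MiB file on the RAID volume
MEMBER_DEV = "/dev/sdX"                     # ONE member drive, read/written raw
PATTERN    = b"CAFEBABE" * 512              # 4 KiB of easy-to-spot filler
FILE_SIZE  = 1 << 20                        # 1 MiB

def write_pattern_file():
    """Step 1: lay the test pattern down through the controller."""
    with open(ARRAY_FILE, "wb") as f:
        for _ in range(FILE_SIZE // len(PATTERN)):
            f.write(PATTERN)
        f.flush()
        os.fsync(f.fileno())

def find_pattern_on_member(chunk=1 << 20):
    """Step 2: scan the raw member to see where the controller put the stripes.
    (Crude: only reports the first hit per chunk and can miss a hit that
    straddles a chunk boundary -- fine for eyeballing the layout.)"""
    hits = []
    with open(MEMBER_DEV, "rb") as dev:
        offset = 0
        while True:
            buf = dev.read(chunk)
            if not buf:
                break
            pos = buf.find(b"CAFEBABE")
            if pos != -1:
                hits.append(offset + pos)
            offset += len(buf)
    return hits

def zero_one_block(offset):
    """Step 3: silently zero ONE 4 KiB block on ONE member, behind the
    controller's back (i.e. with the drive detached from it)."""
    with open(MEMBER_DEV, "r+b") as dev:
        dev.seek(offset)
        dev.write(b"\x00" * 4096)
        dev.flush()
        os.fsync(dev.fileno())

def read_back():
    """Step 4: reattach, drop caches / remount, then read the file back
    through the controller and see which blocks survive."""
    with open(ARRAY_FILE, "rb") as f:
        data = f.read()
    wrong = sum(1 for i in range(0, FILE_SIZE, 4096)
                if data[i:i + 4096] != PATTERN)
    print(f"{wrong} of {FILE_SIZE // 4096} blocks came back wrong")
```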