Trusting 8-10TB HDDs in an Array

Hey wendell,
Would you trust 8 or 10TB HDDs in an array?
I don't think I'd even trust them with 2-drive redundancy.

Is there a system that has file-level redundancy?
So if there is a failure during an array rebuild,
then you just lose a single file,
and not everything.

I moved your post to get more attention.

@Wendell as @KuramaKitsune was asking directly.

-
Do you have reason not to trust them? I don't think 10TB consumer HDDs have been in operation long enough to have many results, but I don't see why there would be an issue except for larger data loss in the event an entire disk is lost. ZFS and Btrfs do software/filesystem RAID, which I believe generally has advantages over hardware RAID, but my memory doesn't serve me well on the exact positives and negatives.

Just don't use Seagate drives... also don't use "archive drives".


Snapraid.

If the disk fails completely then there's not much you can do. But with SnapRAID you'd only lose the files on the disks which have failed, and you'd be able to recover anything up until the point where it fails. And if there are unrecoverable blocks, then you'd only lose the files which relied on those blocks to recover.

I'm not sure how ZFS and Btrfs handle it, but if the disk dies then the array is gone; if there are unrecoverable blocks then I'm not sure what happens. With regular RAID, pretty much any failure will destroy the array.
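For reference, a minimal SnapRAID setup is just a config file naming the data disks and a parity location (the paths below are invented examples, adjust to your own mounts), plus periodic runs of its commands:

```
# /etc/snapraid.conf -- example paths, not a recommendation
parity /mnt/parity1/snapraid.parity
content /var/snapraid/snapraid.content
content /mnt/disk1/snapraid.content
data d1 /mnt/disk1
data d2 /mnt/disk2
```

Then `snapraid sync` updates the parity, `snapraid scrub` checks for silent errors, and `snapraid fix` recovers files after a disk dies. Because parity is only as fresh as the last sync, anything written since then isn't protected.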


I'm using ZFS RAID-Z1 on my 4x8TB array. So far I've had no errors. I am worried about it, however, and I do plan to periodically swap out the drives to make sure I have fresh ones in the system. I'm also going to build a cold storage server eventually that I can use for backups. As for the drives, the Seagate drives are SMR as opposed to PMR.

This is an issue if you plan to do more than archive things on them, as SMR overlaps adjacent tracks to squeeze more data onto the disk, so rewriting one track means rewriting its neighbours too. This means writes effectively must be sequential, performance drops, and your potential for errors goes up.

Storage review has a good writeup here: http://www.storagereview.com/node/4539

Unless I'm mistaken, both ZFS and BTRFS do integrity checking of all the files/parity, so that uncaught errors on the drives get detected instead of silently propagating. As for how they work, a ZFS Z1 array is functionally the same as RAID 5 (single-drive parity), Z2 is like RAID 6 (dual-drive parity), and Z3 is triple-drive parity (sometimes called RAID 7). SnapRAID sounds more like a JBOD to me. I really haven't used JBOD enough to know much about it other than that it concatenates drives into a single volume. If I understand correctly, it could be easier to recover files off of a damaged JBOD than a RAID 0 as they aren't striped, but this isn't always true. I'll have to look into SnapRAID.

What I mean is I don't know how ZFS and Btrfs deal with unrecoverable sectors. With RAID the disk will get dropped from the array, but I'd like to think that Btrfs and ZFS would just skip the sector and keep recovering what they can.

snapraid isn't really like raid at all. It works on top of the file system and doesn't change anything on the disks, it just generates parity data so that if a disk fails it can be replaced. But the disks are all independent and still work the same way they normally would.

Unless I'm mistaken they will mark it as broken and then rebuild from parity. Is that what you mean? If the drive continues to fail it will continue to give you warnings but I don't believe it will outright drop the disk.

These are my scrub results.

Yeah, if there is parity available then it will fix the block, but I'm talking about a situation in which you have to rebuild the array and there is an unrecoverable sector while rebuilding. If the disk is dropped then the array is lost, but if it skips the sector and continues to recover then you only lose the data that relies on that sector. I'm not sure if this is the way zfs and btrfs work but I would assume that it is.

If the sector is lost then that is what the parity is there for... I don't get what you're getting at.

I mean if you've already lost a disk and have to rebuild, in that case there won't be any parity. If you have two disk parity then it's fine, I just mean if you're in a situation where you've already lost disks and one more failure will be too many. That's what the OP is worried about.

Oookay. I get you. I'm not sure about that. I believe that would cause damage but I really don't know. That's why you use Z2. lol

haha yeah.

If another disks totally fails then the array is gone, I'm just not sure what happens if it's only an unrecoverable sector. It should be possible to skip it and just deal with some file corruption, but I'm pretty sure with traditional RAID the disk will be dropped which will destroy the array. I'm pretty sure zfs and btrfs won't do anything that dumb.

4TB is the maximum I trust.
I like the Western Digital Reds and WD Re drives I am running.
Because the price difference between Blacks and the Re drives is too small to make a difference, I am thinking about throwing Re drives into my next rig...


The main issue is simple really.
When a drive dies and gets replaced, the array is repaired from parity data. This repair takes a long time and puts a lot of stress on the drives, which means there's a larger chance that another drive starts throwing errors because it can't handle the stress.
The larger the drives get, the longer a rebuild takes, and hence the bigger the chance of another drive crapping out.
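To put rough numbers on that: a rebuild has to read every surviving disk end to end, so the exposure window grows linearly with capacity. A back-of-the-envelope sketch (the ~150 MB/s sustained throughput is an assumed figure; real arrays under load are often slower, so treat these as lower bounds):

```python
# Rough rebuild-time estimate: a rebuild reads/writes every sector,
# so time scales linearly with capacity.  150 MB/s sustained is an
# assumption, not a measured number.
def rebuild_hours(capacity_tb: float, throughput_mb_s: float = 150.0) -> float:
    bytes_total = capacity_tb * 1e12          # TB -> bytes (decimal, as marketed)
    seconds = bytes_total / (throughput_mb_s * 1e6)
    return seconds / 3600

for tb in (4, 8, 10):
    print(f"{tb} TB: ~{rebuild_hours(tb):.0f} h minimum")
```

So an 8TB rebuild is already the better part of a day of continuous full-speed reads on every remaining drive, which is exactly the stress window being described above.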

Always stress-test your HDDs for at least 24 hours before putting them in an array.
Actually ... always stress-test your new HDDs, period.

My preferred method is to
- take a screenshot of the S.M.A.R.T. window (I use CrystalDiskInfo to get a S.M.A.R.T. readout)
- then do a 35-pass full disk wipe with CCleaner
- followed by a full error scan using HD Tune (don't tick the "quick scan" checkbox, you want to do it thoroughly).
- If it gets a perfect report, check the S.M.A.R.T. numbers again and compare them.
If the drive makes it through all that (with 10TB drives you'll be talking days or close to a week, IIRC my 4TB ones took 40+ hours to undergo the whole procedure), it'll last a looooong time.
However if it starts throwing errors, be glad that you caught it before putting your data on it, because it wasn't going to last long.
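The core write-then-verify idea behind that burn-in can be sketched in a few lines. This is a hypothetical toy version pointed at a scratch file rather than a raw device (never aim something like this at a disk holding data you care about): write random data, force it to the medium, and check it reads back bit-for-bit:

```python
import hashlib, os

# Simplified write-then-verify pass (the idea behind a destructive
# full-disk burn-in), demonstrated on a scratch FILE, not a raw device.
def write_verify_pass(path: str, size_mb: int = 16, chunk: int = 1 << 20) -> bool:
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            block = os.urandom(chunk)       # random test pattern
            written.update(block)
            f.write(block)
        f.flush()
        os.fsync(f.fileno())                # force the data out of caches
    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            read_back.update(data)
    return written.digest() == read_back.digest()

print(write_verify_pass("/tmp/burnin.bin"))
```

On Linux the real-deal equivalents would be a destructive `badblocks -wsv` pass plus `smartctl -a` before/after to compare the S.M.A.R.T. counters, same procedure as above, different tools.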


yeah, so unless you have, like,
luck and triple-drive redundancy...
then you're pretty much screwed above 4-5TB drives

yeah, this sounds like what would be important,
because I don't want a single block error during an array reconstruction
to end up DESTROYING my WHOLE array because there is not enough parity to handle a second or third concurrent error

well, that's my issue
MATHEMATICALLY
an HDD has an unrecoverable block or read error every however-many trillion bits read
and even to READ a drive over 5TB, that number of operations is met
therefore
during a full drive reconstruction on a RAID
that error is mathematically almost certain to happen if your drives are larger than, say, 4 or 5TB each
statistically speaking
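For what it's worth, the math works out to "likely", not literally 100%. Using the unrecoverable read error (URE) rate most consumer drives are spec'd at, 1 error per 1e14 bits read (that figure is an assumption from typical spec sheets, and how it maps to real-world behaviour is debated), the chance of at least one error over a full read comes out like this:

```python
import math

# Probability of at least one unrecoverable read error (URE) while
# reading a whole drive, assuming the common consumer spec of
# 1 URE per 1e14 bits read (enterprise drives are often rated 1e15).
def p_ure(capacity_tb: float, ber: float = 1e-14) -> float:
    bits = capacity_tb * 1e12 * 8
    # exp(bits * log1p(-ber)) = chance of reading every bit cleanly
    return 1 - math.exp(bits * math.log1p(-ber))

for tb in (4, 8, 10):
    print(f"{tb} TB: {p_ure(tb):.0%} chance of at least one URE")
```

At that spec, a full read of an 8TB drive is roughly a coin flip, and a degraded rebuild has to do a full read of every surviving drive, which is why single parity gets scary at these sizes.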

In ZFS and BTRFS, because of the calculations that are done to create a sort of software ECC for the hard drives, those read errors you're concerned about should theoretically be corrected as they occur using the data off the other disks in the array. That is a massive, glaring issue with traditional RAID, but ZFS and BTRFS aren't susceptible to the same kind of failure.


so a random hardware sector or block error won't be enough to fuck an array reconstruction over if it was already in a degraded state?
@wendell ?

If it's a double-parity situation then no, there should be enough data on the other disks to correct it without issue. At least that is my understanding; it would be great to have this confirmed by @wendell

that's the issue
if you're on double parity (2 drives)

one drive fails and you need to reconstruct (down to 1x parity)

during that reconstruction, if your drives are over 4-5TB in size,
then even to READ the good drives to rebuild the array,

the remaining drives will mathematically encounter an error reading that much data

that second error will kill a traditional RAID 5