New drive check

I just purchased two new large, and pricey, mechanical HDDs: an IronWolf Pro and an Exos. My plan is to experiment with mdadm and a RAID 1 config.

What are the recommended / typical procedures to check the disks for functionality and to initially stress them to see if they fail?

And is it necessary, given that they will be in RAID 1?

I typically do a SMART long test, and if that checks out I write the entire drive with 0’s.

For the recent 16 TiB IronWolf drives I did, it took about three days for all that.

Some people also run further tests with badblocks.
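
A minimal sketch of that routine, assuming smartmontools, coreutils, and e2fsprogs are installed (/dev/sdX is a placeholder, and the zero-fill and badblocks steps are destructive):

```
smartctl -t long /dev/sdX                           # kick off the long self-test
smartctl -a /dev/sdX                                # read the self-test log and SMART attributes once it finishes
dd if=/dev/zero of=/dev/sdX bs=1M status=progress   # zero-fill the whole drive
badblocks -wsv /dev/sdX                             # optional destructive write/read pattern test
```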

CC: @oO.o what do you do to put a used drive through its sanity paces?


Thanks. I will look into badblocks.

Yeah, SMART long test, badblocks, and then put it under some load for a while. I don’t have a regimented approach to load testing, just combine reads and writes. Use fio if you want.
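
If you do reach for fio, a mixed read/write pass against the raw device is one way to do it. A sketch, where the device name, run time, and queue depth are just placeholders (writing to the raw device is destructive):

```
fio --name=burnin --filename=/dev/sdX --direct=1 \
    --rw=randrw --rwmixread=50 --bs=1M \
    --ioengine=libaio --iodepth=16 \
    --time_based --runtime=3600
```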

That said, I have never had a new drive fail SMART or have bad blocks out of the box. They've all either been DOA or fine. Did get two DOA 20 TB Exos the other day, which was annoying but ultimately a free return/replace.


That jibes with the bathtub failure model.

TL;DR: failures occur either right at the start or at the tail end. Anything in the middle is fine, which is why used drives can be a great deal depending on where on the curve they sit (power-on hours).


The SMART test reported no errors. But when I tried to run a write test with badblocks I got an error:

```
badblocks: Value too large for defined data type invalid end block (23437770752): must be 32-bit value
```

32-bit?

```
file $(which badblocks)
/sbin/badblocks: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64...
```

Googling around led me to this on the Arch Wiki. So I guess the utility is limited to a 32-bit block count and not really usable as-is for larger HDDs. :man_shrugging:t2: And I did not see an alternative.
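
The arithmetic behind the error, assuming badblocks was run with its default 1024-byte block size: the drive has more 1 KiB blocks than fit in the 32-bit counter badblocks uses, so the end block is rejected before the test even starts.

```
echo $(( 23437770752 > 2**32 - 1 ))   # prints 1: the block count from the error overflows 32 bits
echo $(( 2**32 - 1 ))                 # 4294967295, the largest block number badblocks will accept
# the block count scales inversely with the -b block size, for what that's worth
```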

What about just zeroing out the entire drive(s), then another long SMART test?
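
Something like this is what I have in mind (/dev/sdX is a placeholder; the cmp read-back is just an idea for catching blocks that don't read back as zero after the fill):

```
cmp /dev/zero /dev/sdX      # read everything back; "cmp: EOF on /dev/sdX" with no
                            # earlier difference means every byte came back as zero
smartctl -t long /dev/sdX   # then the second long self-test
```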

And not to muddy the thread, but what will happen in a RAID 1 if one of the disks has a bad sector? How does mdadm inform me of the issue? Will it just neatly and automatically skip writing to the corresponding sector on the good disk?
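
For reference, these are the places I'd know to look for md complaining, but I'm not sure what actually happens to the bad sector itself (/dev/md0 is a placeholder):

```
cat /proc/mdstat                                 # per-array state, e.g. [UU] vs [U_]
mdadm --detail /dev/md0                          # per-member state, event counts, failed devices
mdadm --monitor --scan --daemonise --mail=root   # mail alerts on Fail / DegradedArray events
echo check > /sys/block/md0/md/sync_action       # kick off a scrub
cat /sys/block/md0/md/mismatch_cnt               # mismatches found by the last scrub
```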