What is the best method for checking the health of mechanical hard drives (2023 edition)

Like the title states, what do you think is the best method for checking the health of your mechanical drives?
Do you use strictly software based solutions or do you have other offline methods that you also use?
Do you tend to stick to manufacturer provided software or do you think there are better third party tools out there?
How many of you just run an extended S.M.A.R.T. test and then YOLO it?

For new drives, I run a SMART test and a few destructive writes with badblocks to test them.
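
For the SMART part, a minimal sketch with smartmontools might look like this (treat /dev/sdX as a placeholder for the drive under test; the badblocks invocation is covered further down the thread):

```bash
# Kick off an extended (long) SMART self-test; smartctl prints
# an estimated completion time
sudo smartctl -t long /dev/sdX

# Once it finishes, review the self-test log and attributes
sudo smartctl -a /dev/sdX
```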

Once drives are in use, I just rely on SMART monitoring to let me know of failures. Most of my storage is in RAID arrays with hot spares, so the failover is automatic. And yes, I have an offsite backup.

Software-wise, I’m using smartmontools, mdadm, and munin for monitoring. The volumes are formatted ext4. It’s very boring and un-sexy.
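
As a rough illustration of that stack (the mail address and test schedule here are made up), smartd and mdadm can both be told to mail you when something goes wrong:

```
# /etc/smartd.conf -- monitor all drives, enable offline data collection
# and attribute autosave, run a short self-test nightly at 02:00 and a
# long self-test every Saturday at 03:00, and mail on failure
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com

# /etc/mdadm/mdadm.conf -- where mdadm --monitor sends failure mail
MAILADDR admin@example.com
```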


Run dd over the disk (one write pass, one read pass), plus a SMART long test (smartmontools), and you’re fine.
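
In other words, something roughly like this (both passes destroy whatever is on the disk; /dev/sdX is a placeholder):

```bash
# One full write pass over the whole disk -- WIPES ALL DATA
sudo dd if=/dev/zero of=/dev/sdX bs=1M status=progress

# One full read pass; any I/O error here points at a bad sector
sudo dd if=/dev/sdX of=/dev/null bs=1M status=progress

# Then the long SMART self-test
sudo smartctl -t long /dev/sdX
```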

Could you explain this part a bit more in detail?

When I purchase new HDDs, I always use this very nifty badblocks wrapper to test them. It does 4 full write and read passes on the whole drive, which will uncover any bad sectors and hopefully prove whether a drive is problematic while it is still within the return period. After the badblocks tests complete, I run an extended SMART test. If no issues appear after both of these, I am confident that the drive is worth keeping.
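
Wrapper aside, plain badblocks in write mode already does exactly four write-and-read passes (patterns 0xaa, 0x55, 0xff, 0x00), so the bare-bones equivalent is roughly:

```bash
# Four destructive write+read passes over the whole drive -- WIPES ALL DATA
sudo badblocks -wsv -b 4096 -o badblocks-sdX.log /dev/sdX

# Then the extended SMART test; check the self-test log once it completes
sudo smartctl -t long /dev/sdX
sudo smartctl -l selftest /dev/sdX
```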

For SSDs, I start by performing an extended SMART test. If no issues are found there, I format the drive with ZFS, copy a few GB of data to it, and then run a scrub. If no issues are found in the scrub, I keep it.
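
A minimal sketch of that SSD routine, assuming a throwaway single-disk pool (pool name and device are placeholders):

```bash
# Extended SMART self-test first
sudo smartctl -t long /dev/sdX

# Throwaway single-disk pool on the new SSD
sudo zpool create testpool /dev/sdX

# Copy a few GB of anything onto it -- the checksums are what matter
sudo cp -r /some/test/data /testpool/

# Scrub re-reads and verifies every checksum
sudo zpool scrub testpool
sudo zpool status testpool   # look for "errors: No known data errors"

# Tear it down afterwards
sudo zpool destroy testpool
```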

Once a drive is in use, I set up a cron job on the host system to run an extended SMART test twice a year. For SSDs, this cron job also includes a notification if the wearout indicator shows less than 25% remaining. All of the drives holding important data are running ZFS and have multiple layers of redundancy, and the host systems are set up to notify me if any ZFS errors are found. Less important drives are all running either BTRFS or ext4 in single-disk configurations, usually holding OS installations. I am not so worried about these because I don’t store anything important on them (cattle, not pets). I always keep a spare of every size & format of disk that I use on hand so that I can drop in a replacement ASAP if any issues come up.
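
A hedged sketch of what those cron jobs could look like (paths and dates are arbitrary, and the wearout attribute name varies by vendor — e.g. Wear_Leveling_Count, Media_Wearout_Indicator, or Percent_Lifetime_Remain):

```bash
# /etc/cron.d/smart-selftest -- extended self-test on Jan 1 and Jul 1 at 03:00
0 3 1 1,7 * root /usr/sbin/smartctl -t long /dev/sdX

# Wearout check: column 4 of `smartctl -A` output is the normalized value,
# so warn once it drops below 25
smartctl -A /dev/sdY | awk '/Wear_Leveling_Count/ { if ($4 + 0 < 25) print "SSD below 25% life remaining" }'
```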


Something like `badblocks -wsv -b 4096 -o <logfile> <device>`

It actually writes to the drive (RIP data) and reads it back to confirm. The default is a non-destructive, read-only test.
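
For anyone copying that, the flags break down like this:

```bash
# -w          write-mode test: four destructive write+read passes
#             (patterns 0xaa, 0x55, 0xff, 0x00)
# -s          show progress
# -v          verbose output
# -b 4096     test in 4096-byte blocks (matches modern drive sector size)
# -o <file>   record the numbers of any bad blocks found in <file>
sudo badblocks -wsv -b 4096 -o <logfile> <device>
```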


For IDE, EIDE, and SCSI drives, a pass of DBAN will clean the drive and verify its condition.
SATA drives would often flag as damaged but were still OK.
SMART works on them.
Many distros also ship disk analytics tools that do quite well at reporting drive condition.
Old stuff, I know, but still valid.
