Hey there!
Today, i woke up to a degraded raidz1 pool, which was still scrubbing. So, i stopped the scrub, zpool replace'd the failed drive with the cold spare, and let it resilver. (It just finished resilvering as i'm typing this - now i'm scrubbing it again, to make sure everything is fine.)
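For the record, this morning's sequence looked roughly like the sketch below. The pool name (tank) and device names (sdX, sdY) are placeholders, not my actual layout:

```shell
#!/bin/sh
# Sketch of the recovery sequence; tank, sdX and sdY are placeholders.
if command -v zpool >/dev/null 2>&1; then
    zpool scrub -s tank          # stop the running scrub
    zpool replace tank sdX sdY   # swap the failed drive for the cold spare
    zpool status tank            # watch resilver progress
    zpool scrub tank             # re-scrub once resilvering is done
    STATUS="ran"
else
    STATUS="skipped (no zfs on this box)"
fi
echo "sequence: $STATUS"
```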
I think i should've burned in my HGST Ultrastars before using them. It's "only" a home server, but it runs the whole show with firewall/router and service VMs, and it'd be terribly unpleasant to lose any of my precious bits.
Back on topic, by burning 'em in prior to putting 'em into production, i would've satisfied the "Fail Fast" design pattern. In one of the many tech talks i've consumed, someone talked about failure modes of spinning storage - those either fail early, or die of old age, and there's not much happening in between. And indeed, mine failed at ~1100 Power_On_Hours with reallocated sectors, unable to complete either the short or the extended SMART self-test.
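Checking for exactly that failure mode is a one-liner affair with smartctl. A hedged sketch, with /dev/sdX standing in for the suspect drive (and note the tests have to run one at a time - starting a new one aborts the previous):

```shell
#!/bin/sh
# Sketch: run the self-tests my drive failed, and check the attribute
# that betrayed it. /dev/sdX is a placeholder for the suspect drive.
DEV=/dev/sdX
if command -v smartctl >/dev/null 2>&1; then
    smartctl -t short "$DEV"    # quick electrical/mechanical check, ~2 min
    smartctl -t long "$DEV"     # extended surface scan, takes hours
    smartctl -A "$DEV" | grep -i 'Reallocated\|Power_On_Hours'
    RAN="yes"
else
    RAN="no (smartctl not installed)"
fi
echo "smart checks: $RAN"
```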
Since ZFS also keeps per-device error counters, degrades the pool when a drive misbehaves, and reports it, one might as well use this additional layer of diagnostics.
With compression on, dumping /dev/zero into a file probably wouldn't do much, other than rapidly incrementing a few counters, without actually writing. Afaik, /dev/random or urandom doesn't have the throughput that might be desired. I think disabling deduplication, and duplicating a whole bunch of ISO images over and over, might be what i'm looking for.
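Another option i've seen for generating incompressible data faster than /dev/urandom: pipe /dev/zero through an AES-CTR keystream via openssl. A minimal write-then-verify sketch against a scratch file - a real burn-in would point TARGET at the raw device (destroying everything on it) and write the whole capacity, ideally several passes:

```shell
#!/bin/sh
# Sketch: write incompressible pseudorandom data, read it back, compare
# checksums. TARGET is a scratch file here; a real burn-in targets the
# raw device (e.g. /dev/sdX) and wipes it.
TARGET=$(mktemp)
MB=8    # a real pass covers the whole drive

openssl enc -aes-256-ctr -pass pass:burnin -nosalt </dev/zero 2>/dev/null \
    | head -c $((MB * 1024 * 1024)) > "$TARGET"

WRITE_SUM=$(sha256sum "$TARGET" | awk '{print $1}')
SIZE=$(wc -c < "$TARGET")
# Read back and compare; on a raw device you'd drop the page cache first
# (echo 3 > /proc/sys/vm/drop_caches) so the read actually hits the platters.
READ_SUM=$(sha256sum "$TARGET" | awk '{print $1}')
if [ "$WRITE_SUM" = "$READ_SUM" ]; then RESULT=OK; else RESULT=FAIL; fi
echo "wrote $SIZE bytes, verify: $RESULT"
rm -f "$TARGET"
```

On a file the re-read trivially matches, of course - the point of the sketch is the workflow: sequential full write, then a checksummed read-back pass.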
So, how do you burn-in your spinning storage?
…for how long?
…do you do a verification read test, with checksumming?