I really should replace this drive in my FreeNAS box

A couple of you here have already asked me to replace this drive, but I'm letting it ride "until death do us part" since the pool reports always come back healthy… and now another Multi-Zone error has been logged.

WD Red HDD.

########## SMART status report for ada0 drive (Western Digital Red: WD-WCC7K7KKNRD6) ##########
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   204   167   021    Pre-fail  Always       -       4766
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       25
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4208
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       14
194 Temperature_Celsius     0x0022   116   103   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Short offline       Completed without error       00%      4180         -

How old is this drive, and what have you put it through?

6 months of operation, 30 or so spin-ups. Still not a great deal of work for an HDD. Are you going to create a cron job, log all the SMART parameters to a file, and plot its demise after the fact?
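The cron-job idea is easy enough to sketch. A minimal version, assuming the device is `ada0` and a log path of my choosing (both are assumptions, adjust for your box), appends one timestamped CSV row per run with the raw Multi_Zone_Error_Rate value:

```shell
#!/bin/sh
# Sketch of the SMART-logging cron job described above.
# DEV and LOG are assumptions -- substitute your own device and path.
DEV=/dev/ada0
LOG=/var/log/smart_ada0.csv

# Pull the raw value (10th column) for the Multi_Zone_Error_Rate
# attribute out of the `smartctl -A` attribute table.
RAW=$(smartctl -A "$DEV" | awk '$2 == "Multi_Zone_Error_Rate" { print $10 }')

# Append a timestamped row, e.g. "2018-05-01T12:00:00Z,1".
echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ'),$RAW" >> "$LOG"
```

Run it hourly from root's crontab (e.g. `0 * * * * /root/smart_log.sh`) and you have a time series to plot.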

2 Likes

Buy a refurb drive; better safe than sorry. Unless it's the second half of a RAID 1 array, in which case you could test it out a little longer to see what's actually wrong: either it's a bad drive or a bad board.

Its whole lifetime has been in the FreeNAS box.

@SgtAwesomesauce craziest thing ever - looked at the serial on the drive closest to the Plexi door… it's the one with the Multi-Zone error rate.

A couple of days ago it recorded 11… The new HDD just arrived from Amazon. Now the question is whether I can do the swap without really having a backup anywhere else… eek!

Should I first set up a replication stack, replicate all existing data, THEN attempt resilvering onto the replacement drive? It's only RAIDZ2.
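For the belt-and-braces route, a replication pass before the swap could look roughly like this. The pool name `big` comes from the status report later in the thread; the target host and dataset are hypothetical:

```shell
# Take a recursive snapshot of the whole pool, then stream it to a
# second machine. "backuphost" and "backup/big" are assumptions --
# substitute a real host and a dataset with enough free space.
zfs snapshot -r big@pre-resilver
zfs send -R big@pre-resilver | ssh backuphost zfs receive -F backup/big
```

With RAIDZ2 and only one suspect drive this is optional, but it means a second resilver failure mid-replacement wouldn't cost you the data.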

Okay, it's Z2, so you have two-drive failure tolerance. That's good.

How big are the drives and are you able to accept downtime?

1 Like

4TB each. ~6-12 hrs is tolerable?

I’d just do the resilver.
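The raw CLI shape of that swap is something like the following; note the device names are assumptions, and on FreeNAS the usual route is the GUI's "Replace" button, which also handles partitioning and gptid labels for you:

```shell
# Offline the suspect disk so the pool stops using it.
zpool offline big ada0
# ...power down or hot-swap the physical drive here...
# Tell ZFS to rebuild onto the new disk in the same slot.
zpool replace big ada0
# Watch the resilver until it reports complete.
zpool status big
```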

1 Like

This might be an issue - I'm over the recommended 80% usage limit


########## ZPool status report summary for all pools ##########

+--------------+--------+------+------+------+----+--------+------+-----+
|Pool Name     |Status  |Read  |Write |Cksum |Used|Scrub   |Scrub |Last |
|              |        |Errors|Errors|Errors|    |Repaired|Errors|Scrub|
|              |        |      |      |      |    |Bytes   |      |Age  |
+--------------+--------+------+------+------+----+--------+------+-----+
|freenas-boot  |ONLINE  |     0|     0|     0|  2%|       0|     0|   20|
|big           |ONLINE  |     0|     0|     0| 87%|       0|     0|   16|
+--------------+--------+------+------+------+----+--------+------+-----+

???

It'll still resilver a new drive fine, but it's a good time to consider replacing all the drives with larger ones, or replicating to a different zpool of higher capacity.
Personally I'd keep the array and just cycle in larger drives (remember to set auto-expand if it's not already turned on in FreeNAS).

1 Like

What do you mean ‘cycle’ larger drives… WAIT is that a feature???

  • Take drive 0 out
  • replace with 10TB // Resilver // wait
  • replace drive 1 with 10TB // resilver // wait
  • iterate till drive N is replaced.

Right?

1 Like

@SgtAwesomesauce LOL I’m pretty sure you told me this and I missed it!!!

Where is the auto-resize option in V11 dude? ta

Yup, that's how cycling works: replace and resilver one drive at a time.
It's slow (even with 2-4TB drives) but works a charm. Above that size I'm not sure the time is worth it versus replicating, but for now that's the easy, safe way.
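As a sketch, the whole cycle is just the single-drive replace repeated, waiting out each resilver before touching the next disk. Disk names and the pool name are assumptions, and on FreeNAS you would normally do each swap through the GUI rather than raw `zpool`:

```shell
#!/bin/sh
# Sketch of cycling larger drives through a pool, one at a time.
# POOL and the disk list are assumptions for illustration only.
POOL=big
for DISK in ada0 ada1 ada2 ada3; do
    zpool offline "$POOL" "$DISK"
    echo "Swap $DISK for the larger drive, then press Enter."
    read dummy
    zpool replace "$POOL" "$DISK"
    # Block until this disk's resilver completes before moving on.
    while zpool status "$POOL" | grep -q 'resilver in progress'; do
        sleep 60
    done
done
```

Once the last disk is resilvered (and autoexpand is on), the pool grows to the new capacity.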

Basically this.

1 Like

With raidz2 you could technically replace 2 at a time, but you better have good backups :slight_smile:

technically

but with one failure already, I’m going to go ahead and say don’t risk it.

1 Like

Thanks, and yeah, wouldn't go near it myself…
I can't find autoexpand=on in the FreeNAS GUI, so I went to the shell, set the property, then exported and re-imported the pool = magic! But I'm only on version 9.
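For anyone following along, the shell route described above boils down to this (pool name taken from the status report earlier in the thread):

```shell
# Enable automatic expansion, then export and re-import the pool
# so the new size is picked up.
zpool set autoexpand=on big
zpool export big
zpool import big
# Verify the property stuck.
zpool get autoexpand big
```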

2 Likes

Brilliant thanks bro