[PROCEDURE] A drive failed on my NAS (TrueNAS); glad I'm familiar with it, so no panic or data loss!

Hey all!

I thought I’d write about this, in case it’s of interest.

At the moment I have 4 “servers” that are near enough all working happily:

No. 1 - is always on and used for lots of things (and has Backblaze for selected file types).
No. 2 - is occasionally on as a snapshot of No. 1, plus some data at rest.
No. 3 - is also occasionally on to snapshot No. 2, as well as holding its own data at rest.
No. 4 - is another snapshot, but only of No. 1. (It runs TrueNAS SCALE, which is working quite well actually.)

So No. 3 emailed me to say:

"Device /dev/gptid/715f5f39-cedb-11ec-bf59-b4969129bd24 is causing slow I/O on pool Pool3.
2022-10-27 13:35:11 "

Kurumba! Luckily, I have a cold spare. Unluckily, the motherboard only has 6 SATA ports and my pool consists of 6 x 6TB drives in RAIDZ2, so there was nowhere to plug it in. Luckily, I also have a spare LSI HBA card.

So I put the HBA in, connected the spare, booted back up and went to replace the drive. Initially I only had the drive reference as a gptid, as in the email above.

I used glabel status to map that to a usable device name (ada0, etc.) and from there to an HDD serial number. It sort of worked… glad I restarted though, as by then the faulty drive had failed completely and gone unavailable, and it turned out that the drive I was going to pull was a working one. Not quite sure how it went wrong there; perhaps the dev reference changes with each reboot?
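
For anyone who'd rather script that lookup than eyeball it, here's a rough sketch in Python. It assumes the usual three-column layout of glabel status (Name, Status, Components). The function names are my own invention, and the sample text is made up for illustration: the gptid is the one from the alert email, but the ada0p2 pairing is hypothetical and not my real output.

```python
import re
import subprocess


def parse_glabel_status(text):
    """Map each label (e.g. gptid/...) to the partition it sits on (e.g. ada0p2)."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the "Name  Status  Components" header row.
        if len(parts) != 3 or parts[0] == "Name":
            continue
        label, _status, component = parts
        mapping[label] = component
    return mapping


def disk_of(partition):
    """Strip a trailing GPT partition suffix: ada0p2 -> ada0."""
    return re.sub(r"p\d+$", "", partition)


def live_glabel_status():
    """Run glabel on the NAS itself (only works where glabel exists)."""
    return subprocess.run(
        ["glabel", "status"], capture_output=True, text=True, check=True
    ).stdout


# Made-up sample: the ada0p2 pairing is illustrative only.
SAMPLE = """\
                                      Name  Status  Components
gptid/715f5f39-cedb-11ec-bf59-b4969129bd24     N/A  ada0p2
"""

labels = parse_glabel_status(SAMPLE)
print(disk_of(labels["gptid/715f5f39-cedb-11ec-bf59-b4969129bd24"]))
```

On the real box you'd feed it live_glabel_status() instead of SAMPLE, then check the serial number with something like smartctl -i /dev/ada0. As far as I understand it, the adaN/daN names aren't guaranteed to stay the same between boots (especially after adding a controller), so it's worth redoing the lookup right before pulling anything.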

Aaaaaannyway, with it all booted up, I went to Storage, then Pools, then clicked the cog (top right) and Status.

Interesting that it’s not the same reference, hey ho! I clicked the 3 dots to the right and chose ‘Replace’ (after I’d wiped the replacement drive), and away we go with replacing the faulty drive.

The resilver estimate has now calmed down (initially it can show weeks, or even months) to 8 hours, which isn’t so bad for 6 x 6TB drives… some of which, I regret, are re-used SMR drives.
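
I don't know exactly how ZFS works out that estimate, but a naive "average speed so far" extrapolation swings in the same way, so here's a minimal sketch of the arithmetic. The function name and all the numbers are hypothetical, not measurements from my pool.

```python
def remaining_hours(total_tb, done_tb, elapsed_hours):
    """Naive estimate: assume the rest proceeds at the average speed so far."""
    if done_tb <= 0 or elapsed_hours <= 0:
        raise ValueError("no progress yet, so there is no rate to extrapolate from")
    rate = done_tb / elapsed_hours  # TB per hour so far
    return (total_tb - done_tb) / rate


# Hypothetical numbers only.
early = remaining_hours(5.0, 0.001, 0.5)  # barely started: a huge estimate
later = remaining_hours(5.0, 1.0, 2.0)    # once it gets going: 8.0 hours
print(early, later)
```

With next to nothing copied in the first half hour, the extrapolation comes out at months; once a decent chunk is done, it settles to something believable.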

Just thought I’d share. Once it’s done, I’ll disconnect the HBA and remove it… though I might leave it in there. I might forget where it’s gone otherwise! :laughing: Next, I guess I should get myself a new spare :roll_eyes:

Oh, and I turned off all replications, SMART and snapshots, just to reduce the burden on the pool temporarily while it resilvers.