Bad sector count only increases when the drive is idle, not under load - What's it doing?

Just a technical question out of curiosity. One of six five-year-old 10TB IronWolf drives appears to have had a minor head crash; the Synology NAS sent an email alert when the bad sector count increased, and I’ve been monitoring it daily ever since. Ironically enough, it still passes the SMART and IronWolf health checks, but I know it’s toast given how rapidly the bad sector counter is ticking up.

The strange thing is that the bad sector count only increases when the drive is 100% idle. When fully idle, it starts making the sound it would normally only make under maximum load, and it doesn’t stop, yet not even the HDD activity light is blinking. But if the drive is put under an actual load and kept busy with work, it goes quiet like normal and the bad sector counter stops ticking. What is up with this behavior?

I’ve been able to replicate the behavior on demand, so it’s not a one-off or random chance. As long as I have a spare HDD plugged into a USB port on the NAS and am backing the NAS up to it, the drive stays quiet and the bad sector counter stays frozen. A copy can run for over 24 hours without ticking the bad sector count up even once. Yet the moment a copy ends, the drive gets loud again, and after a few hours idle there are a few hundred more bad sectors in the SMART info. What is the drive firmware doing that causes this behavior? Just from idling, it has ticked the bad sector count from the low hundreds to 1240 last I checked.
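For reference, the counters described here correspond to the raw values of SMART attributes 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector). A minimal sketch of the kind of daily check being described, assuming `smartctl` is available; the attribute table below is made-up sample text (illustrative values only) so the snippet runs standalone:

```python
# Sketch of parsing `smartctl -A` output for the two counters of interest.
# SAMPLE is made-up illustrative text; on a real system you'd capture it with
# subprocess.run(["smartctl", "-A", "/dev/sdX"], capture_output=True, text=True).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       1240
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       312
"""

def parse_raw_values(table):
    """Map SMART attribute ID -> raw value from a smartctl attribute table."""
    values = {}
    for line in table.splitlines():
        parts = line.split()
        # Data rows start with a numeric attribute ID and have 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            values[int(parts[0])] = int(parts[9])
    return values

attrs = parse_raw_values(SAMPLE)
print("reallocated:", attrs[5], "pending:", attrs[197])
```

Logging these two raw values on a schedule (e.g. from cron) gives a timestamped record of exactly when the count ticks up, which is how you'd correlate growth with idle periods.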

I don’t know, but hypothetically, might you suspect physical damage near the head parking ramp?

Just throwing an idea out there.

I haven’t researched or encountered this myself.

A head crash in a stationary device? What did you do to the poor disk? :slight_smile:

A rising bad sector count is fairly normal here, and so is extra work when nothing appears to need doing: that’s typical of a drive trying to perform sector relocations.

I’ve had more than one disk that hit an avalanche of bad sectors and tried to recover/relocate them so aggressively that it worked itself to death, ending up completely unresponsive.

It’s especially noticeable once the count has long since exceeded the spare sector pool and the disk starts going crazy reading slow sectors that are actually almost dead.

What does the drive do while it’s clicking away? It reads and writes sectors: it retries the slow ones, tries to read the bad ones, marks them, and, where it can, moves the data to other available sectors. It does this in its free time, i.e. as background activity.

This is usually a snowball effect; nothing will stop it, and the situation will only get worse until the drive is, from any reasonable point of view, no longer usable.
The drive needs time to do this job without directly affecting normal work, so it does it when it’s free, and the bad sector count keeps rising as the drive checks more sectors, becomes increasingly aware of the situation, and marks them as bad or pending relocation.
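The cycle described above (pending sectors retried in idle time, remapped to spares, the scan itself uncovering more damage) can be sketched as a toy model. Every number and rate here is invented purely for illustration; real firmware logic is opaque and vendor-specific:

```python
# Toy model of the background remap "death spiral" described above.
# All quantities are made up for illustration, not actual firmware behavior.

pending = 300        # sectors flagged "pending reallocation" after the event
reallocated = 0      # what SMART attribute 5 (Reallocated_Sector_Ct) would show
spares = 4000        # hypothetical spare-sector pool size
idle_hours = 0

while pending and spares:
    idle_hours += 1
    # Each idle hour, the firmware re-reads pending sectors and remaps
    # the ones it can, at an assumed rate, limited by remaining spares.
    remapped = min(pending, 50, spares)
    pending -= remapped
    reallocated += remapped
    spares -= remapped
    # The background scan itself keeps discovering new bad sectors,
    # which is what makes it a snowball: the job never finishes.
    pending += 10

print(f"after {idle_hours} idle hours: reallocated={reallocated}, "
      f"still pending={pending}, spares left={spares}")
```

The point of the toy: the counter only moves during idle passes, and the loop only terminates when the spare pool is exhausted, with sectors still pending, which matches the "works until it kills itself" pattern described above.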


Nothing, I swear! It’s just spinning inside the Synology. :stuck_out_tongue: A head crash was the only thing that made sense with my limited info. I suppose it’s also possible a platter is beginning to delaminate or lose its magnetization.

The drive has gone from zero bad sectors to 1256 inside two weeks, and it’s still busy doing whatever it’s doing when idle. I didn’t have any issues making a fresh full backup of the NAS, but once it’s idle, the noise of the nonstop maxed-out writes makes using the computer pretty irritating! There’s no clicking; it’s just constant, nonstop, maxed-out writes internal to the drive itself. It could’ve written a full drive’s worth of data multiple times over by now. :grimacing:

You can consider this disk lost… keep it for testing, because it makes little sense to keep using it.


Copy whatever you have to copy while it still works somehow, and later you can do various things to it, even with…

It’s simple: while idle from user requests, the drive is either performing housekeeping operations (as SMR drives do) or running some sort of self-test.

During this, either:

  • bad sectors are discovered, or
  • read/write errors are encountered and sectors are remapped, thus increasing the error count

Any slightly modern drive has control electronics more complex than your run-of-the-mill 30-year-old workstation.

I don’t think housekeeping duties are supposed to last several days straight, especially since none of the other five drives in the array are doing this. I left it idle for well over 50 cumulative hours, but all it does is get super toasty and continue with the incessant self-writes.

Finally just pulled it due to the noise; I don’t want a ticking timer in my NAS array in case any of the other IronWolfs also start acting funny before it can rebuild the pool. Frankly, it seems to have been doing exactly what TimHolus said above and was just killing itself in a death spiral. That HD Sentinel program looks fun; it reminds me of SpinRite. I’ll have to play around with the demo version.

If those housekeeping ops consistently fail, then this could easily happen.

However, this is impossible to verify, since we don’t know what the controller is actually supposed to do; we can only observe the results of those operations.

Check whether your model is an SMR drive, though.

Aye, it’s a shame there isn’t more transparency about what’s going on with the controller. For all the vaunted IronWolf Health Management features, I would’ve hoped for/expected more. They were six Seagate IronWolf Pro 10TB CMR drives, st10000ne0004.

Must’ve been something off with that drive from the factory; ever since I replaced it with an Exos 20TB, the entire NAS has been eerily quiet, even during the rebuild. I’ll have to wait for the next data scrubbing to really know, though.

I think you are literally hearing the helium-fill difference. High-capacity Exos drives are, I believe, all helium-filled.

I observed the same sound difference between WD Red and Toshiba MG drives.

Maybe for that single drive, but there are still five identical IronWolf 10TBs left in that NAS. It’s been eerie how quiet it is. That single drive had been responsible for most of the HDD noise in the entire NAS, and I never realized it.

Final proof will come with the data scrub scheduled for next month, as those have always meant constant, loud writes for 24-36 hours at a time.
