Supposedly 'Bad' HDD has no bad sectors?

Dynamic_Gravity · February 11, 2018, 11:56am

I have a Western Digital 1 TB Re HDD (enterprise class) that has popped up on the RAID controller as bad.

The drive was manufactured on December 9th, 2014; it is barely over three years old.

But I just finished a test with badblocks -nsv and it wasn’t able to find a single bad sector.

Checking for bad blocks in non-destructive read-write mode
From block 0 to 976762583
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

The test took over 50 hours to run. Is this drive actually bad?
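
For a rough sense of scale, the numbers in the output above can be turned into a capacity and an average rate. This is a back-of-the-envelope sketch, assuming badblocks used its default block size of 1024 bytes (the command shown passes no -b option) and treating 50 hours as a lower bound on the runtime; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the badblocks run above.
# Assumption: default badblocks block size of 1024 bytes (no -b flag was given).
BLOCK_SIZE_BYTES = 1024
LAST_BLOCK = 976762583          # "From block 0 to 976762583"
RUNTIME_HOURS = 50              # "over 50 hours", so treat this as a lower bound

total_blocks = LAST_BLOCK + 1   # blocks are numbered from 0
capacity_bytes = total_blocks * BLOCK_SIZE_BYTES
capacity_tb = capacity_bytes / 1e12

# Average rate at which the test advanced across the disk surface.
surface_mb_per_s = capacity_bytes / (RUNTIME_HOURS * 3600) / 1e6

print(f"capacity: {capacity_tb:.3f} TB")
print(f"average progress: {surface_mb_per_s:.2f} MB/s or less")
```

Under those assumptions the block count matches a 1 TB drive, and the test advanced at about 5.56 MB/s at most. The non-destructive mode reads each block, writes a test pattern, reads it back, and restores the original data, so the real I/O volume is a multiple of the capacity.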

wr250 · February 11, 2018, 1:08pm

What does the SMART data say?
Did someone “adjust” the TLER settings, or are they still at the factory values? If they are at factory, the drive may be spending too long retrying a bad sector; the controller sees that as a “timeout” and marks the drive bad.

https://en.wikipedia.org/wiki/Error_recovery_control
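
The failure mode described here comes down to comparing two timers. The sketch below is a simplified model and not how any particular controller firmware works; the function name and the example timings are made up for illustration.

```python
def controller_verdict(drive_recovery_s: float, controller_timeout_s: float) -> str:
    """Simplified model: the controller drops a drive that stays silent
    longer than the controller's own command timeout."""
    if drive_recovery_s > controller_timeout_s:
        return "dropped"   # drive was still retrying when the controller gave up
    return "kept"          # drive reported success or an error in time


# Hypothetical timings, in seconds, chosen only to show both outcomes.
print(controller_verdict(drive_recovery_s=7.0, controller_timeout_s=8.0))   # kept
print(controller_verdict(drive_recovery_s=30.0, controller_timeout_s=8.0))  # dropped
```

In this model a drive can be perfectly readable and still get dropped, which would be consistent with a clean badblocks result.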

Dynamic_Gravity · February 11, 2018, 1:12pm

Here’s a screenshot of the smart data.

@wr250 No one has adjusted the TLER settings on the server it was in. We don’t modify stuff on the OS drive; it has had the same settings for 10 years. It runs a custom proprietary Linux OS for video surveillance (I hate it too, but that’s the way it is).

wr250 · February 11, 2018, 1:35pm

TLER is actually a parameter in the drive’s firmware, which you can set on WD Black drives. It is not set on the server itself; the OS (and all the other hardware, for that matter) is irrelevant to the TLER setting.
Check the SMART logs to see if there are any clues there.
Then try running both the short and long SMART tests if you haven’t already.
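
On a machine with smartmontools installed, those checks map onto a handful of smartctl invocations. The sketch below only builds and prints the command lines; /dev/sdX is a placeholder device path and the helper name is hypothetical. Actually running the commands needs root, and a drive behind an external adapter may need an extra -d option.

```python
import shlex

def smart_check_commands(device: str) -> list[list[str]]:
    """Return smartctl command lines for the checks suggested above."""
    return [
        ["smartctl", "-l", "error", device],      # SMART error log
        ["smartctl", "-l", "selftest", device],   # results of earlier self-tests
        ["smartctl", "-l", "scterc", device],     # current TLER/ERC read and write limits
        ["smartctl", "-t", "short", device],      # start the short self-test
        ["smartctl", "-t", "long", device],       # start the long (extended) self-test
    ]

# /dev/sdX is a placeholder; substitute the real device node.
for cmd in smart_check_commands("/dev/sdX"):
    print(shlex.join(cmd))
```

The self-tests run inside the drive one at a time, so let the short test finish before starting the long one; the results then show up in the selftest log.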

Dynamic_Gravity · February 11, 2018, 1:37pm

I shall run those tests.

So far, what I think happened is that since this is a very old RAID card, its timeout is less than 7 seconds (the supposed TLER of the Re drives).

wr250 · February 11, 2018, 1:38pm

Can you adjust the RAID controller’s timeout to be 8 seconds?

Dynamic_Gravity · February 11, 2018, 1:38pm

I am not sure, I have to take it down to check. I’ll do that now, be right back.

It’s an LSI 3ware 9750.

Dynamic_Gravity · February 11, 2018, 1:53pm

There doesn’t appear to be an ability to adjust the timeout.

The closest thing I could find was a setting called “FlexRAID powerfail”. I can also set a cacheflush time, but I don’t think that’s the correct setting either.

wr250 · February 11, 2018, 2:07pm

No, that’s not it.
While the SMART tests are running, look at this page:

The BSD commands will be the same on Linux.
I would check the SMART logs after the tests run before changing anything else.
I have to run out for a while, so I can’t respond until I’m back.

Dynamic_Gravity · February 11, 2018, 2:09pm

Ouch, I don’t think this system has smartctl.

God I hope so, I haven’t checked. But the system is booting up now. I’ll be able to confirm soon.

Can confirm, it does not have that utility installed.

That’s fine, thanks for all your help. It’s been highly useful.

wr250 · February 11, 2018, 4:59pm

Did you check as root? Silly question, I know, but as a normal user you usually can’t access smartctl. The system must have some way of checking the SMART data, or you wouldn’t have the above picture.
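
Both questions (is the tool there, and is this a root session) can be answered in one go. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and it assumes a Unix-like system where root has effective user ID 0.

```python
import os
import shutil

def smartctl_status() -> dict:
    """Report whether smartctl is on PATH and whether we are running as root."""
    path = shutil.which("smartctl")          # None if the binary is not on PATH
    # os.geteuid exists only on Unix-like systems; treat its absence as "not root".
    is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    return {"smartctl_path": path, "is_root": is_root}

print(smartctl_status())
```

A missing path for a normal user does not prove the tool is absent, since smartctl often lives in a directory that is only on root's PATH.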

Dynamic_Gravity · February 11, 2018, 8:22pm

That drive was pulled from the system in question. I ran diagnostics on my laptop with an external bay dongle.

Yes, the only account on the system is root.