Supposedly 'Bad' HDD has no bad sectors?

I have a Western Digital 1 TB Re HDD (enterprise class) that has popped up on the RAID controller as bad.

The drive was manufactured on December 9th, 2014; it is barely over three years old.

But I just finished a test with badblocks -nsv and it wasn’t able to find a single bad sector.

Checking for bad blocks in non-destructive read-write mode
From block 0 to 976762583
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done
Pass completed, 0 bad blocks found. (0/0/0 errors)

The test took over 50 hours to run. Is this drive actually bad?
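
As a rough sanity check on that runtime, the block range in the output can be turned into an average rate. This is a minimal sketch that assumes badblocks ran with its default 1024-byte block size (no `-b` option appears in the command above); the function name `surface_rate` is made up for illustration.

```shell
# Average bytes of disk surface covered per second.
# Args: last block number, block size in bytes, elapsed hours.
surface_rate() {
    last_block=$1
    block_size=$2
    hours=$3
    total_bytes=$(( (last_block + 1) * block_size ))   # blocks are numbered from 0
    echo $(( total_bytes / (hours * 3600) ))
}

# Values from the run above; 1024 is the assumed default block size.
surface_rate 976762583 1024 50   # prints 5556693 (about 5.6 MB/s)
```

Non-destructive mode reads each block, writes a test pattern, reads it back, and then restores the original data, so the actual I/O rate is several times this figure. A slow scan on its own therefore says little about drive health, especially if the drive is attached through a slow interface.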

What does the SMART data say?
Did someone “adjust” the TLER settings, or are they still at the factory defaults? If they are at the factory defaults, the drive may be taking too long on error recovery (retrying reads or rewrites), and the controller sees that as a “timeout” and marks the drive bad.

https://en.wikipedia.org/wiki/Error_recovery_control
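
If smartmontools is available on a machine the drive is attached to, the error recovery control setting can be read, and on drives that support it changed, with `smartctl -l scterc`. The sketch below only prints the commands instead of running them; `/dev/sdX` is a placeholder device path and `erc_arg` is a hypothetical helper, not part of smartctl.

```shell
# Placeholder device path (assumption); substitute the real one.
DEV=${DEV:-/dev/sdX}

# Build the smartctl argument that sets the read and write ERC timeouts.
# Args: read timeout in seconds, write timeout in seconds.
# smartctl expects the values in units of 100 ms.
erc_arg() {
    echo "scterc,$(( $1 * 10 )),$(( $2 * 10 ))"
}

# Print the commands rather than running them, so nothing changes by accident.
echo "smartctl -l scterc $DEV"           # query the current setting
echo "smartctl -l $(erc_arg 7 7) $DEV"   # set 7.0 s read / 7.0 s write
```

Because the values are in tenths of a second, 70 means 7.0 seconds. On many drives a changed value does not persist across a power cycle, so treat it as temporary unless the drive's documentation says otherwise.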

Here’s a screenshot of the SMART data.

@wr250 No one has adjusted the TLER settings on the server it was in. We don’t modify anything on the OS drive; it has had the same settings for 10 years. It runs a custom proprietary Linux OS for video surveillance (I hate it too, but that’s the way it is).

TLER is actually a parameter in the drive’s own firmware, which you can set on WD Black drives. It is not set on the server itself; the OS (and all other hardware, for that matter) is irrelevant to the TLER setting.
Check the SMART logs to see if there are any clues there.
Then try running both the short and long SMART tests if you haven’t already.
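
For reference, the log checks and self-tests above map onto a handful of smartctl invocations. This is a dry-run sketch that prints the commands in a sensible order; `smart_plan` is a hypothetical helper name and `/dev/sdX` is a placeholder device path.

```shell
# Print the smartctl commands for reviewing logs and running self-tests.
# Arg: device path.
smart_plan() {
    dev=$1
    printf 'smartctl -a %s\n' "$dev"            # full report: attributes and logs
    printf 'smartctl -l error %s\n' "$dev"      # drive error log only
    printf 'smartctl -t short %s\n' "$dev"      # start the short self-test
    printf 'smartctl -t long %s\n' "$dev"       # start the long self-test
    printf 'smartctl -l selftest %s\n' "$dev"   # read the self-test results
}

smart_plan /dev/sdX
```

The self-tests run inside the drive in the background, so let the short test finish before starting the long one, and read the self-test log only after each has completed. If the drive sits behind a USB adapter, smartctl may also need `-d sat` to reach it.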

I shall run those tests.

So far, what I think happened is that, since this is a very old RAID card, its timeout is less than 7 seconds (the supposed TLER of the Re drives), so the card gives up on the drive before error recovery finishes.

Can you adjust the RAID controller’s timeout to be 8 seconds?

I am not sure, I have to take it down to check. I’ll do that now, be right back.

It’s an LSI 3ware 9750.

There doesn’t appear to be a way to adjust the timeout.

The closest thing I could find was a setting called “FlexRAID powerfail”. I can also set a cacheflush time, but I don’t think that’s the correct setting either.

No, that’s not it.
While the SMART tests are running, look at this page:

The BSD commands will be the same on Linux.
I would check the SMART logs after the tests run before changing anything else.
I have to run out for a while, so I can’t respond until then.

Ouch, I don’t think this system has smartctl.

God I hope so, I haven’t checked. But the system is booting up now. I’ll be able to confirm soon.

Can confirm, it does not have that utility installed.

That’s fine, thanks for all your help. It’s been highly useful.

Did you check as root? Silly question, I know, but as a normal user you usually can’t access smartctl. It must have some way of checking the SMART data, or you wouldn’t have the above picture.
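
A quick way to settle both questions on the box itself is sketched below. It assumes a POSIX-style shell is available on the appliance, which may not hold for a locked-down proprietary OS; `have_tool` is a hypothetical helper name.

```shell
# Succeeds if the named command can be found in PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool smartctl; then
    echo "smartctl found at: $(command -v smartctl)"
else
    echo "smartctl not found in PATH"
fi

# id -u prints 0 when the current user is root.
if [ "$(id -u)" -eq 0 ]; then
    echo "running as root"
else
    echo "not running as root"
fi
```

On some systems the sbin directories are not in a normal user's PATH, so a "not found" result as a non-root user does not prove the tool is missing.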

That drive was pulled from the system in question. I ran diagnostics on my laptop with an external bay dongle.

Yes, the only account on the system is root.