QUESTION - Are SATA cables and trayless caddies more likely to fail than HDDs?

Hey all,

Hope we’re doing well?

I posted a thread earlier in the week about a failed drive; I’ll link to that post at some point.

Anyway, it turned out to be a SATA cable or a trayless caddy. A bit annoying really; I went on auto-pilot and just assumed the drive was at the end of its life.

I have to admit, the SATA cables I’ve been using are old ones. I would switch them all out, but then I might just introduce a new problem. I guess I could do one at a time, but the only time I know something is wrong is during a replication in TrueNAS.
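
For what it’s worth, from what I understand a rising UDMA CRC error counter in SMART tends to point at the cable or connector rather than the drive itself, so something like the rough sketch below might catch a dodgy cable without waiting for a replication. It assumes smartmontools is installed, and the device names are placeholders (`smartctl --scan` lists the real ones):

```python
# Rough check: a rising UDMA_CRC_Error_Count (SMART attribute 199) usually
# means cable/connector trouble rather than a dying drive.
# Assumes smartmontools is installed; the device names are placeholders.
import subprocess

DEVICES = ["/dev/ada0", "/dev/ada1", "/dev/ada2"]  # adjust to `smartctl --scan`

def crc_error_count(device):
    """Return the raw UDMA_CRC_Error_Count for a device, or None if not reported."""
    out = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True, text=True, check=False,
    ).stdout
    for line in out.splitlines():
        if "UDMA_CRC_Error_Count" in line:
            return int(line.split()[-1])  # raw value is the last column
    return None

for dev in DEVICES:
    count = crc_error_count(dev)
    if count is None:
        print(f"{dev}: no CRC attribute reported")
    elif count > 0:
        print(f"{dev}: {count} CRC errors - suspect the cable/caddy before the drive")
    else:
        print(f"{dev}: clean")
```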

This is my second caddy- or SATA-related issue in the last year… it makes me wonder whether there’s a better way of connecting hard drives.

Anyway, any thoughts welcome!

The ones I’m using

No, for the physical hardware there’s no distinction between what you are using and a solution with a backplane. Having said that, devices that use a backplane usually also implement a method of keeping the drive mechanically locked in position, so the chances of intermittent contact are reduced. It’s that bouncing of the contacts in the connector that degrades them over time.


Thanks Dutch, OK then I’ll stick with how I’m doing it and chalk it up to the usual maintenance. :+1:

Adding more points of failure increases the likelihood of failure, and cheaper parts tend to be made to a lower specification.
So, yeah, SATA cables can fail and are worth checking first before assuming the drive is the problem, if only because it’s so much cheaper if the cable is the issue. Even before that, try unplugging it and plugging it back in again. :upside_down_face:
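
To put rough numbers on the “more points of failure” bit: if each cable/connector independently has some small chance of going bad in a year, the odds that at least one of them does climb quickly with the count. A back-of-the-envelope sketch - the 2% per-connection figure is purely an assumption for illustration, not a measured rate:

```python
# Back-of-the-envelope: chance that at least one of n independent
# connections fails within a year, given a per-connection probability p.
# The 2% default is an assumed figure for illustration only.
def any_failure(n, p=0.02):
    return 1 - (1 - p) ** n

for n in (1, 6, 12, 24):
    print(f"{n:2d} connections: {any_failure(n):.1%} chance of at least one failure per year")
# 12 connections at 2% each already works out to roughly a 21% chance per year.
```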


Thanks for that :+1: Yeah, ideally I’d have as few drives as possible, but the price per TB is a little wobbly at times :roll_eyes:

Hehe, the old unplugging/replugging trick. I wonder in which millennium that will stop being a thing! :laughing:

Hi,

I’d bet it’s just cables/connectors of meh / cheap quality.

(… coupled with a few specks of slightly acidic dust, nicely doused in 60%+ humidity and left to oxidize for a few months/years…).


Large (aka public) cloud providers work with inventories and statistics, and bad cables happen to them too…

They will leave dead disks plugged in for a while, since dispatching humans to take hosts offline costs human time and host downtime. Eventually, a host will be taken offline and a new disk swapped in.

The swapped-out disks could be cycled back into service in another host, where they’d undergo online testing and erasure… Eventually the disk would either die for real and head to the crusher, or keep serving happily in another machine.

All hardware is inventoried and tracked in a database; if another disk ends up unhappy in the same slot/position, that slot will probably end up being left empty.
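
Even at homelab scale a little of that bookkeeping goes a long way. A minimal sketch of the idea - the slot names, serial numbers and the two-strike threshold are all made up for illustration, not something TrueNAS gives you out of the box:

```python
# Minimal inventory idea: log which slot/caddy a failure was seen in, so a
# repeat-offender slot stands out even when the disks in it were different.
# Slot names, serials and the threshold of 2 are illustrative placeholders.
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass
class FailureEvent:
    when: date
    slot: str        # e.g. "caddy-3"
    disk_serial: str
    suspected: str   # "cable", "caddy" or "disk"

log = [
    FailureEvent(date(2023, 4, 2), "caddy-3", "SERIAL-AAAA", "cable"),
    FailureEvent(date(2024, 1, 15), "caddy-3", "SERIAL-BBBB", "caddy"),
]

incidents_per_slot = Counter(event.slot for event in log)
for slot, count in incidents_per_slot.items():
    if count >= 2:
        print(f"{slot}: {count} incidents across different disks - suspect the slot, not the drives")
```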

A similar thing happens with RAM: just because you have RAM errors on one board and CPU doesn’t mean you’ll have errors on a different one, and it doesn’t mean you can’t blacklist the affected address range and keep using the DIMM.

At scale it’s all about probabilities, and with your several hosts and a dozen-plus disks, you’re starting to turn the probabilities against yourself.


Thank you for that, risk :+1: Very interesting to read how the larger places do things. I should probably track my trays, disks, and cables more accurately.

Humidity is an issue though, one that I’m trying to resolve as best I can :roll_eyes:

I have bought some replacement SATA cables as spares, but it’s such a nightmare to test on a production machine, as errors only show up under saturation (replications, etc.). So if there’s something strange going on, I guess I need to change the SATA cable first, then do a replication and hope there’s a lot of data. Then if that goes well, I need to swap the tray in case that’s the problem, and finally the disk. Bit of a palaver!
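
If it shortens that loop at all: after each swap, a scrub plus a glance at the per-device error counters in `zpool status` should surface the same sort of CKSUM errors a replication would, without needing a big transfer. A rough sketch, with a placeholder pool name:

```python
# Rough helper: pull the per-device READ/WRITE/CKSUM counters out of
# `zpool status` after a cable or tray swap (and a scrub), instead of
# waiting for a full replication to surface errors.
# "tank" is a placeholder pool name - substitute your own.
import subprocess

POOL = "tank"

status = subprocess.run(
    ["zpool", "status", POOL],
    capture_output=True, text=True, check=False,
).stdout

for line in status.splitlines():
    fields = line.split()
    # Device rows end in three numeric columns: READ WRITE CKSUM.
    if len(fields) >= 5 and all(f.isdigit() for f in fields[-3:]):
        name = fields[0]
        read, write, cksum = (int(x) for x in fields[-3:])
        if read or write or cksum:
            print(f"{name}: READ={read} WRITE={write} CKSUM={cksum} - check that cable/caddy first")
```

Then if the counters stay at zero after the cable swap and a scrub, the tray and the disk move down the suspect list.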
