Intel SFP+ transceivers: seemingly high failure rate?

So over the past 3 months, I've had 3 SFP+ module failures. That seems high to me… after the second one failed, I bought 5 spares. At about $15 apiece after tax, they're not exactly cheap enough to toss one every month… Has anyone had similar experiences with used eBay SFP+ modules? Should I just stay away from used modules and buy new? Any recommendations for replacement modules, or are the models I have decent?

Is there an easy way to test a suspected failed module? I haven't recycled the failed modules yet, in the hope that they weren't actually bad.
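One cheap check, assuming the modules expose digital optical monitoring (DOM/DDM, per SFF-8472) and one end of the link sits in a Linux machine: `ethtool -m <interface>` dumps the module's EEPROM, including transmit and receive optical power when DOM is supported. The sketch below parses that text and flags powers outside assumed limits. The field names match common `ethtool -m` output but may vary between ethtool versions, and the limits (roughly the 10GBASE-SR average-power range) are assumptions; the module datasheet is the authority.

```python
import re

# Assumed limits in dBm, roughly the 10GBASE-SR average-power range.
# Check the module datasheet for the real values.
TX_RANGE_DBM = (-7.3, -1.0)
RX_RANGE_DBM = (-9.9, -1.0)

# Matches lines like "Laser output power : 0.5617 mW / -2.51 dBm".
# A dark receiver is typically printed as "0.0000 mW / -inf dBm".
_POWER_RE = re.compile(
    r"^[ \t]*(?P<name>[^:\n]+?)[ \t]*:[ \t]*[\d.]+[ \t]*mW"
    r"[ \t]*/[ \t]*(?P<dbm>-inf|-?[\d.]+)[ \t]*dBm",
    re.MULTILINE,
)


def check_dom(ethtool_text: str) -> list[str]:
    """Return human-readable problems found in `ethtool -m` output."""
    readings = {
        m.group("name"): float(m.group("dbm"))
        for m in _POWER_RE.finditer(ethtool_text)
    }
    tx = readings.get("Laser output power")
    rx = readings.get("Receiver signal average optical power")
    if tx is None or rx is None:
        return ["no DOM power readings found (module may not support DOM)"]

    problems = []
    if not TX_RANGE_DBM[0] <= tx <= TX_RANGE_DBM[1]:
        problems.append(f"TX power {tx} dBm outside {TX_RANGE_DBM}")
    if not RX_RANGE_DBM[0] <= rx <= RX_RANGE_DBM[1]:
        problems.append(f"RX power {rx} dBm outside {RX_RANGE_DBM}")
    return problems
```

Usage: paste the output of `ethtool -m <interface>` into `check_dom()`. Comparing a suspect module's TX power against a known-good one in the same port is a rough way to tell a weak laser from a dirty connector or cable problem.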

Hardware information:

Aruba S2500 PoE network switch
Intel AFBR-703SDZ-IN2 SFP+ transceiver modules
Intel X520-DA2 network card

Also, I'm not entirely opposed to DAC cables… It's just that, the way the rack is set up, they would be a real pain to run right now… plus I don't currently have any 🙁

Ouch. Is it always the switch side? Is it always the same port? If so, it could be dirty power from the switch killing the transceivers. fs.com sells new modules for about $20 apiece; they're generic, and they just code them for whatever brand of device you need to use them in.


Two failed on the switch side, and one failed in the X520-DA2 in my PC.

I should mention that I am running 8 modules total. Initially I bought just 8. After 2 failures, I bought the other 5 and ran a second link to my TrueNAS box. The modules came from 2 sellers but are the identical model.
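To put "seems high" into numbers, a rough annualized failure rate divides failures by total module-time in service. This sketch assumes, as a simplification, that all 8 modules ran for the full 3 months (about 91 days); some were actually added later, so the real exposure was smaller and the true rate somewhat higher.

```python
def annualized_failure_rate(failures: int, module_days: float) -> float:
    """Failures per module-year of operation (1.0 means 100%)."""
    return failures / (module_days / 365.0)


# Simplifying assumption: all 8 modules in service for ~91 days (3 months).
afr = annualized_failure_rate(failures=3, module_days=8 * 91)
print(f"AFR: {afr:.0%} per module-year")
```

Even with that generous exposure estimate, this works out to about 1.5 failures per module-year, which is why the rate feels high.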

I'll check out the site, thanks for that.

I'm sorry you're seeing such a high failure rate. If you do go looking at other brands and try to research failure-rate statistics for these parts, I would be extremely skeptical of any reported numbers.

While working at a colo data center, I saw a lot of companies whose troubleshooting steps, when confronted with networking errors of any sort, were:

  1. blame the cable
  2. blame the SFP
  3. repeat steps 1 and 2 in any order a few times
  4. do actual troubleshooting, including checking whether easily configurable software was configured properly.

I assume many manufacturers of switches and SFPs will not take reports of high failure rates seriously when they know IT departments have habits like these.


Are these 850 nm OM3 or 1310 nm OS2 SFPs?

Are they LR models?

Good advice. I'm not 100% sure the modules completely failed; I just don't know of a good way to test them without buying expensive equipment, and since this is only for my home, it's not worth it. The configuration wasn't changed. The link would just go down with the activity lights off, as if I had unplugged the fiber cable.

With the first issue, I tried the easiest things first and worked my way up: a warm boot, then a full cold boot (shut down and remove power). I then cleaned all of the optical connections with one of those click-style LC cleaning pens. After that, I did a port swap on both the card and the switch, one at a time, with no luck. I then swapped one module, then the other. After I swapped the switch-side module, the connection came up immediately. The last steps would have been a card swap, then a cable swap, which thankfully I didn't have to do…
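The swap-one-thing-at-a-time procedure above amounts to asking which single change makes the fault go away. A minimal sketch of that bookkeeping follows; the step labels are hypothetical shorthand for the steps described, listed cheapest first in the order tried, and the function simply reports the first change that restored the link.

```python
from typing import Dict, Optional


def isolate_fault(swap_results: Dict[str, bool]) -> Optional[str]:
    """Return the first change (in the order tried) that restored the link.

    swap_results maps each change, applied one at a time, to True if the
    link came up afterwards. Returns None if nothing fixed it.
    """
    for change, link_up in swap_results.items():  # dicts keep insertion order
        if link_up:
            return change
    return None


# The sequence from the post above (labels are hypothetical shorthand).
steps = {
    "warm boot": False,
    "cold boot": False,
    "clean LC connectors": False,
    "NIC port swap": False,
    "switch port swap": False,
    "first module swapped": False,
    "switch-side module": True,
}
print(isolate_fault(steps))
```

Swapping the suspect module back in and seeing the fault return would confirm the diagnosis rather than leave it at coincidence.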


OM3, not LR.

There's zero reason for them to burn out faster… I've got nothing. Maybe poor ventilation?


Maybe? The switch is at the top front of the rack. It's unmodified and has only optical modules installed. It's a side-venting unit, and the modules are warm but not hot. I can take some surface temperature measurements if that would help.

I'm trying to include as much detail as I can remember. One more thing: two of the modules that had issues were at either end of a 100 meter run to a PC that is powered on once or twice a week. The third was on one of the connections to the primary TrueNAS server, which is on 24/7.
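A rough loss budget checks whether the 100 meter OM3 run leaves enough optical margin. All figures here are assumed worst-case planning values: 3.5 dB/km multimode attenuation at 850 nm and 0.75 dB per mated connector pair (the TIA-568 cabling maximums), plus the 10GBASE-SR minimum launch and receive powers. The two patch-panel connections are hypothetical; count whatever is actually in the run.

```python
# All figures are assumptions for illustration; check the module datasheet
# and your actual cabling.
FIBER_DB_PER_KM = 3.5     # multimode at 850 nm, TIA-568 cabling maximum
CONNECTOR_PAIR_DB = 0.75  # per mated connector pair, TIA-568 maximum
TX_MIN_DBM = -7.3         # 10GBASE-SR minimum average launch power
RX_MIN_DBM = -9.9         # 10GBASE-SR minimum average receive power


def link_loss_db(length_m: float, connector_pairs: int) -> float:
    """Worst-case passive loss of the fiber plus intermediate connections."""
    return FIBER_DB_PER_KM * length_m / 1000 + CONNECTOR_PAIR_DB * connector_pairs


def margin_db(length_m: float, connector_pairs: int) -> float:
    """Power budget (weakest transmitter vs. least sensitive receiver) minus loss."""
    budget = TX_MIN_DBM - RX_MIN_DBM
    return budget - link_loss_db(length_m, connector_pairs)


# 100 m run with, hypothetically, two patch-panel connections along the way.
print(f"loss:   {link_loss_db(100, 2):.2f} dB")
print(f"margin: {margin_db(100, 2):.2f} dB")
```

Under these worst-case assumptions the run fits the budget, though the margin gets thin as patch panels are added; real fiber and modules usually beat the limits. A marginal budget would more plausibly show up as errors or a flapping link than as a module that stays dead after reseating.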

So far, the issues happened in this order:

  1. PC node, computer side
  2. TrueNAS node, switch side
  3. PC node, switch side

Not really enough for me to see a trend. Unless there is a repeat issue with the same port, I wouldn’t know what else to look for.
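To spot a repeat offender as more failures come in, a simple tally by location is enough. The entries below are the three failures listed above, recorded as (link, side); the thread doesn't give port numbers, but logging the actual switch and NIC ports the same way would make a same-port repeat stand out.

```python
from collections import Counter

# Failures in the order reported above: (link, side).
failures = [
    ("PC", "computer"),
    ("TrueNAS", "switch"),
    ("PC", "switch"),
]

by_side = Counter(side for _, side in failures)
by_link = Counter(link for link, _ in failures)
# Any exact location that has failed more than once.
repeats = {loc: n for loc, n in Counter(failures).items() if n > 1}

print(by_side)  # Counter({'switch': 2, 'computer': 1})
print(by_link)  # Counter({'PC': 2, 'TrueNAS': 1})
print(repeats)  # {}
```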

I should also mention that when cleaning the optical connections, I removed the module to clean it and then reseated it, which should cover the "unplug it and plug it back in" step. The issue still persisted, as if the module itself had given up… that is what led me to conclude it had failed.

I took some temperature measurements of the switch with a non-contact thermometer. Room ambient was approximately 21 °C.

Inlet highest temp: 21.7 °C
Outlet highest temp: 33.2 °C
Module highest temp: 35.9 °C

I measured on the module's label, since the metal shell is shiny and could throw off the reading.
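For a sense of thermal headroom, the readings above can be compared with the module's rated case temperature. The 0 to 70 °C range used here is the common commercial rating for SFP+ optics and is an assumption; check the AFBR-703SDZ datasheet. An IR reading on a label is also only a surface estimate, and the module's internal DOM temperature may read higher.

```python
# Readings from the post above, in °C (IR surface measurements).
AMBIENT_C = 21.0
INLET_C = 21.7
OUTLET_C = 33.2
MODULE_C = 35.9
MODULE_MAX_C = 70.0  # assumed commercial-grade case temperature limit

airflow_rise = OUTLET_C - INLET_C           # heating across the switch
module_over_ambient = MODULE_C - AMBIENT_C  # module rise above the room
headroom = MODULE_MAX_C - MODULE_C          # margin to the assumed rating

print(f"switch inlet-to-outlet rise: {airflow_rise:.1f} °C")
print(f"module above room ambient:   {module_over_ambient:.1f} °C")
print(f"headroom to assumed max:     {headroom:.1f} °C")
```

With roughly 34 °C of headroom against the assumed rating, heat alone looks like an unlikely culprit.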

[Screenshot, 2021-04-08: another member's module temperature readings]

Yours seems downright frosty next to mine, so I doubt that's it. Note that the copper one will burn you if you hold it too long.


Just wanted to update this: I've had one more failure since April. I tested all of the modules that failed, and all but one still work, so I cleaned them, marked them, and will likely use them for non-critical systems. If they fail again, they'll go to recycling. I guess I'm just exceptionally unlucky.