I have an issue with my Mellanox ConnectX-3 cards where none of the transceivers I plug in output any light.
The card is detected in Windows 11 and Ubuntu 22.10. Both ports are visible on both operating systems, but I can never get the transceivers to work, and the link lights never turn on.
I’ve updated the firmware and drivers to the latest versions. I’ve tested different firmwares. I’ve reinstalled the firmwares and drivers multiple times.
I’ve even tried two different ConnectX-3 cards, of the same model.
I’ve verified all the transceivers work and the fiber is good.
I can’t get any light from the transceivers. My light meter doesn’t detect anything.
OS: Windows 11 Pro for Workstation 23493, Ubuntu 22.10
Tested multiple PCIe slots as well.
I have no idea where to troubleshoot at this point. The cards seem to be detected fine, and even updated fine, so I don’t think they’re broken. The firmware release notes also specify it’s compatible with my 40Gig transceivers.
My mistake! Typoed that firmware version. Fixed it.
Tried messing with the setup a lot more and still getting the same results.
The cards appear to turn on and are working, but any module I plug into them doesn’t ever turn on.
Cisco locks its transceiver firmware and switch ports. Don’t use Cisco unless you are plugging into Cisco brand hardware.
Mellanox works with pretty much every brand of transceiver that doesn’t pull shenanigans with their hardware, which means no Cisco and no Juniper. You can use Mellanox brand, generic brand coded, 10GTek off Amazon, FS brand, Chelsio, Intel, and whatever other dozens of brands out
I just got my hands on Mellanox MFM1T02A-SR modules.
I plugged those in using known-good QSFP+ to SFP+ adapters and they’re doing the same thing.
It’s like the card is on, but the interfaces are turned off.
Do you have a diagram of the connection setup?
Regarding the QSFP-40G-SR-BD: That is a bidirectional transceiver, correct? You need to make sure that the wavelengths of RX and TX are switched from one transceiver to the other.
root@truenas[~]# mlxlink -d /dev/mst/mt4099_pci_cr0
-E- Device is not supported
EDIT: An Nvidia employee informed me that mlxlink only works on ConnectX-4+
Correct, and a good catch. I’ve got the cables set up correctly, but the trouble I’m having is that the modules appear to receive no power at all. My light meter shows no light coming out of either fiber, and a phone camera doesn’t pick up any infrared, whereas it does when these are plugged into anything else, like a switch.
169.254.x.x IPs are link-local addresses, the fallback an interface assigns itself when it fails to get an IP.
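As a quick check, here is a minimal Python sketch using the standard library's `ipaddress` module to tell whether an address is one of these link-local fallbacks. The sample addresses and the helper name `is_link_local_fallback` are made up for illustration; they are not taken from the screenshots.

```python
import ipaddress

def is_link_local_fallback(addr: str) -> bool:
    """Return True if addr is an IPv4 link-local address (169.254.0.0/16)."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_link_local

# Hypothetical sample addresses, for illustration only.
for sample in ("169.254.17.42", "192.168.1.10"):
    print(sample, is_link_local_fallback(sample))
```

Seeing one of these addresses only says that no address was handed out; it does not by itself say whether the physical link is up.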
@Rojo, I just noticed in your screenshots that you are in IP over InfiniBand mode. Did you switch them both to Ethernet? If you bought InfiniBand cards, they start in that mode automatically unless told to be Ethernet. It has been a long time since I used CX-3 cards, so maybe they say IPoIB when in Ethernet mode, IDK. It’s a little confusing because it says the adapter’s friendly name is “Ethernet #6”, but then it says it is in IB mode… When using InfiniBand, even IPoIB, without an InfiniBand controller running on that network they won’t really do anything. I would think they should still light up the links at least and then not function, but maybe they won’t even do that without the controller when in IB mode.
edit: Or are you sure you flashed the right firmware? The one for the Ethernet model cards, and you didn’t put InfiniBand firmware onto them? I am really kind of thinking you don’t have the right firmware on them now. I just looked up the CX-3 card models and the dual-mode IB/EN cards are QCBT and FCBT, but you have a BCBT, which should be an Eth-only NIC (XCAT is also an Eth-only card), and yet you are running InfiniBand firmware on it. Maybe it is the exact same chip underneath, IDK. But looking at the IB and EN model brochures, it seems the EN cards are somehow different, while the IB cards can do both.
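To keep the suffixes straight, here is a small Python lookup of the mapping as described in this post. The table only restates the claim above and has not been checked against Mellanox documentation; the helper name `card_type` is hypothetical.

```python
# Suffix -> port type, exactly as claimed in this post (an assumption,
# not verified against Mellanox documentation).
SUFFIX_TYPE = {
    "QCBT": "VPI (IB/EN)",
    "FCBT": "VPI (IB/EN)",
    "BCBT": "Ethernet only",
    "XCAT": "Ethernet only",
}

def card_type(part_number: str) -> str:
    """Look up the port type from the part-number suffix after the last dash."""
    suffix = part_number.strip().upper().rsplit("-", 1)[-1]
    return SUFFIX_TYPE.get(suffix, "unknown")

print(card_type("MCX314A-BCBT"))
print(card_type("MCX354A-FCBT"))
```

A suffix that is not in the table comes back as "unknown" rather than being guessed.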
Yeah, I manually set them to ethernet as well as IB. I also flashed Ethernet-only firmware onto them and got the same issue.
The modules seem to not power on with either the MCX354A-FCB firmware (VPI, but set to ETH) or the MCX314A-BCB firmware (ETH only).
The 10Gig Mellanox modules I have plugged in are detected. I was drawing a blank on the ethtool command and couldn’t add this earlier.
root@truenas[~]# ethtool -m enp7s0
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 10G Ethernet: 10G Base-SR
Encoding : 0x06 (64B/66B)
BR, Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 0km
Length (SMF) : 0m
Length (50um) : 80m
Length (62.5um) : 30m
Length (Copper) : 0m
Length (OM3) : 300m
Laser wavelength : 850nm
Vendor name : MELLANOX
Vendor OUI : 00:02:c9
Vendor PN : AFBR-703SDZ-MX1
Vendor rev : G2.3
Option values : 0x00 0x1a
Option : RX_LOS implemented
Option : TX_FAULT implemented
Option : TX_DISABLE implemented
BR margin, max : 0%
BR margin, min : 0%
Vendor SN : AA1131A5WB0
Date code : 110805
Optical diagnostics support : Yes
Laser bias current : 0.002 mA
Laser output power : 0.0001 mW / -40.00 dBm
Receiver signal average optical power : 0.0001 mW / -40.00 dBm
Module temperature : 36.80 degrees C / 98.25 degrees F
Module voltage : 3.3069 V
Alarm/warning flags implemented : Yes
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser output power high alarm : Off
Laser output power low alarm : Off
Laser output power high warning : Off
Laser output power low warning : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser rx power high alarm : Off
Laser rx power low alarm : On
Laser rx power high warning : Off
Laser rx power low warning : On
Laser bias current high alarm threshold : 10.500 mA
Laser bias current low alarm threshold : 2.500 mA
Laser bias current high warning threshold : 10.500 mA
Laser bias current low warning threshold : 2.500 mA
Laser output power high alarm threshold : 2.0000 mW / 3.01 dBm
Laser output power low alarm threshold : 0.1260 mW / -9.00 dBm
Laser output power high warning threshold : 0.7900 mW / -1.02 dBm
Laser output power low warning threshold : 0.3170 mW / -4.99 dBm
Module temperature high alarm threshold : 85.00 degrees C / 185.00 degrees F
Module temperature low alarm threshold : -5.00 degrees C / 23.00 degrees F
Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F
Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F
Module voltage high alarm threshold : 3.6000 V
Module voltage low alarm threshold : 3.0000 V
Module voltage high warning threshold : 3.4600 V
Module voltage low warning threshold : 3.1300 V
Laser rx power high alarm threshold : 2.0000 mW / 3.01 dBm
Laser rx power low alarm threshold : 0.0315 mW / -15.02 dBm
Laser rx power high warning threshold : 0.7900 mW / -1.02 dBm
Laser rx power low warning threshold : 0.0315 mW / -15.02 dBm
root@truenas[~]# ethtool -m enp7s0d1
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 10G Ethernet: 10G Base-SR
Encoding : 0x06 (64B/66B)
BR, Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF,km) : 0km
Length (SMF) : 0m
Length (50um) : 80m
Length (62.5um) : 30m
Length (Copper) : 0m
Length (OM3) : 300m
Laser wavelength : 850nm
Vendor name : MELLANOX
Vendor OUI : 00:02:c9
Vendor PN : AFBR-703SDZ-MX1
Vendor rev : G2.3
Option values : 0x00 0x1a
Option : RX_LOS implemented
Option : TX_FAULT implemented
Option : TX_DISABLE implemented
BR margin, max : 0%
BR margin, min : 0%
Vendor SN : AA1131A5WAV
Date code : 110805
Optical diagnostics support : Yes
Laser bias current : 0.002 mA
Laser output power : 0.0001 mW / -40.00 dBm
Receiver signal average optical power : 0.0001 mW / -40.00 dBm
Module temperature : 36.12 degrees C / 97.03 degrees F
Module voltage : 3.3128 V
Alarm/warning flags implemented : Yes
Laser bias current high alarm : Off
Laser bias current low alarm : Off
Laser bias current high warning : Off
Laser bias current low warning : Off
Laser output power high alarm : Off
Laser output power low alarm : Off
Laser output power high warning : Off
Laser output power low warning : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser rx power high alarm : Off
Laser rx power low alarm : On
Laser rx power high warning : Off
Laser rx power low warning : On
Laser bias current high alarm threshold : 10.500 mA
Laser bias current low alarm threshold : 2.500 mA
Laser bias current high warning threshold : 10.500 mA
Laser bias current low warning threshold : 2.500 mA
Laser output power high alarm threshold : 2.0000 mW / 3.01 dBm
Laser output power low alarm threshold : 0.1260 mW / -9.00 dBm
Laser output power high warning threshold : 0.7900 mW / -1.02 dBm
Laser output power low warning threshold : 0.3170 mW / -4.99 dBm
Module temperature high alarm threshold : 85.00 degrees C / 185.00 degrees F
Module temperature low alarm threshold : -5.00 degrees C / 23.00 degrees F
Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F
Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F
Module voltage high alarm threshold : 3.6000 V
Module voltage low alarm threshold : 3.0000 V
Module voltage high warning threshold : 3.4600 V
Module voltage low warning threshold : 3.1300 V
Laser rx power high alarm threshold : 2.0000 mW / 3.01 dBm
Laser rx power low alarm threshold : 0.0315 mW / -15.02 dBm
Laser rx power high warning threshold : 0.7900 mW / -1.02 dBm
Laser rx power low warning threshold : 0.0315 mW / -15.02 dBm
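For anyone reading these dumps, here is a minimal Python sketch that pulls the transmit readings out of `ethtool -m` style text and compares them with the module's own low-alarm thresholds. The sample lines are copied from the enp7s0 output above; the function names are hypothetical, and the parser assumes the simple `name : value unit` layout shown here.

```python
import math
import re

# A few lines copied from the enp7s0 dump above.
SAMPLE = """\
Laser bias current : 0.002 mA
Laser output power : 0.0001 mW / -40.00 dBm
Laser bias current low alarm threshold : 2.500 mA
Laser output power low alarm threshold : 0.1260 mW / -9.00 dBm
"""

def parse_fields(text: str) -> dict:
    """Map each 'name : value unit' line to its first numeric value."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        match = re.match(r"\s*(-?\d+(?:\.\d+)?)", value)
        if sep and match:
            fields[name.strip()] = float(match.group(1))
    return fields

def mw_to_dbm(mw: float) -> float:
    """Convert optical power in milliwatts to dBm."""
    return 10 * math.log10(mw)

def tx_below_low_alarm(fields: dict) -> dict:
    """Report whether bias current and output power sit below their low-alarm thresholds."""
    return {
        "bias": fields["Laser bias current"]
        < fields["Laser bias current low alarm threshold"],
        "power": fields["Laser output power"]
        < fields["Laser output power low alarm threshold"],
    }

fields = parse_fields(SAMPLE)
print(tx_below_low_alarm(fields))
print(round(mw_to_dbm(fields["Laser output power"]), 2))
```

Both readings sit far below the module's own low-alarm thresholds, which fits the light meter result: the laser is not being driven. The dump does not say why. The module advertises TX_DISABLE, so one possibility (a guess, not confirmed in this thread) is that the card is holding that signal asserted. The bias and output-power low alarm flags in the dump still read Off despite this.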
I’ve tried both the correct 314A-BCBT ETH-Only firmware and the 354A-FCB VPI firmware.
I’ve seen multiple people get the VPI firmware working on this card so I figured I’d give it a shot. I’m getting the same result on both firmware versions. Modules are detected but don’t output any light, and no link can be established.
I just posted the ethtool outputs here if that helps:
Everything looks good as far as being detected, which means it should be functioning correctly. I would try setting a static IP on both and see if the links “magically” light up after that. The only concern I still have is why the screenshots in the original post say it is an IPoIB adapter, when I feel like it shouldn’t really say that once everything is switched to Ethernet mode. But I could be wrong on that part.
Sorry, yeah. The original screenshot is in Auto mode.
Here’s with them forced into ETH mode. I’ve switched them back and forth so many times I picked the wrong screenshot.
A static IP in the same subnet has been entered and they’re plugged into each other using the previously detected 10Gig SFP+ module.
There’s still no light coming out of the module and there’s no link established.
My SFP+ 10gig DAC cables just arrived (10GTek CAB-10GSFP-P3M). I plugged them into my Mellanox ConnectX-3 QSFP+ 40Gig card using the QSFP to SFP adapters and link came right up.
So…
The card works. It can recognize and print out any transceiver I put in it, but it never puts light out of the transceiver. They stay off. That happens with BOTH Cisco and Mellanox transceivers.
I’m at a complete loss as to what would be causing this and what the next troubleshooting step would be.
This is happening to BOTH of my Mellanox 314A 40Gig cards on any motherboard and any operating system.
Motherboard Details
Currently using an NZXT N7 B550
It’s currently in the PCIe x16 slot, with the version manually set to 3.0. The slot should be able to supply the full 75 watts.
I’ve tried its 3.0 4x slot as well with the same results.
I’ve also tried an ASUS Z170 Pro
I don’t have my ASUS ROG STRIX X399-E GAMING motherboard available anymore, but it was doing the same thing as well in the top PCIe 16x slot.
Last week Linus (yeah, the Canadian one) put out a video on fibre connections. In order to get stuff working, Jake (LMG’s networking guy) had to change some settings. Here’s the relevant part of the video:
Thanks for contributing!
Unfortunately, the problem I’m having is that the optics are detected, but not outputting any light.
But this’ll be good to verify once I get light out of my optics.
The ConnectX-3 354A cards came in.
They’re doing the EXACT same thing as the 314A cards.
They work totally fine with DAC cables, but in any system at all, any fiber module plugged in never turns on its LED/laser, and there’s never a link light.
I’m at a complete loss on what to do from here anymore.