Multiple ConnectX-3 cards, no fiber light, no link light

I have an issue with my Mellanox ConnectX-3 cards where none of the transceivers output any light.
The card is detected in Windows 11 and Ubuntu 22.10. Both ports are visible on both operating systems, but I can never get the transceivers to work, and the link lights never turn on.

I’ve updated the firmware and drivers to the latest versions. I’ve tested different firmware versions. I’ve reinstalled the firmware and drivers multiple times.
I’ve even tried two different ConnectX-3 cards of the same model.

I’ve verified all the transceivers work and the fiber is good.
I can’t get any light from the transceivers. My light meter doesn’t detect anything.

Card Models: Mellanox MCX314A-BCBT
Driver Version: 5.50.14740.1
Firmware Version: 2.42.5000

Tested Transceivers: Cisco QSFP-40G-SR-BD, Cisco SFP-10G-SR, Mellanox MFM1T02A-SR
Fiber Type: LC-LC OM3 MMF

OS: Windows 11 Pro for Workstations (build 23493), Ubuntu 22.10
Tested multiple PCIe slots as well.

I have no idea where to go next with troubleshooting. The cards seem to be detected fine, and even updated fine, so I don’t think they’re broken. The firmware release notes also state that it’s compatible with my 40Gig transceivers.

Update: Here are some helpful command outputs:

Update: I ran the ethtool command here:

Are you sure that is the latest firmware?

That’s an MCX314A-BCCT.

ethtool -i enp11s0 | grep -i firmware | cut -d ' ' -f 2-
2.42.5000
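The one-liner above pulls the `firmware-version` field out of `ethtool -i`. If you'd rather compare it against a release-notes version numerically than by eye, a minimal Python sketch might look like this. It assumes `ethtool -i` prints `key: value` lines. The sample text is illustrative, and only the firmware version in it is taken from this thread.

```python
from __future__ import annotations


def parse_ethtool_info(text: str) -> dict[str, str]:
    """Turn `ethtool -i <iface>` output ("key: value" lines) into a dict."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a bus-info
        # address (e.g. "0000:0b:00.0") keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version: str) -> tuple[int, ...]:
    """'2.42.5000' -> (2, 42, 5000), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Illustrative excerpt; only the firmware version comes from the thread.
SAMPLE = """driver: mlx4_en
firmware-version: 2.42.5000
"""

info = parse_ethtool_info(SAMPLE)
installed = info["firmware-version"]
print(installed, version_tuple(installed) >= version_tuple("2.42.5000"))
```

This prints `2.42.5000 True`. Comparing tuples of integers avoids string-comparison surprises such as "2.9" sorting after "2.42".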

MCX314A-BCBT Version 2.42.5000

Edit: OK, you wrote 2.41.5000, but you are already on 2.42.5000…

My mistake! I typed the firmware version wrong. Fixed it.

I tried messing with the setup a lot more and I’m still getting the same results.
The cards appear to turn on and work, but no module I plug into them ever turns on.

Cisco locks its transceiver firmware and switch ports. Don’t use Cisco unless you are plugging into Cisco brand hardware.
Mellanox works with pretty much every brand of transceiver that doesn’t pull shenanigans with its hardware, which means no Cisco and no Juniper. You can use Mellanox brand, generic brand-coded, 10GTek off Amazon, FS brand, Chelsio, Intel, and whatever other dozens of brands are out there.

Before buying them, I checked the firmware release notes and confirmed that this specific transceiver was supported.

I’ve also seen forum posts of other users getting this transceiver to work with ConnectX-3 cards.

I had a 1Gig SFP module (JSM-12S0AA1) that I put into the QSFP+ to SFP+ adapter, and I’m getting the exact same results with it. It doesn’t turn on at all.

If you can, try a copper (DAC) connection. That will tell you a lot about where to continue your fault-finding.

One more thing: I noticed the IPv4 addresses mentioned are not in the same subnet. That means they can’t see each other.
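To check whether two addresses actually share a subnet before blaming the link, Python's standard `ipaddress` module can do the comparison. This is a minimal sketch, and the addresses below are hypothetical examples for a direct card-to-card link.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str) -> bool:
    """True if two interface addresses (CIDR notation) sit in the same network."""
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    return net_a == net_b


# Hypothetical addresses for a direct card-to-card link.
print(same_subnet("10.0.0.1/24", "10.0.0.2/24"))  # True
print(same_subnet("10.0.0.1/24", "10.0.1.2/24"))  # False
```

The check is deliberately strict. Mismatched prefix lengths (say /24 on one end and /16 on the other) produce different networks and count as "not the same subnet".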

I appreciate the reply!

I have 10G SFP+ DACs coming Saturday that I’ll test with the QSFP+ to SFP+ adapters.

These appear to be just IPs auto-generated by Windows. Once I get a link working, I’ll be sure to set up the IPs correctly.

UPDATE

I just got my hands on Mellanox MFM1T02A-SR modules.
I plugged those in using known-good QSFP+ to SFP+ adapters and they’re doing the same thing.
It’s like the card is on, but the interfaces are turned off.

Under Linux, what does mlxlink say?

Do you have a diagram of the connection setup?

Regarding the QSFP-40G-SR-BD: That is a bidirectional transceiver, correct? You need to make sure that the wavelengths of RX and TX are switched from one transceiver to the other.


This is on Debian 11

root@truenas[~]# mlxlink -d /dev/mst/mt4099_pci_cr0

-E- Device is not supported

EDIT: An Nvidia employee informed me that mlxlink only works on ConnectX-4+

Correct, and a good catch. I’ve got the cables set up correctly, but the trouble I’m having is that the modules appear to receive no power at all. My light meter shows no light coming out of either fiber, and a phone camera doesn’t pick up any infrared. The phone camera does pick up infrared when these modules are plugged into anything else, like a switch.

Here are some other potentially helpful command outputs.

root@truenas[~]# mst status
MST modules:
------------
    MST PCI module loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt4099_pci_cr0          - PCI direct access.
                                   domain:bus:dev.fn=0000:07:00.0 bar=0xdf600000 size=0x100000
                                   Chip revision is: 01
/dev/mst/mt4099_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:07:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 01
root@truenas[~]# mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3
  Part Number:      MCX314A-BCB_Ax
  Description:      ConnectX-3 EN network interface card; 40GigE; dual-port QSFP; PCIe3.0 x8 8GT/s; RoHS R6
  PSID:             MT_1090110023
  PCI Device Name:  /dev/mst/mt4099_pci_cr0
  Port1 MAC:        e41d2d2cf340
  Port2 MAC:        e41d2d2cf341
  Versions:         Current        Available
     FW             2.42.5000      N/A
     PXE            3.4.0752       N/A

  Status:           No matching image found

I’ve also tried flashing the MCX354A-FCB firmware, as someone suggested, but nothing changed. (This is on Windows 11.)

PS C:\Users\rojo8> mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3
  Part Number:      MCX354A-FCB_A2-A5
  Description:      ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6
  PSID:             MT_1090120019
  PCI Device Name:  mt4099_pci_cr0
  Port1 MAC:        7cfe90a8cdd0
  Port2 MAC:        7cfe90a8cdd1
  Versions:         Current        Available
     FW             2.42.5000      N/A
     PXE            3.4.0752       N/A

  Status:           No matching image found
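The two cards report different PSIDs: MT_1090110023 with the MCX314A-BCB image and MT_1090120019 with the MCX354A-FCB image. The PSID identifies the board/firmware configuration a card is running, so it is the field to compare when confirming which image each card actually ended up with after cross-flashing. Below is a rough Python parser for that. It assumes the `Key:  value` layout shown above also holds in other mlxfwmanager versions. Lines without a colon, such as the FW/PXE version rows, are skipped.

```python
from __future__ import annotations


def parse_mlxfwmanager(text: str) -> list[dict[str, str]]:
    """Collect the "Key: value" fields of each "Device #N" section."""
    devices: list[dict[str, str]] = []
    current: dict[str, str] | None = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Device #"):
            # A new device section starts; later fields belong to it.
            current = {}
            devices.append(current)
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return devices


# Excerpt of the Linux output above.
SAMPLE = """Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3
  Part Number:      MCX314A-BCB_Ax
  PSID:             MT_1090110023
  Versions:         Current        Available
     FW             2.42.5000      N/A

  Status:           No matching image found
"""

for device in parse_mlxfwmanager(SAMPLE):
    print(device["Part Number"], device["PSID"])
```

This prints `MCX314A-BCB_Ax MT_1090110023` for the sample.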

169.254.x.x IPs are link-local addresses. They are the fallback an interface assigns itself when it fails to get an IP (e.g. from DHCP).
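A 169.254.x.x address means the interface never got a configured or DHCP address, so it is a symptom rather than a cause. Python's standard `ipaddress` module already knows the link-local ranges, which makes such addresses easy to flag in a script. This is a minimal sketch, and the addresses are hypothetical examples.

```python
import ipaddress


def is_link_local(addr: str) -> bool:
    """True for addresses in 169.254.0.0/16 (IPv4) or fe80::/10 (IPv6)."""
    return ipaddress.ip_address(addr).is_link_local


print(is_link_local("169.254.10.20"))  # True: self-assigned fallback
print(is_link_local("10.0.0.1"))       # False: an ordinary private address
```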

@Rojo, I just noticed in your screenshots that you are in IP over InfiniBand mode. Did you switch them both to Ethernet? If you bought InfiniBand cards, they start in that mode automatically unless told to be Ethernet. It has been a long time since I used CX-3 cards, so maybe they say IPoIB when in Ethernet mode, IDK. It’s a little confusing because it says the adapter’s friendly name is “Ethernet #6”, but then it says it is in IB mode… When using InfiniBand, even IPoIB, without an InfiniBand controller (subnet manager) running on that network, they won’t really do anything. I would think they should still at least light up the links and then not function, but maybe they won’t even do that without the controller when in IB mode.

Edit: Or are you sure you flashed the right firmware, the one for the Ethernet model cards, and didn’t put InfiniBand firmware onto them? I’m really kind of thinking you don’t have the right firmware on them now. I just looked up the CX-3 card models. The dual-mode IB/EN cards are QCBT and FCBT, but you have a BCBT, which should be an Ethernet-only NIC (XCAT is also an Ethernet-only card), and yet you are running InfiniBand firmware on it. Maybe it is the exact same chip underneath, IDK. But looking at the IB and EN model brochures, it seems like the EN cards are somehow different, while the IB cards do both.

CX-3 EN brochure:

CX-3 VPI brochure:

Yeah, I’ve manually set them to Ethernet as well as IB. I also flashed Ethernet-only firmware onto them and got the same issue.
The modules don’t seem to power on with either the MCX354A-FCB firmware (VPI, but set to ETH) or the MCX314A-BCB firmware (ETH only).
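On the Linux side, the mlx4 driver used by ConnectX-3 exposes each port’s type as a sysfs attribute (`mlx4_port1`, `mlx4_port2`) under the card’s PCI device directory. That attribute accepts `ib`, `eth`, or `auto`. Below is a minimal sketch for checking or forcing it, assuming that interface. The function names are hypothetical helpers, not a Mellanox API, and the exact read-back format (especially in auto mode) may vary by driver version.

```python
from pathlib import Path

# Values the mlx4 driver accepts for its per-port type attribute.
PORT_TYPES = {"ib", "eth", "auto"}
SYSFS_PCI = "/sys/bus/pci/devices"


def port_type_path(pci_addr: str, port: int, sysfs_root: str = SYSFS_PCI) -> Path:
    """Path of the mlx4_port<N> attribute for one ConnectX-3 port."""
    return Path(sysfs_root) / pci_addr / f"mlx4_port{port}"


def read_port_type(pci_addr: str, port: int, sysfs_root: str = SYSFS_PCI) -> str:
    """Return the attribute's raw contents, stripped of whitespace."""
    return port_type_path(pci_addr, port, sysfs_root).read_text().strip()


def write_port_type(pci_addr: str, port: int, mode: str,
                    sysfs_root: str = SYSFS_PCI) -> None:
    """Request a port type. Writing the real sysfs file needs root."""
    if mode not in PORT_TYPES:
        raise ValueError(f"mode must be one of {sorted(PORT_TYPES)}, got {mode!r}")
    port_type_path(pci_addr, port, sysfs_root).write_text(mode + "\n")
```

As root, something like `write_port_type("0000:07:00.0", 1, "eth")` followed by `read_port_type("0000:07:00.0", 1)` would force and then confirm port 1. The PCI address comes from the `mst status` output above. The `sysfs_root` parameter exists only so the helpers can be exercised against a scratch directory.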

Thank you for taking the time to look at this!


Update:

The 10Gig Mellanox modules I have installed are detected. I was drawing a blank on the ethtool command, so I couldn’t add this earlier.

root@truenas[~]# ethtool -m enp7s0
        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 10G Ethernet: 10G Base-SR
        Encoding                                  : 0x06 (64B/66B)
        BR, Nominal                               : 10300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF,km)                           : 0km
        Length (SMF)                              : 0m
        Length (50um)                             : 80m
        Length (62.5um)                           : 30m
        Length (Copper)                           : 0m
        Length (OM3)                              : 300m
        Laser wavelength                          : 850nm
        Vendor name                               : MELLANOX
        Vendor OUI                                : 00:02:c9
        Vendor PN                                 : AFBR-703SDZ-MX1
        Vendor rev                                : G2.3
        Option values                             : 0x00 0x1a
        Option                                    : RX_LOS implemented
        Option                                    : TX_FAULT implemented
        Option                                    : TX_DISABLE implemented
        BR margin, max                            : 0%
        BR margin, min                            : 0%
        Vendor SN                                 : AA1131A5WB0
        Date code                                 : 110805
        Optical diagnostics support               : Yes
        Laser bias current                        : 0.002 mA
        Laser output power                        : 0.0001 mW / -40.00 dBm
        Receiver signal average optical power     : 0.0001 mW / -40.00 dBm
        Module temperature                        : 36.80 degrees C / 98.25 degrees F
        Module voltage                            : 3.3069 V
        Alarm/warning flags implemented           : Yes
        Laser bias current high alarm             : Off
        Laser bias current low alarm              : Off
        Laser bias current high warning           : Off
        Laser bias current low warning            : Off
        Laser output power high alarm             : Off
        Laser output power low alarm              : Off
        Laser output power high warning           : Off
        Laser output power low warning            : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser rx power high alarm                 : Off
        Laser rx power low alarm                  : On
        Laser rx power high warning               : Off
        Laser rx power low warning                : On
        Laser bias current high alarm threshold   : 10.500 mA
        Laser bias current low alarm threshold    : 2.500 mA
        Laser bias current high warning threshold : 10.500 mA
        Laser bias current low warning threshold  : 2.500 mA
        Laser output power high alarm threshold   : 2.0000 mW / 3.01 dBm
        Laser output power low alarm threshold    : 0.1260 mW / -9.00 dBm
        Laser output power high warning threshold : 0.7900 mW / -1.02 dBm
        Laser output power low warning threshold  : 0.3170 mW / -4.99 dBm
        Module temperature high alarm threshold   : 85.00 degrees C / 185.00 degrees F
        Module temperature low alarm threshold    : -5.00 degrees C / 23.00 degrees F
        Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F
        Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
        Module voltage high alarm threshold       : 3.6000 V
        Module voltage low alarm threshold        : 3.0000 V
        Module voltage high warning threshold     : 3.4600 V
        Module voltage low warning threshold      : 3.1300 V
        Laser rx power high alarm threshold       : 2.0000 mW / 3.01 dBm
        Laser rx power low alarm threshold        : 0.0315 mW / -15.02 dBm
        Laser rx power high warning threshold     : 0.7900 mW / -1.02 dBm
        Laser rx power low warning threshold      : 0.0315 mW / -15.02 dBm
root@truenas[~]# ethtool -m enp7s0d1
        Identifier                                : 0x03 (SFP)
        Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 10G Ethernet: 10G Base-SR
        Encoding                                  : 0x06 (64B/66B)
        BR, Nominal                               : 10300MBd
        Rate identifier                           : 0x00 (unspecified)
        Length (SMF,km)                           : 0km
        Length (SMF)                              : 0m
        Length (50um)                             : 80m
        Length (62.5um)                           : 30m
        Length (Copper)                           : 0m
        Length (OM3)                              : 300m
        Laser wavelength                          : 850nm
        Vendor name                               : MELLANOX
        Vendor OUI                                : 00:02:c9
        Vendor PN                                 : AFBR-703SDZ-MX1
        Vendor rev                                : G2.3
        Option values                             : 0x00 0x1a
        Option                                    : RX_LOS implemented
        Option                                    : TX_FAULT implemented
        Option                                    : TX_DISABLE implemented
        BR margin, max                            : 0%
        BR margin, min                            : 0%
        Vendor SN                                 : AA1131A5WAV
        Date code                                 : 110805
        Optical diagnostics support               : Yes
        Laser bias current                        : 0.002 mA
        Laser output power                        : 0.0001 mW / -40.00 dBm
        Receiver signal average optical power     : 0.0001 mW / -40.00 dBm
        Module temperature                        : 36.12 degrees C / 97.03 degrees F
        Module voltage                            : 3.3128 V
        Alarm/warning flags implemented           : Yes
        Laser bias current high alarm             : Off
        Laser bias current low alarm              : Off
        Laser bias current high warning           : Off
        Laser bias current low warning            : Off
        Laser output power high alarm             : Off
        Laser output power low alarm              : Off
        Laser output power high warning           : Off
        Laser output power low warning            : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser rx power high alarm                 : Off
        Laser rx power low alarm                  : On
        Laser rx power high warning               : Off
        Laser rx power low warning                : On
        Laser bias current high alarm threshold   : 10.500 mA
        Laser bias current low alarm threshold    : 2.500 mA
        Laser bias current high warning threshold : 10.500 mA
        Laser bias current low warning threshold  : 2.500 mA
        Laser output power high alarm threshold   : 2.0000 mW / 3.01 dBm
        Laser output power low alarm threshold    : 0.1260 mW / -9.00 dBm
        Laser output power high warning threshold : 0.7900 mW / -1.02 dBm
        Laser output power low warning threshold  : 0.3170 mW / -4.99 dBm
        Module temperature high alarm threshold   : 85.00 degrees C / 185.00 degrees F
        Module temperature low alarm threshold    : -5.00 degrees C / 23.00 degrees F
        Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F
        Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
        Module voltage high alarm threshold       : 3.6000 V
        Module voltage low alarm threshold        : 3.0000 V
        Module voltage high warning threshold     : 3.4600 V
        Module voltage low warning threshold      : 3.1300 V
        Laser rx power high alarm threshold       : 2.0000 mW / 3.01 dBm
        Laser rx power low alarm threshold        : 0.0315 mW / -15.02 dBm
        Laser rx power high warning threshold     : 0.7900 mW / -1.02 dBm
        Laser rx power low warning threshold      : 0.0315 mW / -15.02 dBm
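Both modules read back a laser bias of 0.002 mA and an output power of 0.0001 mW. In dBm relative to 1 mW, 10·log10(0.0001) = -40 dBm. Both values are far below the module’s own low-alarm thresholds of 2.5 mA and 0.126 mW (-9 dBm). Module temperature and voltage, by contrast, look normal. That pattern is consistent with a module that is powered and readable but whose laser is not being driven. Notably, the module’s low-alarm flags for bias and TX power still read Off.

The Python sketch below parses `ethtool -m` output and makes that comparison. The field names are copied from the output above and may be labelled differently by other ethtool versions.

```python
from __future__ import annotations

import math
import re


def parse_module_info(text: str) -> dict[str, str]:
    """Turn `ethtool -m <iface>` lines ("Field : value") into a dict.

    Repeated keys (e.g. several "Option" lines) keep only the last value.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_number(value: str) -> float:
    """'0.0001 mW / -40.00 dBm' -> 0.0001 (first number in the field)."""
    match = re.match(r"-?\d+(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"no number in {value!r}")
    return float(match.group())


def mw_to_dbm(milliwatts: float) -> float:
    """Optical power in dBm: 10 * log10(P / 1 mW)."""
    return 10 * math.log10(milliwatts)


def laser_looks_off(fields: dict[str, str]) -> bool:
    """True if both bias current and TX power sit below their low-alarm thresholds."""
    bias = leading_number(fields["Laser bias current"])
    bias_low = leading_number(fields["Laser bias current low alarm threshold"])
    power = leading_number(fields["Laser output power"])
    power_low = leading_number(fields["Laser output power low alarm threshold"])
    return bias < bias_low and power < power_low


# Four lines copied from the enp7s0 output above.
SAMPLE = """\
        Laser bias current                        : 0.002 mA
        Laser output power                        : 0.0001 mW / -40.00 dBm
        Laser bias current low alarm threshold    : 2.500 mA
        Laser output power low alarm threshold    : 0.1260 mW / -9.00 dBm
"""

fields = parse_module_info(SAMPLE)
print(laser_looks_off(fields))       # True
print(round(mw_to_dbm(0.1260), 2))   # -9.0
```

On the Linux box, the text could come from `subprocess.run(["ethtool", "-m", "enp7s0"], capture_output=True, text=True).stdout` instead of the pasted sample.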

I’ve tried both the correct 314A-BCBT ETH-Only firmware and the 354A-FCB VPI firmware.
I’ve seen multiple people get the VPI firmware working on this card, so I figured I’d give it a shot. I’m getting the same result with both firmware versions: the modules are detected but don’t output any light, and no link can be established.

I just posted the ethtool outputs here if that helps:

Everything looks good as far as detection goes, which suggests the card should be functioning correctly. I would try setting a static IP on both and see if the links “magically” light up after that. My only remaining concern is why the screenshots in the original post say it is an IPoIB adapter. I feel like it shouldn’t say that when everything is switched to Ethernet mode, but I could be wrong on that part.

Sorry, yeah. The original screenshot is in Auto mode.
Here they are forced into ETH mode. I’ve switched them back and forth so many times that I picked the wrong screenshot.
Static IPs in the same subnet have been entered, and the cards are plugged into each other using the previously detected 10Gig SFP+ module.

There’s still no light coming out the module and there’s no link established.
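To check link state from Linux without relying on the LEDs, the kernel exposes `operstate` and `carrier` files under /sys/class/net/<iface>. `ip link show` reports the same information. Below is a minimal sketch, assuming a Linux host like the TrueNAS box with the enp7s0-style interface names seen in the ethtool output. The `net_root` parameter is only there so the helper can be exercised against a scratch directory.

```python
from __future__ import annotations

from pathlib import Path


def link_state(iface: str, net_root: str = "/sys/class/net") -> tuple[str, bool]:
    """Return (operstate, carrier) for a Linux network interface."""
    base = Path(net_root) / iface
    operstate = (base / "operstate").read_text().strip()
    try:
        carrier = (base / "carrier").read_text().strip() == "1"
    except OSError:
        # The kernel refuses to report carrier (EINVAL) while the
        # interface is administratively down, so treat that as no carrier.
        carrier = False
    return operstate, carrier
```

Calling `link_state("enp7s0")` with the optics installed, and again with the DAC, would give a text record of the difference that doesn’t depend on what the LEDs show.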

Good news! Maybe…

My SFP+ 10gig DAC cables just arrived (10GTek CAB-10GSFP-P3M). I plugged them into my Mellanox ConnectX-3 QSFP+ 40Gig card using the QSFP to SFP adapters and link came right up.

So…
The card works. It recognizes and reports the details of any transceiver I put in it, but no light ever comes out of the transceiver. They stay off. That happens with BOTH Cisco and Mellanox transceivers.

I’m at a complete loss as to what would be causing this and what the next troubleshooting step would be.
This is happening to BOTH of my Mellanox 314A 40Gig cards on any motherboard and any operating system.

Motherboard Details
Currently using an NZXT N7 B550
It’s currently in the PCI-E 16x slot, with the version manually set to 3.0. The slot should be able to supply its full 75 watts.
I’ve tried its 3.0 4x slot as well with the same results.

I’ve also tried an ASUS Z170 Pro

I don’t have my ASUS ROG STRIX X399-E GAMING motherboard available anymore, but it was doing the same thing as well in the top PCIe 16x slot.
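On slot choice, link generation and width affect bandwidth, not whether the optics light up. Forcing Gen 3 in firmware setup changes the signalling rate, not the slot’s power budget. The card itself is PCIe 3.0 x8, per the mlxfwmanager description. A back-of-the-envelope calculation (8 GT/s per lane, 128b/130b encoding, ignoring packet overhead) shows that the 3.0 4x slot mentioned above tops out around 31.5 Gb/s per direction. That is below a single 40GbE port’s line rate, so the slot would cap throughput rather than, by itself, stop the modules from lighting.

```python
def pcie3_gbps(lanes: int) -> float:
    """Raw PCIe 3.0 bandwidth per direction in Gb/s.

    8 GT/s per lane with 128b/130b line encoding; packet and protocol
    overhead are ignored, so real throughput is lower.
    """
    return lanes * 8.0 * 128 / 130


for lanes in (4, 8, 16):
    print(f"x{lanes}: {pcie3_gbps(lanes):.1f} Gb/s")
```

This prints 31.5, 63.0 and 126.0 Gb/s for x4, x8 and x16.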

Last week Linus (yeah, the Canadian one 😛) put out a video on fibre connections. To get things working, Jake (LMG’s networking guy) had to change some settings. Here’s the relevant part of the video:

Perhaps it applies to your setup as well?

Thanks for contributing!
Unfortunately, the problem I’m having is that the optics are detected but don’t output any light.
But this’ll be good to verify once I get light out of my optics.

The ConnectX-3 354A cards came in.
They’re doing the EXACT same thing as the 314A cards.
They work totally fine with DAC cables. But in every system I’ve tried, no fiber module I plug in ever turns on its LED/laser, and there’s never a link light.

I’m at a complete loss on what to do from here anymore.