Can't update Mellanox firmware (ConnectX-6 Dx)

I’m running Windows 11 and want to upgrade the firmware on my Mellanox ConnectX-6 Dx. I installed it in my PC today and am running into some issues.

I can’t find a way to update the firmware. When I run mlxup.exe as administrator, this is what I see:

Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX6DX
  Part Number:      --
  Description:
  PSID:
  PCI Device Name:  mt4125_pciconf0
  Base GUID:        N/A
  Base MAC:         N/A
  Versions:         Current        Available
     FW             --

  Status:           Failed to open device

---------
-E- Failed to query mt4125_pciconf0 device, error : FwInit has failed!

Press any key to continue ... (To suppress pause re-run with --sfx-no-pause flag)

I downloaded an up-to-date bin file but am not sure how to manually write it to the device.

In Device Manager, one adapter doesn’t seem to want to enable:

image

image

I also downloaded some other files from the driver site earlier this month, but I don’t remember what they are. I don’t feel comfortable installing them either:

image

I’m assuming one of these has the drivers.

After an OS restart, it’s now showing the other adapter with the yellow triangle.

image

Different error message this time.

Before

This device cannot start. (Code 10)

This device does not exist.

After

This device cannot start. (Code 10)

{Operation Failed}
The requested operation was unsuccessful.


After connecting fiber cables to both ends and my switch, I’m seeing this error: Rx Fault.

image

Since the other port isn’t functioning in Windows, I can’t really tell what’s going on. I’m wondering if this is a driver issue in Windows.

There’s no firmware information showing in Windows either.

image

What am I missing?

I can only query firmware information using the separate WinMFT installer:

C:\Windows\System32>mlxfwmanager -d mt4125_pciconf0
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX6DX
  Part Number:      MCX623102AN-GDA_Ax
  Description:      ConnectX-6 Dx EN adapter card; 50GbE; Dual-port SFP56; PCIe 4.0/3.0 x16
  PSID:             MT_0000000353
  PCI Device Name:  mt4125_pciconf0
  Base GUID:        00000002c911a4bd
  Base MAC:         0002c911a4bd
  Versions:         Current        Available
     FW             22.30.0001     N/A

  Status:           No matching image found

This seems to work and not work randomly.
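To keep track of when the query works and what it reports, the output can be parsed with a few lines of Python. This is only a rough sketch: `parse_query` and `version_tuple` are names I made up, and the parsing assumes the "Label:  value" layout shown in the output above. The target version 22.36.1010 is read off the firmware file name by hand.

```python
import re

def parse_query(text: str) -> dict:
    """Extract 'Label: value' fields and the current FW version from query output."""
    fields = {}
    for line in text.splitlines():
        # Matches lines such as "  PSID:             MT_0000000353".
        match = re.match(r"^\s*([A-Za-z][A-Za-z ]*?):\s+(.*\S)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
        elif line.strip().startswith("FW"):
            # Matches the row "FW   <current>   <available>" under "Versions:".
            parts = line.split()
            fields["FW Current"] = parts[1] if len(parts) > 1 else "--"
    return fields

def version_tuple(version: str) -> tuple:
    """Turn '22.30.0001' into (22, 30, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))
```

For the output above this gives a PSID of MT_0000000353 and a current firmware of 22.30.0001, which compares as older than 22.36.1010. On a failed query the FW field comes back as "--", so check for that before calling `version_tuple`.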

Here’s what I did to update the firmware:

mlxfwmanager -d mt4125_pciconf0 -y -u -i "%userprofile%\Downloads\NVIDIA NIC drivers and firmware\fw-ConnectX6Dx-rel-22_36_1010-MCX623102AN-GDA_Ax-UEFI-14.29.14-FlexBoot-3.6.901.bin"

But that didn’t work. Either the card is flaky or something else is wrong, because it can only read the firmware some of the time.

Examples of failures:

-E- Failed to query mt4125_pciconf0 device, error : Can not obtain Flash semaphore.
Device #1: Updating FW ...
Fail : FwInit has failed!
Device #1: Updating FW ...
Fail : Failed to query the device, All Before Boot2 - read error (Flash read failed at address 0x1000000 : MFE_CR_ERROR)
Device #1: Updating FW ...
Fail : Failed to query the FW - Err[12] - ME_ICMD_STATUS_SEMAPHORE_TO
Device #1: Updating FW ...
FSMST_INITIALIZE -   OK

Burning with DMA has failed, switching to Register-Access burn.
FSMST_INITIALIZE -   OK
Fail : ME_ICMD_UNKNOWN_STATUS

This one got the furthest and started writing the firmware but failed at 0%:

Device #1: Updating FW ...
Fail : Flash write failed: Flash write of 4096 bytes to address physical 0x1010 failed: MFE_CR_ERROR

It’s random which error I see though.

I found some troubleshooting advice that recommended running this command:

> mcra -c mt4125_pciconf0
-I- PCI Semaphore cleared successfully.

I ran it 20 times, and it failed once. I don’t think that’s supposed to happen:

> mcra -c mt4125_pciconf0
-E- Failed to clear PCI semaphore for device: mt4125_pciconf0. General error
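Rather than re-running the command by hand, a small script can repeat it and count how often it fails. This is a sketch under assumptions: `run_once` and `count_failures` are hypothetical helper names, and I’m assuming a failure shows up as a non-zero exit code or an "-E-" line like the one above (I haven’t confirmed which exit codes mcra actually returns).

```python
import subprocess
from typing import Callable, Sequence

def run_once(cmd: Sequence[str]) -> bool:
    """Run the command once and report whether it looked successful."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    # Assumption: errors are flagged by a non-zero exit code or an "-E-" line.
    return result.returncode == 0 and "-E-" not in output

def count_failures(
    cmd: Sequence[str],
    attempts: int = 20,
    runner: Callable[[Sequence[str]], bool] = run_once,
) -> int:
    """Run the command `attempts` times and return how many runs failed."""
    return sum(1 for _ in range(attempts) if not runner(cmd))
```

Calling `count_failures(["mcra", "-c", "mt4125_pciconf0"])` from an elevated prompt would repeat the test above. Any result other than 0 suggests that access to the card is unreliable.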

I have never seen ports randomly error out and fail to start on any of my Mellanox cards. None of mine are 6 series specifically, but this doesn’t seem like normal behavior.

I know in your other thread you were talking about using a NIC in a PCIe x1 slot, so is there any way you could try moving this NIC to one of your x8 slots and see if it decides to work? Maybe it is some interaction with only having 1 lane.

edit: never mind, I just read the new posts in your other thread and see you are in an x8 slot already. I’d RMA the card; the behavior does not seem right. All of my cards boot right up and have never shown errors like this before.

I figured it out.

Moving it to the lowest x16 slot on my motherboard fixed it (where I had a second GPU). That slot only runs at x4, but that’s enough bandwidth for both 25Gb ports on this card: 16Gb (PCIe 4.0 x1) x 4 lanes = 64Gb, versus 50Gb for the two ports combined.
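As a rough check of the lane math, here’s a short Python sketch. It uses the published per-lane rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and the 128b/130b line encoding. `pcie_usable_gbps` is a name I made up, and the result ignores packet (TLP) overhead, so real-world throughput will be somewhat lower.

```python
# Per-lane transfer rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GEN = {
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}

def pcie_usable_gbps(gen: int, lanes: int) -> float:
    """Approximate link bandwidth in Gb/s after encoding overhead."""
    rate_gt_per_s, efficiency = PCIE_GEN[gen]
    return rate_gt_per_s * lanes * efficiency

link_gbps = pcie_usable_gbps(gen=4, lanes=4)
ports_gbps = 2 * 25  # two 25Gb ports, as described in the post
print(f"PCIe 4.0 x4: {link_gbps:.1f} Gb/s, both ports: {ports_gbps} Gb/s")
```

A PCIe 4.0 x4 link comes out at roughly 63 Gb/s, so it has headroom over the 50 Gb/s the two ports need. The same slot at PCIe 3.0 would only give about 31.5 Gb/s.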

I had to remove the second GPU and replace those displays with a USB DisplayLink adapter. It’s not that big of a deal.


Separate issue: I’m having performance issues with SMB over fiber vs Ethernet.

Created a separate thread here: