So, I did a firmware upgrade on my G292-Z20 to version 12.61.21 and now it won’t boot. It doesn’t even get to the BIOS screen.
I can reach it through the BMC, and the log says:
“Unknown sensor of type bios_post_progress logged a bios : System Firmware Error (POST error)(No video device detected) was asserted”
I’ve seen a couple of videos/posts saying to disconnect the monitor before booting. That didn’t work.
Not too experienced with this, so this is just a guess, but it seems like it’s trying to put video out through the BMC, failing, and blaming the firmware. The FW might’ve gotten corrupted somewhere along the way, but since you said there have been other posts about this, I’m not so sure. If you want to give the latest FW another shot, go back to their page and redownload the firmware to make sure it didn’t get bunged up, then flash it again and see if that works. If that doesn’t work and you don’t need the new features/fixes in the new FW, just go back to the known-working FW you were using before.
Might also be relevant: it seems the G292-Z20 has a jumper for BIOS recovery (page 55 in the manual). Unsure how you’d use it (can’t find anything on it), but it’s there. You could also contact Gigabyte about this issue.
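If Gigabyte lists a checksum next to the download (an assumption; check the download page), comparing it against the file rules out a bad download faster than flashing again. A minimal sketch; the file name and expected hash are placeholders you supply, not anything taken from Gigabyte:

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True if the file's SHA-256 equals the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python check_fw.py firmware.bin <expected-sha256>
    fw_path, expected = sys.argv[1], sys.argv[2]
    print("OK" if matches(fw_path, expected) else "MISMATCH: redownload before flashing")
```

With no published checksum, running `sha256_of` on two separate downloads and comparing the results at least tells you whether both copies are identical.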
Thx.
Didn’t know about the jumper, I’ll check that out.
I already opened a ticket with Gigabyte, but haven’t heard back yet.
I tried upgrading the BIOS through the BMC to the newest version, but it gave me a verification error (downloaded twice).
I’ll try reinstalling the FW, but I haven’t figured out how to do that through the BMC yet.
The BMC can cause a whole whack-load of errors for servers if its firmware is wrong. This is why you update the BMC firmware before the BIOS if at all possible. In this case, it seems to be preventing startup. Roll back the BMC firmware if you cannot get the latest to work. Additionally, I found a single post that was probably the one you were referencing in your initial post. It came from a single instance with a different Gigabyte board. There doesn’t appear to be a trend here as far as I can see, though I could be wrong.
If that board uses MegaRAC SPX, then it should be as simple as downloading the BMC firmware from Gigabyte’s website, logging into the BMC, going to Maintenance, and finding the option for firmware update. Can’t remember if it was its own option or a subset of the update section. Hand it the update and let it run. If you cannot find it, send me a screenshot of the BMC’s maintenance page and I will help you identify it.
Here’s what it should look like. This is after uploading the FW, iirc:
We frequently have issues with BMC firmware upgrades requiring multiple attempts.
If I recall, the best way is through Windows on SuperMicro and extracting the .bin for Gigabyte.
You then have to reset the machine, which is followed by a “no boot device found” error, followed by a reset into UEFI to confirm the upgrade occurred.
We just had this issue with 9000 series EPYC boards.
Before anyone gets high and mighty, SuperMicro is arguably worse with the 9000 series EPYCs.
Thx for all the great suggestions; unfortunately, it’s only gone downhill since my last post.
Here’s what I did:
Disconnected all GPUs, booted with and without a monitor. No change.
Reset the FW in the BMC, booted with and without a monitor. It lost its fixed IP and doesn’t ask for a new one over DHCP. I dumped the traffic and nothing happens. Now I can’t reach the BMC anymore.
Tried inserting a bootable UEFI USB and mashing buttons while booting. Nothing.
Wanted to try the jumper recovery. That jumper doesn’t exist on my motherboard.
I don’t get to POST.
I’ll try to monitor all the traffic from the server; I only captured DHCP traffic, so there might be an IP in there.
Good idea trying to add a normal GPU, I need to find one though.
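A DHCP DISCOVER from the BMC at least carries its MAC address, and RFC 2131 lets a client include a “requested IP address” option (50) in it, which is where an old address could show up. Below is a hedged sketch that decodes those fields from raw DHCP payloads, using the standard BOOTP/DHCP layout from RFC 2131. It assumes a machine cabled directly to the BMC port, run as root, with nothing else bound to UDP port 67; whether this particular BMC sends option 50 is unknown. `parse_dhcp` and `listen` are hypothetical helper names.

```python
import socket

DHCP_MAGIC = b"\x63\x82\x53\x63"  # magic cookie that follows the 236-byte BOOTP header
MSG_TYPES = {1: "DISCOVER", 3: "REQUEST", 8: "INFORM"}


def parse_dhcp(packet: bytes) -> dict:
    """Extract the client MAC, ciaddr and a few options from a raw DHCP payload."""
    if len(packet) < 240 or packet[236:240] != DHCP_MAGIC:
        raise ValueError("not a DHCP packet")
    hlen = min(packet[2], 16)  # chaddr field is 16 bytes; Ethernet uses 6
    info = {
        "mac": ":".join(f"{b:02x}" for b in packet[28:28 + hlen]),
        "ciaddr": socket.inet_ntoa(packet[12:16]),  # client's current IP, usually 0.0.0.0
    }
    i = 240
    while i < len(packet):
        code = packet[i]
        if code == 0:  # pad option
            i += 1
            continue
        if code == 255 or i + 1 >= len(packet):  # end option or truncated packet
            break
        length = packet[i + 1]
        value = packet[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:  # DHCP message type
            info["type"] = MSG_TYPES.get(value[0], str(value[0]))
        elif code == 50 and len(value) == 4:  # requested IP address
            info["requested_ip"] = socket.inet_ntoa(value)
        elif code == 12:  # client hostname
            info["hostname"] = value.decode(errors="replace")
        i += 2 + length
    return info


def listen(port: int = 67) -> None:
    """Print every DHCP client message received on this host (needs root for port 67)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # a wildcard bind also receives broadcast DISCOVERs on Linux
    while True:
        data, _addr = sock.recvfrom(4096)
        try:
            print(parse_dhcp(data))
        except ValueError:
            pass  # ignore anything that isn't DHCP
```

Call `listen()` and wait for the BMC to retry; each DISCOVER prints as a dict with its MAC and, if present, the requested IP.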
No BMC access anymore? Crap. And no BIOS recovery jumper where the manual shows it… Different revision, then. Get the model number of the motherboard and find out where stuff is. Find that BIOS jumper and give them a general chat/call or something to pry out of them how to use it. If there’s a VGA or BMC disable jumper anywhere on the board (use Ctrl+F in the manual), then give those a try with the Quadro/workstation card like the other user recommended. Let me know if you need one; I have some spare GT 710s and some NVS 295s you can borrow if you’re willing to pay the shipping.
If no luck with support or the jumpers, then have you tried getting back to basics? If you’re willing to do it, haha. Unplug literally everything that isn’t necessary for it to run: a single stick of RAM in the first channel, the CPU, and no cards or anything on PCIe. Keyboard and monitor should be the only things plugged in. Use a monitor that has VGA, connected to the board’s VGA port, and do not use video adapters at all, since VGA to HDMI or DisplayPort doesn’t work with the BIOS. See if there’s a speaker jumper/header anywhere on the board and plug a speaker in so you know whether it’s doing something. Unplug power, remove the battery, and clear the CMOS by shorting the contacts for 10 seconds. Put the battery back in afterwards, plug in, turn on, and wait for a monitor response, if any. Give it like 15-20 minutes and just wait; sometimes servers take their sweet time when they want to do health checkups and stuff at random. Sit there, play on your phone, and read emails while you wait for the thing to put something on the screen. If it doesn’t work, then you need the BMC back or the jumper for BIOS recovery.
My guess is that the BIOS recovery jumper makes it pull a binary from one of the USB ports. If that jumper ends up being somewhere else on your board, then you could just try giving it what these things usually want. Grab a small flash drive (smaller is better: aim for 2 GB and don’t go over 32 GB) and format it as FAT32. Put the firmware binary at the very top level. Set the BIOS recovery jumper and go ham. Start it up and wait a long time. It might need a specific port if it does do this, and iirc some even want the volume they pull the FW from to be named in a certain way.
Sorry, this issue sucks. As soon as you’re able to regain access, you need to get that BMC upgraded back to working FW at all costs. Good luck.
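Before spending 20 minutes waiting on a recovery attempt, the two checkable rules of thumb above (a small stick, the binary at the very top level) can be verified with a short script. This is only a sketch: the mount point and firmware file name are placeholders, the 32 GB cap is the rule of thumb from this post rather than a documented Gigabyte requirement, and FAT32 itself can’t be checked portably from the Python standard library, so confirm the format yourself.

```python
import os
import shutil

MAX_STICK_BYTES = 32 * 1000**3  # "don't go over 32gb" rule of thumb, decimal GB


def check_recovery_stick(mount_point, firmware_name, total_bytes=None):
    """Return a list of problems with a recovery USB stick; an empty list means it looks OK.

    total_bytes may be passed explicitly; otherwise it is read from the mounted volume.
    """
    problems = []
    if total_bytes is None:
        total_bytes = shutil.disk_usage(mount_point).total
    if total_bytes > MAX_STICK_BYTES:
        problems.append(f"volume is {total_bytes / 1000**3:.1f} GB; use a smaller stick")
    # The binary must sit in the root directory, not inside an extracted folder.
    if firmware_name not in os.listdir(mount_point):
        problems.append(f"{firmware_name} is not at the top level")
    return problems
```

For example, `check_recovery_stick("/media/usb", "image.bin")` (both names hypothetical) returns an empty list when the stick passes both checks.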
Edit: it seems like the motherboard manuals have an identical jumper layout (Manual a00, Manual 1.0).
The left jumper here is CLR_CMOS.
The left position is normal; moving the jumper to pins 2-3 resets the CMOS.
Not sure why the labels on the board are so far off to the left.
It might have helped with the POST issues originally, but it’s unlikely to help now that your BMC isn’t booting.
The flash chips on these boards are socketed. Do you have access to a device you can use to dump and write those?
If you can get a Linux system booting without the BMC working (turn off wait for BMC in UEFI) you can use gigaflash64 to flash the BMC from Linux.
I got it back up and running!
And…I can brick and unbrick it at will now!
I actually got a reply from Gigabyte support in the meantime, telling me to try to reinstall the firmware, which was failing with the “Image not verified” error.
Let me explain what’s going on.
First, the BMC DHCP access. I tried connecting it to two different DHCP servers on the network and couldn’t get a lease.
Then I plugged in a laptop and could see the discover requests. I ran a DHCP server on the laptop and it worked.
I can’t tell you why it didn’t work with the regular DHCP servers; I haven’t looked into it. It could be some firewall rules messing with me.
Second, the bricking of the server. Through the BMC I managed to update the BIOS. It just decided to work this one time, even though it had failed previously.
That fixed it! So I thought it was a mismatch between the BIOS and the firmware.
No. I started mucking around in the BIOS because I can’t get my GPUs to run (I’ll do a separate post on that), and suddenly it was bricked again.
Thank you @Lunaa for pointing out the CLR_CMOS jumper; that’s what I used, and it was unbricked again.
The offending setting in the BIOS: PCIe Compliance Mode
Turning this on bricks the server.
I only found two ways of clearing it: (re)installing the BIOS or using the jumper, since you can’t get to the BIOS settings, even through the BMC. The BMC will show them to you, but it won’t save them.
Thank you everybody for your replies. I hope this answer helps anyone who runs into the same problem.