6700XT - reset bug?

What is the motherboard make/model that you are using?

Also, do you see ANY output at all on the screen when starting up with “video=efifb:off” or is the screen completely blank?

Hey, I’m using a B550 Aorus Pro from Gigabyte.
The monitor connected to the guest GPU stays completely blank until I start a VM, even at boot time. Any other monitor that is connected to my host GPU obviously shows some stuff.

I’m also using a Gigabyte motherboard, the X570S Aero G.

I am thinking this may be related to the motherboard and not the GPU. As others have pointed out, BIOS settings can be important but no matter what I tried it has failed.

When I boot up I see “EFI stub: loaded initrd from command line option” but nothing else after that. I see it across both screens (I have the 6700XT alongside an RX550, each with a screen connected).

If anyone is still following this thread, I have a new development. An X.org developer on Phoronix claimed the 6000 series supports SBR (secondary bus reset) and that it should be possible to properly reset 6700XTs using that method.

I have tried (unsuccessfully so far), but perhaps someone else will have more luck and get it to work. If you want to give it a go, I’ve posted instructions on THIS reddit thread to reach a wider audience. I’d appreciate feedback from people who try it on whether it resolved this for them.
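For anyone who wants a quick look before trying it: recent kernels (5.15 and later) expose a `reset_method` file per PCI device in sysfs, listing the reset methods the kernel is willing to use for it. A minimal sketch that checks whether `bus` (SBR) is listed is below. The function name, the `SYSFS_PCI` override and the example address are made up for illustration, and whether `bus` shows up at all depends on your kernel and device topology.

```shell
#!/usr/bin/env bash
# Sketch: report whether a PCI device lists the "bus" (SBR) reset method.
# SYSFS_PCI is a hypothetical override so the function can be pointed at a
# test directory; by default it uses the real sysfs location.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

supports_bus_reset() {
    local addr="$1"
    local file="$SYSFS_PCI/$addr/reset_method"
    # The attribute is missing on older kernels: return 2 for "unknown".
    [ -r "$file" ] || return 2
    # The file holds a space-separated list, for example "flr bus".
    local method
    for method in $(cat "$file"); do
        if [ "$method" = "bus" ]; then
            return 0
        fi
    done
    return 1
}
```

Usage would be something like `supports_bus_reset 0000:0b:00.0 && echo "SBR listed"`, with your own GPU address from lspci in place of the example one.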

Hope? AMD Looking To Improve The GPU Reset Experience Under Linux

Looks like that’s for amdgpu, not vfio.

Yeah, it’s also about reporting on a reset after a problem occurred due to some buggy application. Not what we’re after…

@Mechanical Do you by any chance have a post in forums (reddit / level1techs) where you discussed your issue with the Asus Dual RX 6700XT along with any steps you took to debug/fix it? Something like this thread, but that you created to discuss your own case?

Got some attention from AMD employees that have generally been very active in Reddit forums.

Hopefully they’ll raise awareness internally in the company…

Further update. I contacted a store and we agreed that I can order cards, test them at home, then return them if they don’t work and order another model.

If you are in the UK I highly recommend OCUK as explained here, where I have compiled a list of cards that are reported working, along with cards reported as NOT working.

In short: my configuration was 100% correct. I found two cards [EDIT: two 6000 series cards, plus my older RX550 and RX480 with vendor-reset patch] that “just work” as opposed to my Gigabyte Aorus Elite 6700XT which refused to work no matter what.

Clearly there’s something vendor/board-specific that causes this. Please refer to my Reddit post for a list of reported working/non-working cards before you purchase.

I personally highly recommend OCUK for a no-fuss experience if you are in the UK.

I can confirm the XFX 6700xt SWFT is affected by this also.

I have the same card, but the non-XL version, and the same problem.

For anyone else considering it: don’t get the XFX RX6700 SWFT 309 (Model Number RX-67XLKW).

I had a similar issue with my Gigabyte RX580 in the second slot of an Asus Strix X570-F. I’ve created a repo describing how I managed to fix it: Viktor Koteski / proxmox-gpu-reset · GitLab

I have a Sapphire Nitro+ 6800XT and no issues at all.
In the past you could flash a BIOS from another vendor, but I don’t think it works with RDNA2.

Edit: looks like there is a flash tool for RDNA2.

Maybe it’s worth a shot to test another BIOS just by loading it via libvirt, without flashing:

https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF

> <hostdev>
>      ...
>      <rom file='/path/to/your/gpu/bios.bin'/>
>      ...
>    </hostdev>

I have a 6700XT. I followed this obscure post about flashing the VBIOS with a specific one and it worked! amdgpu: GPU reset for Radeon RX 6700XT completely broken (#2709) · Issues · drm / amd · GitLab

THIS IS HUGE!

Never in my life thought I’d be able to get my Asus RX 6700 XT Dual OC working with a reset. In fact, I have previously tried flashing VBIOSes from so-called “working” 6700 XT cards, but nothing has worked. I am legit flabbergasted that this specific VBIOS from 2022 worked on this card.

Let me just say this again:

IT WORKS

I want to kiss these guys. Every single one of them.

I can’t say I have used this for long, as I’m posting this immediately after attempting it, using the VM for a few minutes, rebooting 5 times and testing stuff out. So keep this in mind. Things may go haywire, but so far so good. Will report back in a week.

Any thoughts on this “solution” @gnif @wendell ? It really seems to be the GPU vendor screwing up the VBIOS or something since that actually “solves” the issue? Or am I talking out of my ass?

As for a guide how to do it:

First off, you need to check your card’s memory vendor, and you need amdvbflash. I used version 4.71.
You should be able to find your memory vendor with GPU-Z in Windows, using the stock VBIOS on your current card. Pass the card through to your Windows guest as usual and check GPU-Z there.

The memory size, vendor and chip need to be one of the following:

12288 MB, GDDR6, Hynix H56G42AS8DX014
12288 MB, GDDR6, Micron MT61K512M32C
12288 MB, GDDR6, Samsung K4ZAF325BM

If it isn’t, you’re taking a gigantic risk flashing anything.

Next up is dumping the VBIOS. You can do that through GPU-Z too, but I’d also do it from Linux to make sure the dumps match. You need to turn off VFIO anyway to flash a VBIOS: comment out (with #) the line in /etc/modprobe.d that binds the GPU to vfio-pci at boot, rebuild the initramfs with something like mkinitcpio -P, and reboot. The GPU needs to be claimed by the amdgpu module.
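After the reboot it is worth confirming that amdgpu really owns the card before going further. `lspci -nnk` shows this as “Kernel driver in use”; the same information is available as the `driver` symlink in sysfs. A small sketch, where the function name, the `SYSFS_PCI` override and the example address are my own placeholders:

```shell
#!/usr/bin/env bash
# Sketch: print the kernel driver currently bound to a PCI device.
# SYSFS_PCI is a hypothetical override for testing; the default is real sysfs.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

bound_driver() {
    local link="$SYSFS_PCI/$1/driver"
    # No "driver" symlink means nothing is bound to the device right now.
    if [ ! -L "$link" ]; then
        echo "none"
        return 0
    fi
    # The symlink target ends in the driver name, e.g. .../drivers/amdgpu
    basename "$(readlink "$link")"
}
```

For example, `bound_driver 0000:0b:00.0` should print amdgpu at this stage, and vfio-pci again once you have re-enabled the binding later on.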

The command for dumping the BIOS using amdvbflash is:

# ./amdvbflash -s 0 MyOriginalVBIOS.rom

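You can also just dump it through sysfs. Reading the rom file has to be enabled first by writing 1 to it, and the output needs to be redirected to a file:

# cd /sys/bus/pci/devices/0000:0X:00.0 && echo 1 > rom && cat rom > ~/MyOriginalVBIOS_sysfs.rom && echo 0 > rom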

Where the X is the hexadecimal digit of the PCI bus your GPU sits on. You can find it using lspci -vvv. You probably know yours already.
Dump your VBIOS several times and make sure the checksums are identical, so that you don’t end up with a corrupted VBIOS backup. Then copy it to another machine or the cloud so you always have it around.
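Comparing the dumps can be scripted. This is only a sketch assuming GNU coreutils (`sha256sum`) is installed; the function name and file names are made up:

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every given VBIOS dump is readable, non-empty and
# has the same SHA-256 checksum as all the others.
dumps_match() {
    # Comparing needs at least two files.
    [ "$#" -ge 2 ] || return 2
    local f
    for f in "$@"; do
        # A missing or zero-byte dump is never a valid backup.
        [ -r "$f" ] && [ -s "$f" ] || return 2
    done
    local unique
    # Count the distinct checksums; exactly one means all dumps are identical.
    unique=$(sha256sum -- "$@" | awk '{print $1}' | sort -u | wc -l)
    [ "$unique" -eq 1 ]
}
```

Something like `dumps_match MyOriginalVBIOS.rom MyOriginalVBIOS_2.rom && echo "dumps are identical"` would then confirm that the backups agree.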

As every guide about this kind of thing points out, but it needs to be said anyway:

**Flashing any VBIOS, especially cross-vendor like this, will 100% void your warranty and you might end up with a brick.**

Use caution and remember it is your own fault if it never boots up ever again. You should have dual-BIOS possibilities on your card to be safer. Single-BIOS is very dangerous. You have been warned.

So to flash the thing, get the working VBIOS from here: VGA Bios Collection: Sapphire RX 6700 XT 12 GB | TechPowerUp
The MD5 for this should be bbcf8fd1e226609094cd2283b3ea2259
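Checking the download against that MD5 before flashing takes a few lines. A sketch assuming GNU coreutils (`md5sum`); the function and variable names are placeholders of mine, and the hash is the one quoted above:

```shell
#!/usr/bin/env bash
# Sketch: verify a downloaded ROM against an expected MD5 before flashing.
# EXPECTED_MD5 defaults to the hash quoted in this post.
EXPECTED_MD5="bbcf8fd1e226609094cd2283b3ea2259"

rom_md5_ok() {
    local file="$1"
    local expected="${2:-$EXPECTED_MD5}"
    # Return 2 if the file cannot be read at all.
    [ -r "$file" ] || return 2
    local actual
    # md5sum prints "<hash>  <file>"; keep only the hash.
    actual=$(md5sum -- "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}
```

With the file name used in the flashing commands, that would be `rom_md5_ok ./249630.rom || echo "MD5 mismatch, do not flash"`.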

Next up you need to flash the thing.
The -p 0 here means card 0, make sure you’re using the correct card and not your host’s card!!!

# ./amdvbflash -i # Check that 0 is the card you want to flash first!
# ./amdvbflash -fs -fp -fv -p 0 ./249630.rom # This is the actual flashing

That will essentially flash and change the IDs of the card to match the ROM.

You can verify that the ROM was correctly flashed by using:

# ./amdvbflash -v 0 ./249630.rom

If everything looks ok, shut down the system completely, cold boot and pray things work out. If it boots up in Linux now without any issues you should re-enable the VFIO-PCI module so it binds the card on the next boot. Don’t forget to rebuild initramfs (mkinitcpio for Arch).

Lastly, use your original VBIOS and pass that to the guest. That way the card will use the correct clocks for your card and make it run like it used to.

I would NOT recommend passing the Sapphire RX 6700 XT VBIOS to the guest; use your original VBIOS so the card runs correctly.

Doing so is standard fare. In virsh:

<rom file="/Path/To/Your/Original_VBIOS.rom"/>

Enjoy your resetting 6700 XT.

Hopefully that was helpful for someone, but holy damn I did not expect a “solution” for this to actually pop up. Thanks to everyone involved, I’ll now go test this further.

Same here. I just got this card and was disappointed to see that it had the reset bug, which is indeed a VBIOS issue. It was a nice “holiday treat” to be able to get this card to work this way.

I’m the author of the amdgpu reset issue that was referenced above. I think the most likely scenario is that both AMD and the vendors are at fault: AMD for not supporting FLR and instead inventing this fragile hack, which is very sensitive to the pairing of the specific chip with the currently loaded firmware and its state; and the vendors for shipping a misbehaving VBIOS that has plagued users for years with black screen issues regardless of OS or driver stack. It would be hilarious if the problem turned out to be that at some point vendors were soldering new chip revisions alongside old ones, with each revision needing its own VBIOS to function properly, while the whole batch shipped with the same VBIOS regardless.

As for using this VBIOS long term: I’ve been daily driving it (dubbed NAVI22XTLH) on my Sapphire Nitro+ for ~1.5 months without any issues. If anything, the card has become rock solid after the reflash. No more random crashes or black screens, and even ROCm with PyTorch now runs for hours without the GPU freezing or producing random half-precision errors. Same as with the stock VBIOS, though, I’m undervolting it to 1125 mV with a 2424 MHz GPU clock and 950 MHz memory, which comes to around 140 W, down from 210 W. So it is much cooler and less power hungry, with only a 5% performance loss. On stock settings my unit had a ~30 °C delta between the GPU temperature and the hotspot on some workloads under full load, and I don’t want the hotspot over 85 °C anyway.

THANK YOU SO MUCH!

This appears to have fixed my XFX 6700 XT. Your guide worked perfectly. The only issue I had was with a newer version of amdvbflash, which complained with an “SSID mismatched” error, but using 4.71 worked.

Thank you so much. I was having a similar issue with my RX 6600 (Asus Dual): I could reboot the VM and it looked like it worked, but as soon as I opened a game the entire PC would hang for a few seconds and then reboot (X570 Gaming X motherboard, F39 BIOS). But resetting it that way works perfectly.