Radeon 7900XT, reset bug

Hello
Please can you clarify whether the reset bug is present on the 7900XT?
I’m asking because I read a Reddit post from someone who said that an XFX 7900XT was not suffering from the reset bug when the BIOS file is passed (in the virt-manager/libvirt .xml),
and I read elsewhere that the reset bug is back and present again on the 7900XT.

Please, owners of a Radeon 7900XT, can you comment on this topic?

Thank you very much for any input/clarification.

BR, raven4

2 Likes

I do not own a 7900XT myself, but I do own several other Navi family cards. Some have the reset bug and some don’t. Those that do can usually have the bug mitigated by the vendor-reset module.

vendor-reset does not support the 7000 GPU series.
Reports in the VFIO discord have been mostly negative with regard to this generation working for VFIO.

3 Likes

With AMD the essence is that you either get a card that resets or you get one that does not. There is no way to be sure, since the design is proprietary and AMD still has not managed to fix the reset bug on their end.

To put it simply, if you want to be sure your card resets you unfortunately have to buy a card from NVIDIA (or Intel I guess, but I have not heard whether the Arc series cards reset properly).

Hello
I’m still wondering why some people say that their 7900X? does not suffer from the reset bug if they pass a vBIOS file. I believe a 7900XT from XFX was mentioned as working… so why?
Is it a design flaw on AMD’s side or the card vendor’s? Can a card vendor like XFX mitigate this flaw?

@gnif Gnif, you proved yourself a skilled hero in this fight… so are you working on or planning some software workaround for the 7900 series as well?

Thank you

BR, raven4

1 Like

Not unless there is some financial incentive, and even then I am not too interested. The AMDGPU code base is a nightmare and while it’s “open source”, it’s not really.

There is no documentation on any of it, it’s full of acronyms that are internal to AMD, and it makes use of binary blobs like the ATOMBIOS.

I spent more than my fair share of time debugging AMD’s devices and helping them to understand the issue, only to see them dismiss it all in future generations. I did not even get a “thanks” or any other form of support when my work resulted in sales for them.

If AMD were to approach me as an engineer and ask for assistance, I would be happy to help them provided it was a professional paid arrangement and I was granted access to their internal documentation (NDA obviously). AND I didn’t have to crowdfund the GPUs like last time.

I was rather shocked when I found out that AMD’s own staff have to buy GPUs to work on. AMD do not provide them even for their employees.

So in short, I have moved on.

6 Likes

@gnif I fully understand your point. I was also a programmer for 30 years (but just in my hobby time), from BASIC in the ’90s to Java for Android. But as kids came, free time vanished and I moved on as well. Anyway, thank you for your past work; you helped a lot of people with previous AMD generations.

Anyway, you may still have a better overview than anyone else… is it possible that the 7900 from XFX truly does not suffer from the reset bug, or did the card owner just not test it in detail?
Do you have any signs that a 7900 from some other vendor can work without the reset bug?

Because if we truly avoid the 7900X?, then at roughly the same price level only the 4070 Ti remains (but with much less VRAM :frowning: )

Thank you

BR, raven4

2 Likes

Reports in the VFIO discord are mixed; it’s hit and miss. As for why/how/what, I have no idea, sorry. It’s an entirely new architecture.

2 Likes

I’ve been trying to nail down the 7900 problem with reset. Grumble.

It may become possible to split a single Arc A770 16GB into 2x 8GB. We will see… Soon™

7 Likes

Hello Wendell
thank you for your reply.

@wendell have you personally seen/tested any 7900X? which was NOT suffering from the reset bug? Do you think it is possible that some 7900s from particular vendors like XFX don’t suffer from the reset bug?

BTW: I don’t think Arc is a good solution in any way. Performance is one or two generations behind, the drivers are said to be buggy and problematic, and nobody knows whether Intel will keep developing its GPUs, so who knows if Intel will continue to update and maintain drivers for future games. Intel GPUs are also more power hungry :frowning:

… so fu**ing greedy Nvidia wins as always :frowning:

BR,
raven4

@raven4 it doesn’t matter what vendor, it has nothing to do with it. It’s a silicon issue in the GPU chip itself.

2 Likes

Hello
I’m a bit confused.

@gnif you mentioned that the reset bug is caused by the AMD chip itself, so the vendor of the card does not matter. That makes sense.
Yet there are people who say that their card does not suffer from the reset bug, and you literally mentioned that “Reports in the VFIO discord are mixed”… so how would these mixed reports be possible if all card vendors use the same chips?

Thank you for your answers

BR, raven4

2 Likes

It’s the combination of hardware: the chipset, even the CPU and BIOS version. We are not sure exactly why it works for some and not others, but I can guarantee you that it has absolutely nothing to do with the vendor.

GPU vendors are simply building the card interface to the AMD SoC (system on chip) itself. They still use AMD’s closed source binary blobs in the GPU BIOS, and the chip itself MUST be physically wired to the PCIe bus in EXACTLY the same way.

The only differences between GPU vendor designs are the selected GPU clocks, their selection of component brands, and how they design their cooling and power delivery to the chip. They have absolutely zero control over the silicon itself or how it works internally.

Yes, so the trick is to figure out what external factor has changed that makes it work/fail.

Without access to multiple motherboards, CPUs, and GPUs to methodically test with, pointing at the GPU vendor is just a guess… and a very bad guess at that.

There is also the factor of software versions and configurations.
What GPU BIOS version is it using? Was it older stock that has an older BIOS in the GPU?
Is there some bug/fix in a specific Linux distro where they have modified the kernel that is having a desirable impact?
Is it on an AMD or Intel CPU?

The only way I can see this being figured out is if someone were to invest the time into building a standard live CD image that was specifically made to boot a VM with VFIO pass-through autonomously and then report the success/fail status and exact hardware configuration of the system back to a central server somewhere.

This would allow one to build a matrix of working vs non-working configurations and narrow down exactly what the cause is.
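
To make the idea concrete, here is a minimal sketch of what the server side of such a matrix could look like. Nothing like this exists as far as this thread goes; the `Report` fields, the function names, and the sample rows are all made up for illustration and are not real test results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One hypothetical submission from the live image."""
    gpu: str
    cpu_vendor: str
    motherboard: str
    vbios_version: str
    kernel: str
    reset_ok: bool


def tally(reports, field):
    """Return {value: (successes, failures)} for one configuration field."""
    counts = {}
    for report in reports:
        key = getattr(report, field)
        ok, fail = counts.get(key, (0, 0))
        counts[key] = (ok + 1, fail) if report.reset_ok else (ok, fail + 1)
    return counts


def discriminating_fields(reports, fields):
    """Fields whose values each map to only successes or only failures,
    with both outcomes present somewhere in the data."""
    result = []
    for field in fields:
        counts = tally(reports, field).values()
        clean = all(ok == 0 or fail == 0 for ok, fail in counts)
        seen_ok = any(ok > 0 for ok, _ in counts)
        seen_fail = any(fail > 0 for _, fail in counts)
        if clean and seen_ok and seen_fail:
            result.append(field)
    return result


# Invented placeholder rows, NOT real measurements.
SAMPLE_REPORTS = [
    Report("7900XT", "AMD", "board-A", "vbios-1", "6.1", True),
    Report("7900XT", "AMD", "board-B", "vbios-1", "6.1", False),
    Report("7900XT", "Intel", "board-C", "vbios-2", "6.1", True),
    Report("7900XT", "Intel", "board-B", "vbios-2", "6.2", False),
]

if __name__ == "__main__":
    fields = ["cpu_vendor", "motherboard", "vbios_version", "kernel"]
    for field in fields:
        print(field, tally(SAMPLE_REPORTS, field))
    print("candidates:", discriminating_fields(SAMPLE_REPORTS, fields))
```

With only a handful of reports a field can look decisive purely by chance, which is why this needs a lot of submissions across many different systems before it says anything useful.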

The time investment in this though is huge and I have little incentive (or interest) in doing this.

2 Likes

Does the workaround of dumping and attaching the BIOS described here:

not work on the XT?

With kind regards

Who knows? There are various modes of failure; this works for some people and not for others.
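
For anyone unsure what “attaching the BIOS” means in practice: in libvirt it usually comes down to a `<rom file='...'/>` element inside the GPU’s `<hostdev>` entry. Below is a rough sketch that only generates that XML fragment. The helper name, the PCI address, and the ROM path are placeholders I picked for illustration, not values from the linked directions, and whether it helps on a 7900XT is exactly the open question.

```python
import xml.etree.ElementTree as ET


def hostdev_with_rom(domain, bus, slot, function, rom_path):
    """Build a libvirt <hostdev> fragment for a PCI device with a ROM file.

    The address values and rom_path are caller-supplied placeholders;
    this only formats XML and does not touch libvirt or the hardware.
    """
    hostdev = ET.Element(
        "hostdev", {"mode": "subsystem", "type": "pci", "managed": "yes"}
    )
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        {
            "domain": f"0x{domain:04x}",
            "bus": f"0x{bus:02x}",
            "slot": f"0x{slot:02x}",
            "function": f"0x{function:x}",
        },
    )
    # The dumped vBIOS image that the guest should see as the card's ROM.
    ET.SubElement(hostdev, "rom", {"file": rom_path})
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Example address 0000:03:00.0 and path are assumptions, adjust to your system.
    print(hostdev_with_rom(0, 3, 0, 0, "/var/lib/libvirt/vbios/7900xt.rom"))
```

The printed fragment would go into the domain XML (for example via `virsh edit`) in place of the existing `<hostdev>` entry for the GPU.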

1 Like

I’m just trying this now (also on X399) with an ASRock 7900XT and it looks like it works very similarly to how it worked with my Vega 64. Using the directions on that thread helped, but there may be a bit more to document. I’m trying to get all the way to having BAR support as mentioned on that thread.

1 Like

If you get any new info or documentation, feel free to post it in Wendell’s thread or the XTX thread (:
I sadly had to swap my XTX Nitro+ inside the return window for a novideo 4080 :frowning:
(every black screen and update problem I have encountered since then was related to Nvidia f ing things up)
Believe me, it’s as trash as everyone claims… be happy to be on AMD