Vega 10 and 12 reset application

Honestly, backporting it will take very little effort; it’s just a new PCI quirk in drivers/pci/quirks.c
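For readers unfamiliar with the idea, a reset quirk boils down to a table of device-specific reset handlers that the kernel looks up by vendor and device ID. The sketch below models that lookup in Python rather than kernel C. It is only an illustration: the handler, the placeholder device ID `0x1234`, and the names `RESET_QUIRKS` and `find_reset_quirk` are hypothetical, and only the AMD vendor ID `0x1002` is a real value.

```python
from typing import Callable, List, Optional, Tuple

AMD_VENDOR_ID = 0x1002  # real PCI vendor ID for AMD/ATI
ANY_DEVICE = 0xFFFF     # wildcard device ID, used by this sketch only


def reset_amd_gpu_example(bdf: str) -> str:
    """Hypothetical handler; a real quirk would write device registers here."""
    return f"device-specific reset for {bdf}"


# (vendor, device, handler) entries; 0x1234 is a placeholder, not a real GPU ID.
RESET_QUIRKS: List[Tuple[int, int, Callable[[str], str]]] = [
    (AMD_VENDOR_ID, 0x1234, reset_amd_gpu_example),
]


def find_reset_quirk(vendor: int, device: int) -> Optional[Callable[[str], str]]:
    """Return the first handler whose vendor and device ID match, else None."""
    for q_vendor, q_device, handler in RESET_QUIRKS:
        if q_vendor == vendor and q_device in (device, ANY_DEVICE):
            return handler
    return None
```

A device with no matching entry simply gets `None`, which corresponds to the kernel falling back to its generic reset methods.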

Do I understand correctly that you will provide us with the patch for self-compilation, probably in the coming days, after Wendell has tested it? And that it will also be included upstream in one of the next kernel versions?
I am just asking because I pretty blindly bought a 5700 XT on Monday without knowing about the reset bug, and I can’t believe my luck that you really seem to have created a solution so soon. Thanks a lot for that.

Welcome to the forums @anon85976236 :slight_smile:
Yes, you are correct. After a year of working on this issue on and off (Vega also) and pleading with AMD for help, this is finally seeing progress. I have a working Navi 10 (5700 series) reset patch for the kernel; once I am happy with it and it’s verified to work, it will be submitted upstream for review. At that point everyone will be able to access and apply it.

sigh… the patch just got more complicated. It seems the register bases can move depending on the vBIOS, and there is a discovery table I need to read out and parse for their locations…
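To make the problem concrete: if register bases are not fixed, a reset routine cannot hard-code addresses and has to compute them as base plus offset from whatever the table reports. The sketch below shows that pattern with a made-up binary layout. The magic value, field sizes, block IDs, and function names are all assumptions for illustration and are not AMD's actual discovery table format.

```python
import struct
from typing import Dict

# Made-up layout for illustration only (NOT AMD's real format):
#   header: 4-byte magic b"DISC", uint16 entry count (little endian)
#   entry:  uint16 hardware block ID, uint32 register base
HEADER = struct.Struct("<4sH")
ENTRY = struct.Struct("<HI")


def parse_discovery_table(blob: bytes) -> Dict[int, int]:
    """Map each hardware block ID to its register base."""
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != b"DISC":
        raise ValueError("unexpected table magic")
    bases: Dict[int, int] = {}
    offset = HEADER.size
    for _ in range(count):
        block_id, base = ENTRY.unpack_from(blob, offset)
        bases[block_id] = base
        offset += ENTRY.size
    return bases


def register_address(bases: Dict[int, int], block_id: int, reg_offset: int) -> int:
    """Resolve a register address from the discovered base, not a constant."""
    return bases[block_id] + reg_offset
```

With this shape, two cards whose vBIOS places the same block at different bases would still resolve the same register offset correctly.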

Edit: After further discussions with an Engineer at AMD, it seems that they are making efforts to implement a working FLR reset into the firmware which will render this patch obsolete.

Once @wendell confirms the patch is working for him, I will release it here. If we don’t see an FLR fix from AMD, I will also upstream the patch. This way we at least have a solution in the interim.

Would the AMD firmware update work for all cards or just navi onwards?

Edit: After further discussions with an Engineer at AMD, it seems that they are making efforts to implement a working FLR reset into the firmware which will render this patch obsolete.

I think that will delay the release of a reset fix a bit (2020?). AMD will have to release an updated driver with new firmware for Windows and Linux, and the macOS driver must be updated by Apple too (slow)!! A reset fix on the Linux host (PCI quirks) appears to be the better solution: it is general purpose and works with any VM OS. Am I missing the point?

Well, it does not matter too much for Windows right now, and all the drivers are developed independently for each OS. Similarly, macOS does not really matter right now, as neither it nor Windows is commonly used as a VM host for home gamer setups.

AMD doing it officially will mean support for the future rather than relying on gnif and others.

Quirks will work, but it is better to have this as an expected feature of an AMD product rather than a community hack.

Yes, AMD are actually improving this for VFIO usage; they now see it as important. An FLR reset requires no additional vendor-specific code in the kernel. This would imply that in the future we won’t see this issue again.
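Whether a device advertises FLR at all can be read from its PCI configuration space: the PCI Express capability (ID 0x10) has a Device Capabilities register at offset 0x04, and bit 28 of that register is the Function Level Reset capability flag. The sketch below walks the capability list in a raw config-space dump and checks that bit. The function names are my own, and an advertised FLR bit says nothing about whether the reset actually recovers the card.

```python
import struct
from typing import Optional

PCI_STATUS = 0x06               # status register offset
PCI_STATUS_CAP_LIST = 0x10      # status bit: capability list present
PCI_CAPABILITY_LIST = 0x34      # offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10           # PCI Express capability ID
PCI_EXP_DEVCAP = 0x04           # Device Capabilities, relative to the capability
PCI_EXP_DEVCAP_FLR = 1 << 28    # Function Level Reset capability bit


def find_capability(cfg: bytes, cap_id: int) -> Optional[int]:
    """Walk the capability linked list and return the offset of cap_id."""
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guard against malformed, looping lists
    while pos >= 0x40 and pos + 1 < len(cfg) and pos not in seen:
        seen.add(pos)
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return None


def supports_flr(cfg: bytes) -> bool:
    """True if the device advertises FLR in its PCIe Device Capabilities."""
    pos = find_capability(cfg, PCI_CAP_ID_EXP)
    if pos is None or pos + PCI_EXP_DEVCAP + 4 > len(cfg):
        return False
    (devcap,) = struct.unpack_from("<I", cfg, pos + PCI_EXP_DEVCAP)
    return bool(devcap & PCI_EXP_DEVCAP_FLR)
```

On Linux the dump can be read from `/sys/bus/pci/devices/<BDF>/config` (reading past the first 64 bytes generally needs root), and `lspci -vv` reports the same bit as `FLReset+` or `FLReset-` in the DevCap line.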

There is perhaps another explanation – I would hazard a guess that FLR is technically required for WHQL – “Display Driver Stopped Responding and Had To Be Reset”

Is this not a FLR? Wouldn’t that be funny if that’s why this has to be implemented, to be fully compatible with windows’ ability to reset bad/crashy drivers?

Experimenting with FSB overclocks, NVIDIA cards do seem to recover better… hmmm emoji… lol

I really doubt it; device recovery is a requirement, not FLR. The device can be recovered from a driver, we already know that, but it can’t be re-posted by a BIOS. So the device supports a reset and recovery, just not via a standard method.

Yes, AMD are actually improving this for VFIO usage, they now see it as important.

Great news. The AMD Adrenalin driver on Windows will work with VFIO!
Right now it doesn’t work (version 19.1.1, Error 43).

Works fine for me on the 5700 XT, might be the reset issue you’re encountering.

On Ubuntu Linux 19.04 and macOS Mojave it works fine! (using my hardware reset hack on the RX 580)

Yeah, it probably doesn’t specify FLR, just that Windows can reset the device if it stops responding… That matches what I was seeing: on a pure Windows system with a bad bus overclock you get a permanent black nothing, whereas on NVIDIA it does seem to recover pretty consistently.

Yes, AMD are actually improving this for VFIO usage; they now see it as important. An FLR reset requires no additional vendor-specific code in the kernel. This would imply that in the future we won’t see this issue again.

Have your plans changed (because of the AMD firmware)? Are you still planning to release the Vega 64 reset fix (quirks)? Navi doesn’t work on macOS!

Does this kernel patch also work for Polaris? Or is it only gonna be Vega and newer?

Yes

Only if AMD doesn’t come through. In the interim, I will release my latest userspace tool open source for people that need the reset now.

Fully aware of this

Navi and Vega… :man_facepalming:

Only if AMD doesn’t come through. In the interim, I will release my latest userspace tool open source for people that need the reset now.

Source code ?? Yes source code rssrssrsr :rofl:

Pushy much?

@gnif: You mentioned that the registers can shift around based on the BIOS, but later you said you are still going to release your tool. Will the tool work on all Navi cards, or do vendors like MSI, Sapphire, PowerColor and so on modify the vBIOS in ways that might stop the tool from working on some of them? Or is that what is currently being tested?

Also you said userspace tool, but I will still need a kernel patch, won’t I?