Navi Reset Kernel Patch

@gnif I wanted to ask you about your patch. Do you object to us packaging this patch with a kernel and making it available for ease of use and convenience? A friend and I got great use out of this patch, but it took us a long time to figure out. We want to make the patch available for our distro (Arch Linux, in the AUR at ‘aur.archlinux.org’).

No issues at all mate, just make sure people know it’s not 100% complete.

Hey, I’ve been having issues compiling kernels with this patch.
If I don’t apply the patch, it builds just fine.
If I apply the patch, I get these errors while compiling:

drivers/pci/quirks.c: In function ‘reset_amd_navi10’:
drivers/pci/quirks.c:3995:9: error: implicit declaration of function ‘ioremap_nocache’; did you mean ‘ioremap_cache’? [-Werror=implicit-function-declaration]
 3995 |     mmio = ioremap_nocache(mmio_base, mmio_size);
      |            ^~~~~~~~~~~~~~~
      |            ioremap_cache
drivers/pci/quirks.c:3995:7: warning: assignment to ‘uint32_t *’ {aka ‘unsigned int *’} from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
 3995 |     mmio = ioremap_nocache(mmio_base, mmio_size);
      |          ^

@The_Poot
Linux has removed ioremap_nocache and replaced it with ioremap. Try changing the function call from “ioremap_nocache” to “ioremap”. If I recall correctly, the Linux kernel mailing list says the two are now basically the same.
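
If you would rather not edit the patch by hand, the rename can be scripted. This is only a minimal sketch in Python: the function name replace_ioremap_nocache is made up for illustration, and it assumes you have already read the patch (or quirks.c) into a string yourself. It matches the whole identifier only, so longer names such as devm_ioremap_nocache are left untouched.

```python
import re

# Match the removed helper as a whole identifier only. "_" counts as a word
# character, so names like devm_ioremap_nocache are not matched.
_NOCACHE = re.compile(r"\bioremap_nocache\b")


def replace_ioremap_nocache(text):
    """Return (updated_text, replacement_count) for the given patch/source text.

    Hypothetical helper for illustration; not part of the patch itself.
    """
    return _NOCACHE.subn("ioremap", text)


if __name__ == "__main__":
    # Sample line modelled on the compiler error above.
    sample = "+\tmmio = ioremap_nocache(mmio_base, mmio_size);\n"
    updated, count = replace_ioremap_nocache(sample)
    print(updated, end="")
    print(f"{count} replacement(s)")
```

A count of zero means nothing was changed, which is a quick way to spot that you edited the wrong file.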

Thanks for the fast reply.
I will try this and get back to you.

This error has gone away. Thank you.

I can’t seem to apply this patch; it keeps failing on the second hunk. I did change ioremap_nocache to ioremap, by the way. I’ve tried it on both the Arch kernel and linux-amd-staging-drm-next-git from the AUR. The build system attempts to apply the patch, but it fails.

Can you give us more detail about what exactly happens? An error message would be a good starting point for debugging. If you are looking for an easy Arch Linux solution, you might want to take a look at “linux-fix_navi_reset” in the AUR.

After a BIOS update, I get a different error in dmesg:

[ 1064.228438] vfio-pci 0000:0d:00.3: enabling device (0000 -> 0002)
[ 1064.265533] vfio-pci 0000:0b:00.0: Navi10: performing BACO reset
[ 1064.267611] vfio-pci 0000:0b:00.0: Navi10: SMU error 0xff (line 4149)
[ 1065.143352] virbr0: port 2(vnet0) entered forwarding state
[ 1065.143354] virbr0: topology change detected, propagating
[ 1065.269142] vfio-pci 0000:0b:00.0: Navi10: sol register = 0x124e293
[ 1065.269406] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[ 1065.269418] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[ 1065.269422] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x25@0x400
[ 1065.269424] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
[ 1065.269425] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
[ 1065.342457] vfio-pci 0000:0b:00.0: Navi10: performing BACO reset
[ 1065.344536] vfio-pci 0000:0b:00.0: Navi10: SMU error 0xff (line 4149)
[ 1066.346065] vfio-pci 0000:0b:00.0: Navi10: sol register = 0x124e340
[ 1082.716731] vfio-pci 0000:0b:00.0: Navi10: performing BACO reset
[ 1082.718810] vfio-pci 0000:0b:00.0: Navi10: SMU error 0xff (line 4149)
[ 1083.720340] vfio-pci 0000:0b:00.0: Navi10: sol register = 0x124eedb

Also, I get a BSOD instead of no output.

Sorry, I think I had edited the patch from the forum incorrectly. It works for me now.

I doubt there are pages that describe how to compile a kernel with that patch included for openSUSE, are there?

I made a thread about that issue, which I have been wondering about for days now, actually :eyes:

It’d be interesting to test whether the RX5700 now resets properly.

There’s at least one OBS repository for Tumbleweed which seems to have the patch applied, along with the ACS patch. No idea if it works though, but the link diff is there and it is tied to the main development branch. At the very least it could be used as a basis for your own build.

That’s pretty cool but the repo is sadly outdated. The last release was over a year ago…

Hi everyone, small update: Manjaro kernels 5.6, 5.7 and the experimental version of 5.8 should now include the first version of the Navi reset patch, thanks to Manjaro’s philm!

Hi there, I can’t really help you with compiling kernels on openSUSE; you will have to look that up yourself. Just as a tip, the workflow normally includes some kind of step where patches can be applied. Even if that is not the case on openSUSE, once you have the kernel sources you can apply the Navi patch manually. You can either apply the patch file to the source folder directly or, if that is not possible either, edit the source by hand, because a patch file basically just replaces lines in text files. Lines with a leading + are being added and lines with a leading - are being removed. Open the “drivers” folder in the sources, then the “pci” folder, then the “quirks.c” file, and edit the lines there manually before you start compiling.
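
To make the + and - convention concrete, here is a rough Python sketch that lists which lines a patch adds and which it removes, so you know what to change by hand. The function name split_patch_lines and the sample patch are made up for illustration (the sample is not the real Navi patch), and the sketch assumes a well-formed unified diff. It uses the line counts in each “@@” hunk header to tell real changes apart from the “---”/“+++” file headers.

```python
import re

# Unified-diff hunk header, e.g. "@@ -3995,7 +3995,7 @@". A missing count means 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def split_patch_lines(patch_text):
    """Return (added, removed) line lists from unified-diff text.

    Hypothetical helper for illustration; assumes a well-formed patch.
    """
    added, removed = [], []
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch_text.splitlines():
        if old_left == 0 and new_left == 0:
            # Outside a hunk: file headers, commit message, or a hunk header.
            match = HUNK_HEADER.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
            continue
        if line.startswith("+"):
            added.append(line[1:])
            new_left -= 1
        elif line.startswith("-"):
            removed.append(line[1:])
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both the old and the new file.
            old_left -= 1
            new_left -= 1
    return added, removed


if __name__ == "__main__":
    # Made-up sample patch, only to show the format.
    sample = (
        "--- a/drivers/pci/quirks.c\n"
        "+++ b/drivers/pci/quirks.c\n"
        "@@ -1,3 +1,3 @@\n"
        " int x;\n"
        "-mmio = ioremap_nocache(mmio_base, mmio_size);\n"
        "+mmio = ioremap(mmio_base, mmio_size);\n"
        " int y;\n"
    )
    added, removed = split_patch_lines(sample)
    print("remove:", removed)
    print("add:   ", added)
```

The unprefixed context lines around each change are what you search for in quirks.c to find the right spot.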

BTW great username, I approve!

Hey, I apologize for taking up space on this board, but I wanted to ask if anyone else has had this issue. I am running a 5600 XT with this patch on Proxmox, and my Windows 10 VM is behaving oddly. It recognizes the GPU on every boot before I install the driver, no problem, but once the driver is installed, it only works every other boot. On the boots where it doesn’t work, the GPU shows Error 43 in Device Manager. I can disable and re-enable it and it will work.

I just wondered if anyone else had encountered this or had a fix. It seems like a driver issue, but I’m not really sure. Using Windows 10 2004 btw.

The patch is not 100% reliable, which is why it has not been upstreamed. We need more help from AMD to make it reliable.

I’m not even 100% sure it’s a problem with the patch… The driver installer can see the GPU consistently across boots, but once the driver is installed it starts throwing errors. I’m genuinely ignorant of the underlying processes with VFIO though lol

I am… the GPU has to be reset to a pre-boot state, and if it isn’t reset completely, the Windows driver may fail to load. Disabling and re-enabling the driver cleans up something we are missing and allows it to load.

Ahhhh I see! Okay, makes total sense! Sorry for being dumb lol, thanks for the info!