This has been superseded by the new vendor-reset project which doesn’t require any kernel patching. This application and patch along with all other navi/vega reset patches on this website are now obsolete and should not be used.
Hi All,
As some of you may be aware, I have been working to find either a workaround or a fix for the AMD Vega reset bug. Last week I posted a cry for help on AMD’s reddit, hoping to show AMD how much demand there is for a fix. As a result, an AMD engineer got in touch and has guided me to a possible solution to the problem.
Over the weekend I spent considerable time implementing what seems to be a working reset for Vega 10 and 12. Initial testing by a few people confirms that it works on Vega 10, but it needs further testing.
You must apply this patch to your kernel to prevent vfio-pci from attempting to reset the GPU incorrectly.
Please note that this application is intended as an interim workaround while I work on implementing this into the kernel for vfio.
Usage is simple. The GPU must not be in use at the time, and it should be bound to vfio-pci.
./reset-test 0000:24:00.0
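The requirement that the device be bound to vfio-pci can be checked from sysfs before running the tool. The following is a minimal sketch in Python, assuming the standard Linux layout in which /sys/bus/pci/devices/<address>/driver is a symlink to the bound driver. The function names and the sysfs_root parameter are hypothetical and are not part of reset-test.

```python
import os
import re
from typing import Optional

# Full PCI address: domain:bus:device.function, e.g. 0000:24:00.0
PCI_ADDR_RE = re.compile(
    r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$"
)


def bound_driver(pci_addr: str,
                 sysfs_root: str = "/sys/bus/pci/devices") -> Optional[str]:
    """Return the name of the driver bound to pci_addr, or None if unbound.

    sysfs_root is a parameter (an assumption of this sketch) so the
    function can be exercised against a fake directory tree.
    """
    if not PCI_ADDR_RE.match(pci_addr):
        raise ValueError(f"not a full PCI address: {pci_addr!r}")
    link = os.path.join(sysfs_root, pci_addr, "driver")
    # An unbound device has no "driver" symlink in its sysfs directory.
    if not os.path.islink(link):
        return None
    # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
    return os.path.basename(os.readlink(link))


def is_bound_to_vfio(pci_addr: str,
                     sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """True only if the device is currently bound to vfio-pci."""
    return bound_driver(pci_addr, sysfs_root) == "vfio-pci"
```

For the address used above, is_bound_to_vfio("0000:24:00.0") should return True before the reset is attempted. If it does not, the device is either unbound or still held by another driver such as amdgpu.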
The expected output of reset-test is:
============================================================================
AMD Vega 10/12 Reset Application (Version: 1.0)
Copyright (c) 2019 Geoffrey McRae <[email protected]>
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
This tool is intended as an interim workaround while I port this into the
kernel driver. If you like my work and want to support it you can contribute
using the following methods:
* Ko-Fi - https://ko-fi.com/lookingglass
* Patreon - https://www.patreon.com/gnif
* BTC - 14ZFcYjsKPiVreHqcaekvHGL846u3ZuT13
============================================================================
Attempting Vega 10 reset
CMD_READMODIFYWRITE 0x00000e1c
CMD_WRITE 0x00000e1f
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_WAITFOR 0x0001667c
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x00000e2b
CMD_DELAY_MS
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_WAITFOR 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_DELAY_MS
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_WAITFOR 0x0001667c
CMD_READMODIFYWRITE 0x0001667c
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_READMODIFYWRITE 0x00000e2b
CMD_WAITFOR 0x00000e2b
CMD_WRITE 0x00000052
CMD_WRITE 0x00000053
At this point the GPU should successfully post inside a VM, even after a dirty shutdown or VM crash.
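The log above reads like a table-driven register sequence: each line names an operation and, except for delays, a register offset. The tool's source is not reproduced here, so the following Python sketch is only an illustration of how such a sequence could be interpreted. The operation semantics (mask/value for read-modify-write, polling for wait-for), the Cmd structure, and the offsets and values in the usage example are all assumptions. It runs against a plain dictionary, not real hardware.

```python
import time
from enum import Enum, auto
from typing import Callable, Iterable, NamedTuple


class Op(Enum):
    WRITE = auto()
    READMODIFYWRITE = auto()
    WAITFOR = auto()
    DELAY_MS = auto()


class Cmd(NamedTuple):
    """One step of a sequence (hypothetical layout, not the tool's)."""
    op: Op
    reg: int = 0
    mask: int = 0xFFFFFFFF
    value: int = 0  # value to write, value to wait for, or delay in ms


def run_sequence(cmds: Iterable[Cmd],
                 read: Callable[[int], int],
                 write: Callable[[int, int], None],
                 timeout_s: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep,
                 log: Callable[[str], None] = print) -> None:
    """Execute cmds in order, logging each step in the style shown above."""
    for cmd in cmds:
        if cmd.op is Op.DELAY_MS:
            log("CMD_DELAY_MS")
            sleep(cmd.value / 1000.0)
            continue
        log(f"CMD_{cmd.op.name} 0x{cmd.reg:08x}")
        if cmd.op is Op.WRITE:
            write(cmd.reg, cmd.value)
        elif cmd.op is Op.READMODIFYWRITE:
            # Assumed semantics: replace only the bits selected by mask.
            old = read(cmd.reg)
            keep = old & ~cmd.mask & 0xFFFFFFFF
            write(cmd.reg, keep | (cmd.value & cmd.mask))
        elif cmd.op is Op.WAITFOR:
            # Assumed semantics: poll until the masked bits equal value.
            deadline = time.monotonic() + timeout_s
            while (read(cmd.reg) & cmd.mask) != cmd.value:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"register 0x{cmd.reg:08x} not ready")
                sleep(0.001)


# Usage against a fake register file; offsets and values are made up.
regs = {0x10: 0xFF00, 0x14: 0x1}
run_sequence(
    [
        Cmd(Op.READMODIFYWRITE, 0x10, mask=0x0F, value=0x05),
        Cmd(Op.DELAY_MS, value=1),
        Cmd(Op.WRITE, 0x18, value=0x1234),
        Cmd(Op.WAITFOR, 0x14, mask=0x1, value=0x1),
    ],
    read=regs.__getitem__,
    write=regs.__setitem__,
)
```

Keeping the sequence as data rather than code is one plausible reason the output lists each step by name: a failed or hung step can be identified directly from the last line printed.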
A reset for Vega 20 and Navi is possible, but as I do not have these devices to develop against, I cannot safely implement it. Poking blindly at the wrong registers is dangerous and can destroy the GPU.
If you would like to see Navi supported as well, you can contribute to the cost of purchasing a suitable card.
Edit: Funding is complete! Thank you everyone for your support!