We have a similar report on Discord for out-of-tree builds. I am investigating.
I have made some changes in git that may help, but really I can't see why they would. It's getting stuck searching the device db, which is a simple array search. Can you please update and try again?
Doesn't seem to have made a difference.
Here's the image of the panic after the new update though.
As much as I hate to ask this, since I doubt it will help, can you please run make clean before you build, just in case we have an old .o hanging around? The code here is extremely simple and I can't explain what could cause it to get stuck here.
I have been cleaning my builds before each attempt. That's just a habit I have.
Can you please try with tail call optimisation disabled (-fno-optimize-sibling-calls)? If this works, we have a solution.
Edit: just pushed this change in anyway as we are pretty sure it's the solution. Simply update and rebuild to see.
A gcc bug? I've tripped over that one too elsewhere!
I can confirm, this new update allows the kernel to boot up with the module added in /etc/modules.
Now I can finally test the actual reset behavior.
It seems to reset my Navi10 just fine.
However, somehow, virsh autostart no longer works. The VM boots, but there is no output from the GPU on the guest. The only way I get GPU output from my guest is by a virsh destroy and then a virsh start.
Rather annoying, as I'm so used to my passthrough working fine from bootup.
here are some results from my testing.
virsh destroy and then virsh start: works perfectly.
shutdown called from guest OS: reset fails.
virsh shutdown and then virsh start: reset fails.
reboot called from guest OS: reset succeeds, but no GPU output.
virsh reboot: reset succeeds, but no GPU output.
These tests were done with a guest running Debian 11 Bullseye on a debianized Linux 5.7.6 with Valve's futex-wait-multiple patch applied.
No, not a "gcc bug", but an assumption about how the hooks work. TCO replaces a call with a jump when it can, to avoid using stack space for return addresses. Normally this is fine, but since we are patching a kernel symbol we need the return instruction pointer kept intact.
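For anyone curious what the flag changes, here is a minimal sketch that compiles a tiny wrapper function with and without -fno-optimize-sibling-calls and diffs the generated assembly. The function name tco_demo and the file wrap.c are made up for illustration and are not part of vendor-reset; the sketch assumes gcc (or a compatible compiler installed under that name) is available, and it says nothing about how the module's hooks themselves are implemented.

```shell
#!/bin/sh
# Hypothetical demo, not part of vendor-reset: compare the assembly gcc emits
# for a call in tail position with and without sibling-call optimisation.
tco_demo() {
    workdir=$(mktemp -d) || return 2
    cat > "$workdir/wrap.c" <<'EOF'
int inner(int x);                          /* defined elsewhere, cannot be inlined */
int outer(int x) { return inner(x + 1); }  /* the call is in tail position */
EOF
    status=2
    # First build: -O2 enables sibling-call optimisation, so the call may become a jump.
    # Second build: the flag from this thread keeps the ordinary call/return pair.
    if gcc -O2 -S -o "$workdir/tco.s" "$workdir/wrap.c" &&
       gcc -O2 -fno-optimize-sibling-calls -S -o "$workdir/no_tco.s" "$workdir/wrap.c"; then
        diff "$workdir/tco.s" "$workdir/no_tco.s"
        status=$?
    fi
    rm -rf "$workdir"
    # diff exits with 1 when the two files differ, i.e. when the flag changed the output.
    [ "$status" -eq 1 ]
}
```

Source the snippet and run tco_demo. On x86-64 the diff would typically show a jump to inner in the first build and a call followed by a return in the second, though the exact instructions depend on compiler version and architecture.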
@mathew2214 can you please provide the dmesg for this?
Here is the log from all of these tests. My apologies, I didn't think to record the timestamps of when each test was performed, so it may be impossible to tell which test caused which dmesg output here.
Yes, I'm aware my ECC setup is broken.
VM autostart seems to begin at line 2014.
tessa0dmesg.txt (223.1 KB)
What is your guest OS?
Debian 11 Bullseye with debianized Linux 5.7.6 with Valve's futex-wait-multiple patch applied.
Thanks. We have not tested Linux as a guest much with these resets. We do know that between Windows, OSX, and Linux the GPU state after shutdown is lacking... consistency...
I don't see anything obvious, but these sequences are going to see improvements over the coming weeks as we tweak and tune them, so stay tuned!
I would love to give this project a try.
Can you tell whether this would work on a stock openSuSE kernel?
When I try to install it using dkms, the output I get is as follows:
sudo dkms install .
Creating symlink /var/lib/dkms/vendor-reset/0.0.17/source -> /usr/src/vendor-reset-0.0.17
DKMS: add completed.
Error! echo
Your kernel headers for kernel 5.8.14-1-default cannot be found at
/lib/modules/5.8.14-1-default/build or /lib/modules/5.8.14-1-default/source.
Install your kernel headers. They should be available as a package. Consult your distribution's documentation.
Thank you, I'll look into it.
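A quick way to check this before running dkms again is to test for the two directories named in the error message. This is only a sketch: the function name kernel_headers_present and its optional second argument (an alternative module root, handy for testing) are made up for illustration, and the name of the headers package differs between distributions.

```shell
#!/bin/sh
# Returns 0 if either directory DKMS looked for exists for the given kernel.
# $1: kernel version (defaults to the running kernel)
# $2: module root (hypothetical knob for testing; the real location is /lib/modules)
kernel_headers_present() {
    kver=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    # -d follows symlinks, so a dangling "build" link counts as missing.
    [ -d "$root/$kver/build" ] || [ -d "$root/$kver/source" ]
}

if kernel_headers_present; then
    echo "kernel headers found for $(uname -r)"
else
    echo "kernel headers missing for $(uname -r): install your distribution's headers package"
fi
```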
Just tried this on Fedora 33 with latest un-patched kernel 5.9.8-200.fc33.x86_64 for a Navi 10 RX5700XT, and can report that it works great.
Thank you!
Just in case anyone is looking for the extra steps required to ensure early load on Fedora, I used:
# echo "vendor-reset" > /etc/modules-load.d/vendor-reset.conf
# dracut --add-drivers "vfio vfio-pci vfio_iommu_type1 vendor-reset" --force
# lsinitrd | grep -e vfio -e vendor
The last command should show .ko.xz files for both vfio and vendor-reset.
While belfrypossum's first version did not work for me, this finished module works perfectly.
Thanks a lot for this!
Edit: Manjaro with 5.9 Kernel and Powercolor 5700XT
Very nice
So with this project, there is no need to patch the distro's kernel anymore?
I will give it a try on my GNU/Linux Ubuntu 20.04.1