Does this module have to be modprobed? Will it not function if built directly into the kernel at build time?
I'm one of those crazy people who likes to have everything I need in my kernel from the start, so I don't have to deal with DKMS every time I build myself a new kernel.
I have built this module into my kernel and attempted to reboot my Navi10 VM. It failed.
here is my dmesg immediately after shutting down the VM:
[ 292.251722] AMD-Vi: Completion-Wait loop timed out
[ 292.396401] AMD-Vi: Completion-Wait loop timed out
[ 292.525060] AMD-Vi: Completion-Wait loop timed out
[ 292.653521] AMD-Vi: Completion-Wait loop timed out
[ 292.781893] AMD-Vi: Completion-Wait loop timed out
[ 292.910239] AMD-Vi: Completion-Wait loop timed out
[ 293.038503] AMD-Vi: Completion-Wait loop timed out
[ 293.108857] iommu ivhd1: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=45:00.0 address=0x103de03540]
[ 293.195431] vfio-pci 0000:45:00.1: can't change power state from D3cold to D0 (config space inaccessible)
[ 293.195917] ixgbevf 0000:0a:10.6: enabling device (0000 -> 0002)
[ 293.196099] ixgbe 0000:0a:00.0 enp10s0f0: VF Reset msg received from vf 3
[ 293.206989] ixgbevf 0000:0a:10.6: MAC address not assigned by administrator.
[ 293.206993] ixgbevf 0000:0a:10.6: Assigning random MAC address
[ 293.207933] ixgbevf 0000:0a:10.6: 3a:64:b2:5a:2d:0b
[ 293.207938] ixgbevf 0000:0a:10.6: MAC: 3
[ 293.207940] ixgbevf 0000:0a:10.6: Intel(R) X550 Virtual Function
[ 293.209885] ixgbevf 0000:0a:10.6 enp10s0f0v3: renamed from eth0
[ 294.110723] iommu ivhd1: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=45:00.0 address=0x103de035a0]
[ 295.112599] iommu ivhd1: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=45:00.0 address=0x103de035d0]
[ 296.114484] iommu ivhd1: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=45:00.0 address=0x103de03610]
[ 296.114497] iommu ivhd1: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=45:00.0 address=0x103de03630]
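For anyone comparing logs like the one above across reset attempts, here is a rough sketch that pulls the IOTLB_INV_TIMEOUT events out of dmesg text and counts them per device. The function names and the regular expression are my own and are only fitted to the line format shown in this post; other kernels may format these lines differently.

```python
import re
from collections import Counter

# Fitted to lines like the ones pasted above, e.g.:
# [ 293.108857] iommu ivhd1: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=45:00.0 address=0x103de03540]
# This pattern is an assumption based on that sample, not an official format.
EVENT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s.*AMD-Vi: Event logged "
    r"\[IOTLB_INV_TIMEOUT device=(?P<device>[0-9a-fA-F:.]+) "
    r"address=(?P<address>0x[0-9a-fA-F]+)\]"
)


def iotlb_timeouts(dmesg_text):
    """Return (timestamp, device, address) for each IOTLB_INV_TIMEOUT line."""
    events = []
    for line in dmesg_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append(
                (float(match["ts"]), match["device"], int(match["address"], 16))
            )
    return events


def timeouts_per_device(dmesg_text):
    """Count IOTLB_INV_TIMEOUT events per PCI device string."""
    return Counter(device for _, device, _ in iotlb_timeouts(dmesg_text))
```

Feeding it the saved output of `dmesg` (for example read from a text file) should make it easy to see whether only the passed-through GPU function is timing out or other devices are affected too.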
Attempting to restart the VM yields the unknown PCI header 127 error, confirming a failed reset.
Debian 10 (Buster) running on a debianized Linux 5.8-10 with the vendor-reset module built-in.
AMD Threadripper 1950X
ASUS Prime X399-A with firmware 1002.
It should be noted that this does not hang all VMs on the host, which the BACO patch did. This module only seems to break the VM attempting a reset.
The withdrawn post is the installation message, and it was correct. However, I can see you loaded it way too late; the module must be loaded as early as possible. The default reset the kernel performs breaks the GPU completely, so you must have vendor-reset loaded first.
I withdrew the post because I noticed a possible error on my end: vendor_reset loads AFTER vfio. I will report my results once I reconfigure to have vendor_reset load BEFORE vfio.
I have no means to get the dmesg of a panicked kernel, as I never could get kdump to work.
The best I could do is take a picture of the screen with my cellphone, so here's the tail end of the panic.
Looking through my stuff, it seems I have used vendor_reset instead of vendor-reset in a few places. I will correct all instances of this, recompile my kernel and report back.
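A quick way to hunt down the stray spellings is a small script like the sketch below. It only reports lines containing the underscore form `vendor_reset`; the function names are made up for illustration, and which files to scan (kernel config, module lists, scripts) is up to whoever runs it. Whether the spelling actually matters in a given file is a separate question this does not answer.

```python
import re

# Matches the underscore spelling as a whole word, e.g. "vendor_reset" or
# "vendor_reset.ko", but not longer identifiers such as "vendor_reset_foo".
UNDERSCORE_NAME = re.compile(r"\bvendor_reset\b")


def find_underscore_spelling(text):
    """Return (line_number, line) pairs that use the underscore spelling."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if UNDERSCORE_NAME.search(line)
    ]


def scan_files(paths):
    """Map each readable path to its list of matching (line_number, line) pairs."""
    hits = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                matches = find_underscore_spelling(handle.read())
        except OSError:
            continue  # skip files that cannot be opened
        if matches:
            hits[path] = matches
    return hits
```

Calling `scan_files([...])` with a list of candidate paths returns only the files that still contain the underscore form, along with the line numbers to fix.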