With GPU passthrough in a single-GPU configuration, the first passthrough after boot succeeds, but the second passthrough fails if the amdgpu driver has been reloaded in between. However, the second passthrough succeeds when the PCI-E link speed is set to Gen2.
Is there a difference in how pci, amdgpu, and vfio-pci behave between PCI-E Gen2 and Gen3?
Motherboard : Biostar X570GT8 (the same happens on an H370 motherboard)
CPU : Ryzen 3600
Memory : 32GB
GPU : Radeon RX580
OS : Ubuntu 19.10
Kernel : 5.4.21
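To confirm which link speed is actually in effect when comparing Gen2 and Gen3, the PCI sysfs attributes `current_link_speed` and `max_link_speed` can be read for the GPU. The following is a minimal sketch, not part of the original report: `pcie_gen`, `link_report` and the `SYSFS_PCI` override are hypothetical names, and the device address 0000:0c:00.0 is taken from the log messages in this post. The exact string format of the attributes differs between kernel versions, so the match is on the leading number only.

```shell
#!/bin/sh
# Map a sysfs link-speed string to a PCIe generation.
# Accepts both "8 GT/s" and "8.0 GT/s PCIe" style values.
pcie_gen() {
    case "$1" in
        2.5*) echo "Gen1" ;;
        5*)   echo "Gen2" ;;
        8*)   echo "Gen3" ;;
        16*)  echo "Gen4" ;;
        32*)  echo "Gen5" ;;
        *)    echo "unknown" ;;
    esac
}

# Print the current and maximum link generation of one PCI function.
# $1 is the address as domain:bus:dev.fn; SYSFS_PCI (hypothetical
# override) allows pointing at a different directory for testing.
link_report() {
    dev="${SYSFS_PCI:-/sys/bus/pci/devices}/$1"
    cur="$(cat "$dev/current_link_speed")"
    max="$(cat "$dev/max_link_speed")"
    echo "$1 current=$(pcie_gen "$cur") max=$(pcie_gen "$max")"
}
```

Usage: `link_report 0000:0c:00.0`. The current speed may be lower than the maximum if the link is downclocked while idle, so it is worth checking at the same point in each test sequence.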
NG Case1
1. Load amdgpu kernel module
   modprobe amdgpu
2. Disable vtconsole
   echo 0 > /sys/class/vtconsole/vtcon1/bind
3. Unload amdgpu kernel module
   modprobe -r amdgpu
4. Load amdgpu kernel module
   modprobe amdgpu
   The following messages appear:
   amdgpu 0000:0c:00.0: GPU pci config reset
   [drm] GPU posting now...
5. Disable vtconsole
   echo 0 > /sys/class/vtconsole/vtcon1/bind
6. Unload amdgpu kernel module
   modprobe -r amdgpu
7. Start VM
   virsh start Win10
   The following messages appear:
   vfio-pci 0000:0c:00.1: vfio_bar_restore: reset recovery - restoring BARs
   AMD-Vi: Completion-Wait loop timed out
   iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0c:00.0 address=0x80b59f680]
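The NG Case1 sequence can be written as a script so that it is repeatable between test runs. This is a minimal sketch under stated assumptions, not the tooling used for this report: the helpers `run`, `unbind_vtcon` and `ng_case1` and the variables `DRY_RUN`, `VTCON` and `VM_NAME` are hypothetical names. By default it only prints the commands; setting `DRY_RUN=0` and running as root would execute them.

```shell
#!/bin/sh
# Reproduction sketch for NG Case1.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
VTCON="${VTCON:-/sys/class/vtconsole/vtcon1/bind}"
VM_NAME="${VM_NAME:-Win10}"

# Run a command, or just print it when in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Detach the virtual console from the framebuffer (steps 2 and 5).
unbind_vtcon() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ echo 0 > $VTCON"
    else
        echo 0 > "$VTCON"
    fi
}

# Steps 1 to 7 of NG Case1, stopping at the first failing command.
# The second "modprobe amdgpu" is where "GPU pci config reset" is logged.
ng_case1() {
    run modprobe amdgpu &&
    unbind_vtcon &&
    run modprobe -r amdgpu &&
    run modprobe amdgpu &&
    unbind_vtcon &&
    run modprobe -r amdgpu &&
    run virsh start "$VM_NAME"
}
```

Calling `ng_case1` in dry-run mode lists the seven commands in order, which makes it easy to check the sequence before running it for real.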
NG Case2
1. Load amdgpu kernel module
   modprobe amdgpu
2. Disable vtconsole
   echo 0 > /sys/class/vtconsole/vtcon1/bind
3. Unload amdgpu kernel module
   modprobe -r amdgpu
4. Start VM
   virsh start Win10
5. Stop VM
6. Disable vtconsole
   echo 0 > /sys/class/vtconsole/vtcon1/bind
7. Unload amdgpu kernel module
   modprobe -r amdgpu
8. Start VM
   virsh start Win10
   The following messages appear:
   vfio-pci 0000:0c:00.1: vfio_bar_restore: reset recovery - restoring BARs
   AMD-Vi: Completion-Wait loop timed out
   iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0c:00.0 address=0x80b59f680]
If steps 6 and 7 are skipped after stopping the VM in step 5, starting the VM in step 8 produces no error message.
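Because the difference between the failing and working variants is whether the AMD-Vi errors show up, the check can be automated by searching the kernel log for the two signatures quoted above. A minimal sketch follows; the function names `has_iommu_timeout` and `classify_log` are hypothetical, and the patterns are taken only from the messages in this post.

```shell
#!/bin/sh
# Succeed if stdin contains either failure signature quoted in this post.
has_iommu_timeout() {
    grep -q -E 'IOTLB_INV_TIMEOUT|Completion-Wait loop timed out'
}

# Classify a kernel log passed on stdin: NG if a signature is found, OK otherwise.
classify_log() {
    if has_iommu_timeout; then
        echo "NG"
    else
        echo "OK"
    fi
}
```

Usage: `dmesg | classify_log` after each `virsh start Win10`. The kernel ring buffer keeps earlier messages, so it needs to be cleared before each attempt (for example with `dmesg -C`), otherwise an old error would be counted again.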
NG Case3
Change amdgpu so that it does not reset the ASIC.
kernel/v5.4.21/linux-5.4.21/drivers/gpu/drm/amd/amdgpu/vi.c
    static int vi_asic_reset(struct amdgpu_device *adev)
    {
        int r;

        amdgpu_atombios_scratch_regs_engine_hung(adev, true);
        /* PCI config reset disabled for this test */
        // r = vi_gpu_pci_config_reset(adev);
        r = 0;
        amdgpu_atombios_scratch_regs_engine_hung(adev, false);

        return r;
    }
1. Run the steps of NG Case1; with this change the VM starts normally
2. Stop VM
3. Disable vtconsole
   echo 0 > /sys/class/vtconsole/vtcon1/bind
   "GPU pci config reset" is suppressed, but for some reason the following message still appears:
   [drm] GPU posting now...
4. Unload amdgpu kernel module
   modprobe -r amdgpu
5. Start VM
   virsh start Win10
   The following messages appear:
   vfio-pci 0000:0c:00.1: vfio_bar_restore: reset recovery - restoring BARs
   AMD-Vi: Completion-Wait loop timed out
   iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0c:00.0 address=0x80b59f680]
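Since the messages name both 0c:00.0 and 0c:00.1, it may help to record which driver (amdgpu, vfio-pci, or none) owns each function right before every "Start VM" step. The sketch below reads the `driver` symlink that sysfs exposes for each PCI device; `bound_driver` and the `SYSFS_PCI` override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Print the name of the driver bound to a PCI function, or "none" if unbound.
# $1 is the address as domain:bus:dev.fn; SYSFS_PCI (hypothetical override)
# allows pointing at a different directory for testing.
bound_driver() {
    link="${SYSFS_PCI:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Usage: `bound_driver 0000:0c:00.0; bound_driver 0000:0c:00.1`. Comparing the output across the three cases shows whether the device state differs before the failing start.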