GPU passthrough fails in a single-GPU configuration with a Radeon RX 580

With GPU passthrough in a single-GPU configuration, the first passthrough after boot succeeds, but a second passthrough fails if the amdgpu driver has been reloaded in between. The second passthrough does succeed, however, when the PCIe link speed is fixed to Gen2.

Is there a difference in how the PCI core, amdgpu, or vfio-pci handle a PCIe Gen2 link compared with a Gen3 link?

Motherboard : Biostar X570GT8 (the same problem occurs on an H370 motherboard)
CPU : Ryzen 3600
Memory : 32GB
GPU : Radeon RX580
OS : Ubuntu 19.10
Kernel : 5.4.21

NG Case1

  1. Load amdgpu kernel module
    modprobe amdgpu

  2. Disable vtconsole
    echo 0 > /sys/class/vtconsole/vtcon1/bind

  3. Unload amdgpu kernel module
    modprobe -r amdgpu

  4. Load amdgpu kernel module
    modprobe amdgpu

    The following messages are logged:
    amdgpu 0000:0c:00.0: GPU pci config reset
    [drm] GPU posting now...

  5. Disable vtconsole
    echo 0 > /sys/class/vtconsole/vtcon1/bind

  6. Unload amdgpu kernel module
    modprobe -r amdgpu

  7. Start VM
    virsh start Win10

    The following messages are logged:
    vfio-pci 0000:0c:00.1: vfio_bar_restore: reset recovery - restoring BARs
    AMD-Vi: Completion-Wait loop timed out
    iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0c:00.0 address=0x80b59f680]
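
The steps of NG Case1 can be collected into a small script to make the failure easier to reproduce. This is only a sketch: the helper names run, release_gpu and ng_case1 and the DRY_RUN variable are made up for illustration, while the vtconsole path and the VM name Win10 are taken from the steps above. It assumes a root shell.

```shell
#!/bin/sh
# Sketch of NG Case1. run, release_gpu, ng_case1 and DRY_RUN are
# illustrative names, not part of any existing tool.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Unbind the virtual console from the GPU, then unload amdgpu.
release_gpu() {
    run sh -c 'echo 0 > /sys/class/vtconsole/vtcon1/bind' &&
    run modprobe -r amdgpu
}

ng_case1() {
    run modprobe amdgpu &&   # step 1
    release_gpu &&           # steps 2-3
    run modprobe amdgpu &&   # step 4: "GPU pci config reset" is logged here
    release_gpu &&           # steps 5-6
    run virsh start Win10    # step 7: IOTLB_INV_TIMEOUT is logged here
}
```

Calling ng_case1 with DRY_RUN=1 set only prints the seven commands, which is a cheap way to check the sequence before running it for real.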


NG Case2

  1. Load amdgpu kernel module
    modprobe amdgpu

  2. Disable vtconsole
    echo 0 > /sys/class/vtconsole/vtcon1/bind

  3. Unload amdgpu kernel module
    modprobe -r amdgpu

  4. Start VM
    virsh start Win10

  5. Stop VM

  6. Disable vtconsole
    echo 0 > /sys/class/vtconsole/vtcon1/bind

  7. Unload amdgpu kernel module
    modprobe -r amdgpu

  8. Start VM
    virsh start Win10

    The following messages are logged:
    vfio-pci 0000:0c:00.1: vfio_bar_restore: reset recovery - restoring BARs
    AMD-Vi: Completion-Wait loop timed out
    iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0c:00.0 address=0x80b59f680]

If steps 6 and 7 are skipped after the VM is stopped in step 5, the VM starts in step 8 without any error message.
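
To tell a failing run from a clean one without reading the whole log, the kernel log can be searched for the two failure signatures quoted in this report. A minimal sketch; the function name has_iommu_timeout is made up for illustration, and the two patterns are copied from the messages above.

```shell
#!/bin/sh
# Reads kernel log text on stdin. Exit status is 0 if either failure
# signature from this report is present, 1 otherwise.
# has_iommu_timeout is an illustrative name.
has_iommu_timeout() {
    grep -q -e 'IOTLB_INV_TIMEOUT' -e 'Completion-Wait loop timed out'
}
```

For example, after "virsh start Win10": dmesg | has_iommu_timeout && echo "IOMMU timeout logged"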

NG Case3

Patch amdgpu so that it does not reset the ASIC:

kernel/v5.4.21/linux-5.4.21/drivers/gpu/drm/amd/amdgpu/vi.c
static int vi_asic_reset(struct amdgpu_device *adev)
{
    int r;

    amdgpu_atombios_scratch_regs_engine_hung(adev, true);

    // r = vi_gpu_pci_config_reset(adev);
    r = 0;

    amdgpu_atombios_scratch_regs_engine_hung(adev, false);

    return r;
}

With this change, the VM in NG Case1 starts normally. Continuing from there:

  1. Stop VM

  2. Disable vtconsole
    echo 0 > /sys/class/vtconsole/vtcon1/bind

    The "GPU pci config reset" message no longer appears, but for some reason the following message is still logged:
    [drm] GPU posting now...

  3. Unload amdgpu kernel module
    modprobe -r amdgpu

  4. Start VM
    virsh start Win10

    The following messages are logged:
    vfio-pci 0000:0c:00.1: vfio_bar_restore: reset recovery - restoring BARs
    AMD-Vi: Completion-Wait loop timed out
    iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0c:00.0 address=0x80b59f680]
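
When comparing the cases, it can help to confirm which driver owns each function of the GPU (0c:00.0 and 0c:00.1) just before the VM is started. The sketch below reads the driver symlink from sysfs. The function name bound_driver and the PCI_DEVICES_DIR override are made up for illustration; the default path is the standard sysfs location.

```shell
#!/bin/sh
# Print the name of the driver bound to a PCI device, or "none".
# bound_driver is an illustrative name. PCI_DEVICES_DIR can be
# overridden for testing and defaults to the real sysfs directory.
bound_driver() {
    link="${PCI_DEVICES_DIR:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Usage: bound_driver 0000:0c:00.0 and bound_driver 0000:0c:00.1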


OK Case1

If amdgpu is not reloaded, the VM can be restarted without problems.
However, the host console then cannot be used for input.

OK Case2

If the PCIe link speed is fixed to Gen1 or Gen2 in the UEFI BIOS (Gen3 and Auto do not work),
the VM can be restarted without problems even after amdgpu has been reloaded.
However, this setting also limits devices other than the GPU to Gen2.
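
The link speed that is actually in effect can be checked from the host through the PCI sysfs attributes, where Gen2 corresponds to 5 GT/s and Gen3 to 8 GT/s. A minimal sketch; the function name pcie_link_info and the PCI_DEVICES_DIR override are made up for illustration. Note that the GPU may lower its link speed while idle, so current_link_speed is not necessarily the maximum that was negotiated.

```shell
#!/bin/sh
# Print the PCIe link speed and width attributes of one PCI device.
# pcie_link_info is an illustrative name. PCI_DEVICES_DIR can be
# overridden for testing and defaults to the real sysfs directory.
pcie_link_info() {
    dir="${PCI_DEVICES_DIR:-/sys/bus/pci/devices}/$1"
    for attr in current_link_speed max_link_speed \
                current_link_width max_link_width; do
        if [ -r "$dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$dir/$attr")"
        else
            printf '%s: (not available)\n' "$attr"
        fi
    done
}
```

Usage: pcie_link_info 0000:0c:00.0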