TRX40 Related - OVMF/QEMU Failing to boot - Pauses / Black Screen when starting VM

@maximlevitsky ‘pcie_aspm.policy=performance’ does not work. VM pauses right after I start it.

That sucks.
Could you then boot with ‘pci=noaer’ only and see if hiding AER errors makes it work?
(I assume that you don’t currently use pci=noaer.)

Could you look at this? I can’t test it myself, and it would be a bummer for me if that doesn’t work.

Sorry for the slow reply.

I’ll check it out tomorrow for you and report back.

@maximlevitsky

@vljio Thanks a lot!!

@maximlevitsky pci=noaeor does not have any effect unfortunately.

You typed this correctly? Thank you and I guess it will be fun making it work :frowning:
Thanks again!

I got it working.
I compiled my own kernel with the DPC error driver disabled, and voilà, both my RTX 2070S cards work, and I don’t need to disable ASPM, so my Thunderbolt driver works too.

By “work” I mean that the USB controller passes through cleanly, and so do all the other NVIDIA devices. Now my only wish is that NVIDIA keeps that little USB port on the RTX 3000 series I am waiting for (I will sell that card when the RTX 3000 series is released).

Most likely the same can be done with blacklisting and/or poking at sysfs to disable that driver.
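For anyone who wants to try the sysfs route before building a kernel, here is a rough sketch, not something tested in this thread. It assumes the DPC port-service driver is exposed at /sys/bus/pci_express/drivers/dpc on your kernel (check that the directory exists first). The function names are made up for illustration. Unbinding this way lasts only until the next reboot.

```shell
#!/usr/bin/env bash
# Sketch: unbind every PCIe port from the DPC port-service driver via sysfs.
# ASSUMPTION: the driver directory is /sys/bus/pci_express/drivers/dpc.
# list_dpc_devices and unbind_dpc are hypothetical helper names.

# List the port-service devices currently bound to the driver directory in $1.
list_dpc_devices() {
    local drv_dir="$1" entry
    for entry in "$drv_dir"/*; do
        # Bound devices show up as symlinks; bind/unbind/uevent are plain files.
        [ -L "$entry" ] || continue
        basename "$entry"
    done
}

# Write each bound device name into the driver's "unbind" file.
unbind_dpc() {
    local drv_dir="${1:-/sys/bus/pci_express/drivers/dpc}" dev
    if [ ! -d "$drv_dir" ]; then
        echo "no DPC driver directory at $drv_dir" >&2
        return 1
    fi
    for dev in $(list_dpc_devices "$drv_dir"); do
        echo "unbinding $dev"
        printf '%s' "$dev" > "$drv_dir/unbind"
    done
}
```

Source the file as root and call `unbind_dpc` with no arguments before starting the VM. If the function reports that the directory is missing, your kernel either has no DPC support built in or exposes it somewhere else, and this sketch does not apply.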

These DPC errors probably are just bogus, and are leftover from some enterprise EPYC features.

Hi—coming here from TRX40 Nvidia Single GPU Passthrough where I seemed to be hitting a similar issue (although weirdly mine worked completely fine for the first host boot but then failed after the guest was shut down the first time). Did you ever resolve this with blacklisting or anything other than a custom kernel?

Hi @allu, what ultimately solved it for me was either not passing through the USB controller on my RTX 2080 or turning the PCIe ASPM setting off.

Use kernel version 5.4, that’s what I’m using today.

Here’s my vfio.conf (/etc/modprobe.d/vfio.conf):
options vfio-pci ids=10de:1e87,10de:10f8,10de:1ad8,10de:1ad9,1912:0015,8086:1528
options vfio_iommu_type1 allow_unsafe_interrupts=1
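Once the machine has rebooted with this file in place, it can help to confirm that each ID really ended up on vfio-pci. Below is a minimal sketch that reads the standard vendor, device and driver entries under /sys/bus/pci/devices. The helper names are hypothetical, and the second argument exists only so the functions can be pointed at another directory.

```shell
#!/usr/bin/env bash
# Sketch: show which kernel driver holds each PCI device with a given vendor:device ID.
# driver_for_id and check_ids are hypothetical helper names.

# Print "<pci address> <driver>" for every device under $2 whose ID equals $1.
driver_for_id() {
    local id="$1" root="${2:-/sys/bus/pci/devices}" dev vendor device drv
    for dev in "$root"/*; do
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        vendor=$(cat "$dev/vendor")   # e.g. 0x10de
        device=$(cat "$dev/device")   # e.g. 0x1e87
        [ "${vendor#0x}:${device#0x}" = "$id" ] || continue
        if [ -L "$dev/driver" ]; then
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="(none)"
        fi
        echo "$(basename "$dev") $drv"
    done
}

# Run the check for every ID in a comma-separated ids= list.
check_ids() {
    local ids="$1" root="${2:-/sys/bus/pci/devices}" id
    for id in $(echo "$ids" | tr ',' ' '); do
        driver_for_id "$id" "$root"
    done
}
```

For example, `check_ids 10de:1e87,10de:10f8,10de:1ad8,10de:1ad9` should print one line per GPU function, each ending in vfio-pci. A line ending in another driver name or in (none) means the ids= option was not picked up for that device.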

Here are the “GRUB_CMDLINE_LINUX_DEFAULT” pieces that matter for you (/etc/default/grub):
amd_iommu=on iommu=pt video=efifb:off pcie_aspm=off pci-stub.ids=144d:a808,144d:a801

Make sure to run `sudo mkinitcpio -p linux54` (or linuxXX for whatever kernel version you’re using) after you edit your grub config, then reboot.
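After the reboot, a quick way to see whether the parameters actually reached the kernel is to look at /proc/cmdline. This is only a sketch with made-up helper names. If a parameter shows up as missing, one likely cause is that grub.cfg was never regenerated after editing /etc/default/grub (for example with grub-mkconfig), since mkinitcpio alone does not rewrite it.

```shell
#!/usr/bin/env bash
# Sketch: confirm the running kernel was booted with the expected parameters.
# has_param and report_params are hypothetical helper names.

# Succeed if the exact token $1 appears in the command-line file $2.
has_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}

# Print "ok" or "missing" for each parameter given after the file name.
report_params() {
    local file="$1" p
    shift
    for p in "$@"; do
        if has_param "$p" "$file"; then
            echo "ok      $p"
        else
            echo "missing $p"
        fi
    done
}
```

Typical use would be `report_params /proc/cmdline amd_iommu=on iommu=pt pcie_aspm=off`. The same check works for pci=noaer if you are testing that option instead.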

Hello

How have you dealt with the DPC error on device 20.03.1?

When I try to pass through a GPU I always get this error.

Thanks

I was facing this same issue on a Z390 system while passing through a Renesas (StarTech) card. Adding pcie_aspm=off to grub fixed it. Thanks!