Now it gets interesting.
I don’t remember the fancy command that prints all the system info in the console, so here it is by hand:
OS: Manjaro (fully updated)
CPU: AMD Ryzen 9 5950X
Mainboard: Gigabyte Aorus X570 Master
Host GPU (third slot, the last usable one, PCIe x4): AMD W7100
GPU for Windows (first x16 slot, running in x8 mode): Zotac RTX 3090. It works fine but has no USB port, so I pass through the only host USB controller that can be passed through.
GPU for Linux (second x16 slot, running in x8 mode): NVIDIA RTX 2080 Ti FE
and a bunch of other stuff not relevant to the matter … storage etc.
Assumption: passing through not only the GPU and audio functions of the 2080 Ti but also its USB controller will give me another usable USB host, from the GPU’s IOMMU group, to attach to the KVM guest.
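To check that assumption, the IOMMU groups can be read straight from sysfs. Below is a minimal Python sketch, assuming the usual /sys/kernel/iommu_groups layout on the host. The function names `iommu_groups` and `group_of` are made up for illustration, and the address 0000:0d:00.0 is simply the one that shows up in the logs further down.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as found under `root`."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled or path not present: nothing to report.
        return groups
    for group_dir in base.iterdir():
        if not group_dir.name.isdigit():
            continue
        devices = group_dir / "devices"
        if devices.is_dir():
            # Each entry under devices/ is named after a PCI address.
            groups[int(group_dir.name)] = sorted(d.name for d in devices.iterdir())
    return groups


def group_of(address, groups):
    """Return (group_number, members) for a PCI address, or None if absent."""
    for number, members in groups.items():
        if address in members:
            return number, members
    return None


if __name__ == "__main__":
    groups = iommu_groups()
    for number in sorted(groups):
        print(f"group {number}: {' '.join(groups[number])}")
    # Assumed address, taken from the QEMU log lines in this post.
    print(group_of("0000:0d:00.0", groups))
```

If the card's USB function sits in the same group as its VGA and audio functions, every function in that group has to be handed to the guest together.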
Problem:
The 2080 Ti FE will not pass through its USB port; the VM fails to boot and crashes:
2021-01-28T16:50:00.510087Z qemu-system-x86_64: warning: This feature depends on other features that were not requested:
CPUID.8000000AH:EDX.npt [bit 0]
2021-01-28T16:50:00.510090Z qemu-system-x86_64: warning: This feature depends on other features that were not requested:
CPUID.8000000AH:EDX.nrip-save [bit 3]
2021-01-28T16:50:07.505912Z qemu-system-x86_64: vfio_err_notifier_handler(0000:0d:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
I’ve gone the extra mile and repeated the test with my Gigabyte RTX 2080 Ti Gaming OC card (installed in the first slot, as it is a 2.something-slot variant and wouldn’t fit between the W7100 and the 3090). Basically the same result, but with different log messages:
2021-01-28T17:05:15.335122Z qemu-system-x86_64: warning: This family of AMD CPU doesn't support hyperthreading(2)
Please configure -smp options properly or try enabling topoext feature.
2021-01-28T17:05:21.868583Z qemu-system-x86_64: vfio_err_notifier_handler(0000:0d:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
So I tried changing the CPU model in KVM to kvm64 (or something along those lines). It still failed, this time with only the VFIO error in the log:
2021-01-28T17:05:55.653556Z qemu-system-x86_64: vfio_err_notifier_handler(0000:0d:00.0) Unrecoverable error detected. Please collect any data possible and then kill the guest
Then I went an extended extra mile and tested both the FE and the Gigabyte 2080 Ti in an Intel system (i7-8700 non-K on a Gigabyte Z390 Gaming X mainboard):
This system has the latest Proxmox installed and it gave me display output and USB passthrough on both cards just fine.
I’m still lost. Any suggestions? Is there a way (some kernel parameter or whatever) to get the USB port on Turing to work? How and why do FE and AIB cards differ? They use the same TU102, after all …
I know, possibly Proxmox would behave differently on the AMD system, but I am a bit reluctant to try because Proxmox might have to be updated first. I could, however, install Manjaro on the Intel machine … any thoughts?
What debugging is useful/necessary?
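As a starting point, the failing device can at least be pulled out of the QEMU log automatically. This is a rough Python sketch that assumes the log lines look exactly like the ones pasted above; the name `vfio_errors` and the regular expression are mine, not part of QEMU or libvirt.

```python
import re

# Matches lines such as:
# 2021-01-28T16:50:07.505912Z qemu-system-x86_64: vfio_err_notifier_handler(0000:0d:00.0) ...
VFIO_ERR = re.compile(
    r"^(?P<time>\S+) qemu-system-x86_64: "
    r"vfio_err_notifier_handler\("
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\)",
    re.IGNORECASE,
)


def vfio_errors(log_text):
    """Return a list of (timestamp, pci_address) for each VFIO error line."""
    hits = []
    for line in log_text.splitlines():
        match = VFIO_ERR.match(line.strip())
        if match:
            hits.append((match.group("time"), match.group("addr")))
    return hits


if __name__ == "__main__":
    sample = (
        "2021-01-28T17:05:55.653556Z qemu-system-x86_64: "
        "vfio_err_notifier_handler(0000:0d:00.0) Unrecoverable error detected. "
        "Please collect any data possible and then kill the guest"
    )
    for timestamp, address in vfio_errors(sample):
        print(timestamp, address)
```

In the three excerpts above, every hit names 0000:0d:00.0, so that is the function to look up in the host's IOMMU groups and in the kernel log around those timestamps.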
(I’m currently looking for an AMD reference-model RX 6800: it has a USB-C port and is a 2-slot design. If anyone wants to sell one, please drop me a message.)