RX6800 passthrough problems

I’m using the latest Proxmox, 6.3-3. I just installed an AMD RX6800 and want to pass it through to a VM.
The following devices appeared in my system:

➜  ~ lspci| grep "b0\|b1\|af"
af:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 1478 (rev c3)
b0:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 1479
b1:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73bf (rev c3)
b1:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28
b1:00.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73a6
b1:00.3 Serial bus controller [0c80]: Advanced Micro Devices, Inc. [AMD/ATI] Device 73a4

➜  ~ lspci -n -s  b1:00
b1:00.0 0300: 1002:73bf (rev c3)
b1:00.1 0403: 1002:ab28
b1:00.2 0c03: 1002:73a6
b1:00.3 0c80: 1002:73a4
➜  ~ lspci -n -s  b0:00
b0:00.0 0604: 1002:1479
➜  ~ lspci -n -s  af:00
af:00.0 0604: 1002:1478 (rev c3)

I’ve added the device IDs from lspci to vfio.conf (I also have a 2080 Ti passed through to a separate VM, which works fine, hence the other PCI IDs):

➜  ~ cat /etc/modprobe.d/vfio.conf
alias pci:v00001033d00000194sv000017AAsd000021D2bc0Csc03i30 vfio-pci
options vfio-pci ids=10de:1c81,10de:0fb9,10de:1e07,10de:10f7,10de:1ad6,10de:1ad7,1912:0014,1002:1478,1002:1479,1002:73bf,1002:ab28,1002:73a6,1002:73a4
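Listing the IDs in vfio.conf only helps if vfio-pci actually ends up bound to each function. A minimal sketch for checking that through sysfs follows. The function name `bound_driver` and its second argument (an alternate sysfs root, useful for trying it out on a fake tree) are made up for illustration:

```shell
# Print the name of the kernel driver bound to a PCI address,
# or "none" if no driver has claimed the device.
# $1: full PCI address, e.g. 0000:b1:00.0
# $2: (hypothetical, optional) sysfs root, defaults to /sys/bus/pci/devices
bound_driver() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    link="$root/$addr/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink whose last component is the driver name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Example: check all four functions of the card.
# for fn in 0 1 2 3; do bound_driver "0000:b1:00.$fn"; done
```

If a function reports amdgpu or snd_hda_intel instead of vfio-pci, the host driver probably grabbed it first.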

My iommu groups for this:

➜  ~ find /sys/kernel/iommu_groups/ -type l|grep -i "b0\|b1\|af"
/sys/kernel/iommu_groups/144/devices/0000:b0:00.0
/sys/kernel/iommu_groups/147/devices/0000:b1:00.2
/sys/kernel/iommu_groups/145/devices/0000:b1:00.0
/sys/kernel/iommu_groups/143/devices/0000:af:00.0
/sys/kernel/iommu_groups/148/devices/0000:b1:00.3
/sys/kernel/iommu_groups/146/devices/0000:b1:00.1

➜  ~ find /sys/kernel/iommu_groups/ -type l|grep -i "144\|147\|145\|143\|148\|146"
/sys/kernel/iommu_groups/144/devices/0000:b0:00.0
/sys/kernel/iommu_groups/147/devices/0000:b1:00.2
/sys/kernel/iommu_groups/145/devices/0000:b1:00.0
/sys/kernel/iommu_groups/143/devices/0000:af:00.0
/sys/kernel/iommu_groups/148/devices/0000:b1:00.3
/sys/kernel/iommu_groups/146/devices/0000:b1:00.1
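The find output above is easier to read when sorted by group number. A rough sketch that prints one line per device, ordered by group, is below. The function name `list_iommu_groups` and its optional root argument are assumptions added for illustration, not something from Proxmox:

```shell
# Print "group N: <pci address>" for every device, sorted by group number.
# $1: (hypothetical, optional) root directory, defaults to /sys/kernel/iommu_groups
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"   # strip "/devices/<addr>"
        group="${group##*/}"        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done | sort -t' ' -k2,2n
}

# Example: list_iommu_groups | grep -i 'b0\|b1\|af'
```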

I have created a new VM with the config below (I first tried passing the entire b1:00 device; this is a second attempt, passing each function separately):

bios: ovmf
machine: q35
hostpci0: b1:00.0,pcie=1,x-vga=1
hostpci1: b1:00.1,pcie=1
hostpci2: b1:00.2,pcie=1
hostpci3: b1:00.3,pcie=1

Now, whatever I try, I cannot get a proper resolution to work. I tried Fedora Workstation 33 and Ubuntu 20.04 and 20.10, and all of them behave as if booted with nomodeset, at a very low resolution. I’ve tried installing the latest Mesa from a PPA with no result. I also tried installing the AMD drivers from the AMD site, but the install fails at the DKMS step and the driver is not installed at all.

So I have two questions. First, why am I not able to pass through the devices b0:00 and af:00? They are not visible in the PCI picker in the Proxmox web UI, and when I edited the config by hand I got:

kvm: -device vfio-pci,host=0000:b0:00.0,id=hostpci4,bus=ich9-pcie-port-5,addr=0x0: vfio 0000:b0:00.0: failed to open /dev/vfio/144: No such file or directory
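One possible explanation for the missing /dev/vfio/144: both b0:00.0 and af:00.0 report class 0604 (PCI bridge) in the lspci -n output, and as far as I understand vfio-pci only binds to ordinary endpoint devices, so no VFIO group node gets created for a group that contains only a bridge. Treat that as an assumption to verify rather than a confirmed diagnosis. A small sketch for checking the class of a device from sysfs, with a made-up function name `is_pci_bridge` and a hypothetical optional root argument:

```shell
# Return 0 if the device is a PCI-to-PCI bridge (class 0x0604xx),
# 1 if it is something else, 2 if the class file cannot be read.
# $1: full PCI address, e.g. 0000:b0:00.0
# $2: (hypothetical, optional) sysfs root, defaults to /sys/bus/pci/devices
is_pci_bridge() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    class=$(cat "$root/$addr/class" 2>/dev/null) || return 2
    case "$class" in
        0x0604*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example:
# is_pci_bridge 0000:b0:00.0 && echo "bridge, probably not a passthrough candidate"
```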

Second question: if these devices don’t need to be passed through, why am I not getting the proper display resolution out of the box?

What flavor of card is this? And is this with the ACS patch? Is CSM disabled in the BIOS?

It’s a PowerColor AMD Radeon RX 6800 MBA 16GB GDDR6

Motherboard: WS-C621E-SAGE

I had CSM enabled. I have disabled it now, and it made no difference other than that the onboard video stopped working properly.

See, the strange thing here is that every device on the card is in its own IOMMU group, and that probably shouldn’t be the case normally.

Any other kernel messages (dmesg output)?

Wondering if you need to ignore MSRs or something like that.

I have ignore MSRs in my settings, and I don’t see anything special in dmesg either. I have now removed the NVIDIA card from the system entirely and will see if that changes anything.

So there is nothing obvious in the host dmesg, and the VM dmesg just says it enters modesetting, which explains the low graphics resolution.

OK, so the amdgpu driver installs fine when I give it --no-dkms, however it doesn’t load the driver.

When I run lshw -c video it shows the card as unclaimed. When I modprobe amdgpu and replug the DisplayPort cable it tries to work but fails to display anything at all, so something tells me it needs those 2 additional devices to be passed through. However, I have no option to do that :expressionless:

Surprisingly … Windows 10 just installed the driver fine and got full resolution after installing the amd driver …

OK, I installed the 5.10 kernel on 20.04 and it actually managed to knock out and reboot the host … not giving up yet … :slight_smile: