After Motherboard exchange no Video output from Guest GPU

My other slot is occupied by my RTX 2080. If I swapped the two, would it still work, or would I have to change something in my vfio config?

Not the vfio config; you would just need to change your libvirt XML to point to the new PCI address.
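
For reference, the address libvirt wants is just the lspci address split into fields. Below is a minimal sketch, assuming bash; the function name pci_to_libvirt is made up for illustration, and the output is the address element that goes inside the source of the hostdev entry in the domain XML:

```shell
#!/usr/bin/env bash
# Convert a PCI address as printed by lspci ("28:00.0" or "0000:28:00.0")
# into the attribute form used by a libvirt <address> element.
# pci_to_libvirt is a hypothetical helper name, not part of libvirt.
pci_to_libvirt() {
    local addr="$1"
    # optional 4-digit domain, then bus:slot.function
    local re='^([0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$'
    [[ "$addr" =~ $re ]] || return 1
    local domain="${BASH_REMATCH[1]%:}"   # empty when lspci omitted the domain
    printf "<address domain='0x%s' bus='0x%s' slot='0x%s' function='0x%s'/>\n" \
        "${domain:-0000}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
}
```

After a slot swap, take the new address from lspci -nn, run for example pci_to_libvirt 28:00.0, and put the result into the XML with virsh edit.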

Below is my vfio config, adapted to your output.

The first two IDs, 1022:1452 and 1022:1453, look like they belong to your motherboard (you are using an AMD CPU), so adding them to vfio might stop your system from booting.
I would try the config below and see if that fixes anything.
If you feel brave, you can also try: options vfio-pci ids=1022:1452,1022:1453,1002:1478,1002:1479,1002:731f,1002:ab38

IOMMU Group 2 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 2 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 2 26:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c1)
IOMMU Group 2 27:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU Group 2 28:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
IOMMU Group 2 28:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]



➜  ~ cat /etc/modprobe.d/vfio.conf         
options vfio-pci ids=1022:1453,1002:1478,1002:1479,1002:731f,1002:ab38
softdep radeon pre: vfio-pci
softdep amdgpu pre: vfio-pci
options kvm_amd avic=1

Also, I am assuming you have blacklisted amdgpu?

➜  ~ cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
➜  ~ cat /etc/modprobe.d/blacklist.conf|grep -i amd
blacklist amdgpu

Okay, I tried it, but I think I did something wrong:

unexpectedly closed the monitor: 2020-12-27T19:59:02.074386Z qemu-system-x86_64: warning: This feature depends on other features that were not requested: CPUID.8000000AH:EDX.npt [bit 0]
2020-12-27T19:59:02.074392Z qemu-system-x86_64: warning: This feature depends on other features that were not requested: CPUID.8000000AH:EDX.nrip-save [bit 3]
2020-12-27T19:59:02.074827Z qemu-system-x86_64: warning: This feature depends on other features that were not requested: CPUID.8000000AH:EDX.npt [bit 0]
2020-12-27T19:59:02.074832Z qemu-system-x86_64: warning: This feature depends on other features that were not requested: CPUID.8000000AH:EDX.nrip-save [bit 3]
2020-12-27T19:59:02.134392Z qemu-system-x86_64: -device vfio-pci,host=0000:27:00.0,id=hostdev0,bus=pci.4,addr=0x0: vfio 0000:27:00.0: group 0 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 101, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 57, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1330, in startup
    self._backend.create()
  File "/usr/lib/python3.8/site-packages/libvirt.py", line 1234, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-12-27T19:59:02.074386Z qemu-system-x86_64: warning: This feature depends on other features that were not requested: CPUID.8000000AH:EDX.npt [bit 0]
2020-12-27T19:59:02.074392Z qemu-system-x86_64: warning: This feature depends on other features that were not requested: CPUID.8000000AH:EDX.nrip-save [bit 3]
2020-12-27T19:59:02.074827Z qemu-system-x86_64: warning: This feature depends on other features that were not requested: CPUID.8000000AH:EDX.npt [bit 0]
2020-12-27T19:59:02.074832Z qemu-system-x86_64: warning: This feature depends on other features that were not requested: CPUID.8000000AH:EDX.nrip-save [bit 3]
2020-12-27T19:59:02.134392Z qemu-system-x86_64: -device vfio-pci,host=0000:27:00.0,id=hostdev0,bus=pci.4,addr=0x0: vfio 0000:27:00.0: group 0 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

I swapped the slots of the RX 5700 XT and the RTX 2080 and changed my config:

# cat /etc/modprobe.d/vfio.conf

options vfio-pci ids=1002:1478,1002:1479,1002:731f,1002:ab38
softdep radeon pre: vfio-pci
softdep amdgpu pre: vfio-pci
options kvm_amd avic=1

`# cat /etc/mkinitcpio.conf` (Arch uses this file for the modules)

# vim:set ft=sh
# MODULES
# The following modules are loaded before any boot hooks are
# run.  Advanced users may wish to specify all system modules
# in this array.  For instance:
#     MODULES=(piix ide_disk reiserfs)
#MODULES="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"
MODULES="vfio_pci vfio vfio_iommu_type1 vfio_virqfd"
# BINARIES
# This setting includes any additional binaries a given user may
# wish into the CPIO image.  This is run last, so it may be used to
# override the actual binaries included by a given hook
# BINARIES are dependency parsed, so you may safely ignore libraries
BINARIES=()

# FILES
# This setting is similar to BINARIES above, however, files are added
# as-is and are not parsed in any way.  This is useful for config files.
FILES=""

# HOOKS
# This is the most important setting in this file.  The HOOKS control the
# modules and scripts added to the image, and what happens at boot time.
# Order is important, and it is recommended that you do not change the
# order in which HOOKS are added.  Run 'mkinitcpio -H <hook name>' for
# help on a given hook.
# 'base' is _required_ unless you know precisely what you are doing.
# 'udev' is _required_ in order to automatically load modules
# 'filesystems' is _required_ unless you specify your fs modules in MODULES
# Examples:
##   This setup specifies all modules in the MODULES setting above.
##   No raid, lvm2, or encrypted root is needed.
#    HOOKS=(base)
#
##   This setup will autodetect all modules for your system and should
##   work as a sane default
#    HOOKS=(base udev autodetect block filesystems)
#
##   This setup will generate a 'full' image which supports most systems.
##   No autodetection is done.
#    HOOKS=(base udev block filesystems)
#
##   This setup assembles a pata mdadm array with an encrypted root FS.
##   Note: See 'mkinitcpio -H mdadm' for more information on raid devices.
#    HOOKS=(base udev block mdadm encrypt filesystems)
#
##   This setup loads an lvm2 volume group on a usb device.
#    HOOKS=(base udev block lvm2 filesystems)
#
##   NOTE: If you have /usr on a separate partition, you MUST include the
#    usr, fsck and shutdown hooks.
HOOKS="base udev autodetect modconf block keyboard keymap filesystems"

# COMPRESSION
# Use this to compress the initramfs image. By default, gzip compression
# is used. Use 'cat' to create an uncompressed image.
#COMPRESSION="gzip"
#COMPRESSION="bzip2"
#COMPRESSION="lzma"
#COMPRESSION="xz"
#COMPRESSION="lzop"
#COMPRESSION="lz4"

# COMPRESSION_OPTIONS
# Additional options for the compressor
#COMPRESSION_OPTIONS=

Hmm, in this case check again that you don’t have any extra devices in the same IOMMU group, and bind all of them to vfio:

options vfio-pci ids=1022:1452,1022:1453,1002:1478,1002:1479,1002:731f,1002:ab38
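
To see what is actually holding a group back, it helps to list each device in the group together with the driver currently bound to it. Below is a minimal sketch, assuming bash and the usual sysfs layout under /sys/kernel/iommu_groups; the function name list_group_drivers is hypothetical. As far as I know, bridges bound to pcieport are not what makes a group "not viable", and vfio-pci will not bind to bridges anyway; the devices to watch are endpoints (NVMe, USB, SATA, NIC, GPU) that are still bound to a host driver.

```shell
#!/usr/bin/env bash
# Print "<pci address> <driver>" for every device in one IOMMU group.
# Usage: list_group_drivers GROUP [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys/kernel/iommu_groups; the second argument only
# exists so the function can be pointed at a different directory.
list_group_drivers() {
    local group="$1"
    local root="${2:-/sys/kernel/iommu_groups}"
    local dev driver
    for dev in "$root/$group"/devices/*; do
        [ -e "$dev" ] || continue        # group missing or empty
        if [ -e "$dev/driver" ]; then
            # "driver" is a symlink to the bound driver's directory
            driver="$(basename "$(readlink -f "$dev/driver")")"
        else
            driver="(none)"
        fi
        printf '%s %s\n' "$(basename "$dev")" "$driver"
    done
}
```

Running list_group_drivers 0 after boot shows at a glance which endpoints in group 0 are still owned by nvme, xhci_hcd, ahci, r8169 and so on, rather than by vfio-pci.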

IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 0 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 0 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 0 01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black 2018/PC SN720 NVMe SSD [15b7:5002]
IOMMU Group 0 03:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
IOMMU Group 0 03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
IOMMU Group 0 03:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
IOMMU Group 0 20:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU Group 0 20:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU Group 0 20:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
IOMMU Group 0 22:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
IOMMU Group 0 25:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c1)
IOMMU Group 0 26:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU Group 0 27:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
IOMMU Group 0 27:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]

These are all the devices in group 0. Do I have to bind all of them? I tried the IDs you posted, but I still get the same error.

Wow, that’s a lot of stuff that landed in group 0.

Looks like you would need to switch back to where it was before (it only showed a few devices in the same group).

Alternatively, you would need to look into pcie_acs_override.

In the previous slot you had:

IOMMU Group 2 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 2 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 2 26:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c1)
IOMMU Group 2 27:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU Group 2 28:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
IOMMU Group 2 28:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]

And only these two devices didn’t seem to belong directly to the card:

IOMMU Group 2 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 2 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]

I would try moving it back to the previous slot and then placing all of those in vfio; if that doesn’t work or causes trouble, you would need to look into splitting the IOMMU group.

You can also add pcie_acs_override=downstream to the kernel boot command line to see if it splits your IOMMU groups better (this only has an effect on a kernel with the ACS override patch).

If none of the above worked then have a look here:

So I added these 4 IDs to vfio, but it still does not work :(

The link you posted looks very complicated.

I don’t understand it any more: it worked fine with my old (almost identical) motherboard, so why does it suddenly no longer work with the MAX? This is very frustrating.

So now your VM boots but you get error 43?

An almost* identical motherboard isn’t identical, and many things can be implemented differently or be unsupported out of the box.

Can you run the commands below and send the output as well?

find /sys/kernel/iommu_groups/ -type l

lspci -v|grep "00:03\|26:\|27:\|28:"

If it comes down to pcie_acs_override, it’s actually simpler than it looks; mostly it means installing an ACSO-enabled kernel and enabling the boot parameter.

PS: I’ve had a board where I was fighting issues with an Nvidia card, and in the end booting an ACSO kernel worked for me.

The VM is booting, but I get error 43.

An almost* identical motherboard isn’t identical, and many things can be implemented differently or be unsupported out of the box.

I was toying with the idea of getting the identical (old) motherboard from somewhere.

# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/7/devices/0000:00:18.3
/sys/kernel/iommu_groups/7/devices/0000:00:18.1
/sys/kernel/iommu_groups/7/devices/0000:00:18.6
/sys/kernel/iommu_groups/7/devices/0000:00:18.4
/sys/kernel/iommu_groups/7/devices/0000:00:18.2
/sys/kernel/iommu_groups/7/devices/0000:00:18.0
/sys/kernel/iommu_groups/7/devices/0000:00:18.7
/sys/kernel/iommu_groups/7/devices/0000:00:18.5
/sys/kernel/iommu_groups/5/devices/0000:00:08.0
/sys/kernel/iommu_groups/5/devices/0000:2a:00.3
/sys/kernel/iommu_groups/5/devices/0000:00:08.1
/sys/kernel/iommu_groups/5/devices/0000:2a:00.2
/sys/kernel/iommu_groups/5/devices/0000:2a:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:04.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/6/devices/0000:00:14.3
/sys/kernel/iommu_groups/6/devices/0000:00:14.0
/sys/kernel/iommu_groups/4/devices/0000:00:07.0
/sys/kernel/iommu_groups/4/devices/0000:29:00.2
/sys/kernel/iommu_groups/4/devices/0000:29:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:07.1
/sys/kernel/iommu_groups/4/devices/0000:29:00.3
/sys/kernel/iommu_groups/2/devices/0000:00:03.1
/sys/kernel/iommu_groups/2/devices/0000:26:00.0
/sys/kernel/iommu_groups/2/devices/0000:28:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:03.0
/sys/kernel/iommu_groups/2/devices/0000:28:00.0
/sys/kernel/iommu_groups/2/devices/0000:27:00.0
/sys/kernel/iommu_groups/0/devices/0000:03:00.0
/sys/kernel/iommu_groups/0/devices/0000:20:00.0
/sys/kernel/iommu_groups/0/devices/0000:25:00.2
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/0/devices/0000:25:00.0
/sys/kernel/iommu_groups/0/devices/0000:01:00.0
/sys/kernel/iommu_groups/0/devices/0000:03:00.1
/sys/kernel/iommu_groups/0/devices/0000:00:01.3
/sys/kernel/iommu_groups/0/devices/0000:25:00.3
/sys/kernel/iommu_groups/0/devices/0000:00:01.1
/sys/kernel/iommu_groups/0/devices/0000:25:00.1
/sys/kernel/iommu_groups/0/devices/0000:22:00.0
/sys/kernel/iommu_groups/0/devices/0000:20:01.0
/sys/kernel/iommu_groups/0/devices/0000:20:04.0
/sys/kernel/iommu_groups/0/devices/0000:03:00.2

# lspci -v|grep "00:03\|26:\|27:\|28:"
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
26:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c1) (prog-if 00 [Normal decode])
27:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (prog-if 00 [Normal decode])
28:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c1) (prog-if 00 [VGA controller])
28:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio

I’m always a little afraid to play around with kernels. I’m worried I’ll end up destroying my running OS.

Can you check whether you have an option to enable ACS in the BIOS?

Looks like someone had the same problem with your board here: https://forums.unraid.net/topic/95057-second-gpu-passthrough-solved/

They ended up purchasing a new board…

I would suggest the ACSO patch…
If you were on something like Ubuntu, there are pre-built kernels with the ACSO patch; for Manjaro you’ll need to have a bit more fun with it ;)

MSI B450 TOMAHAWK MAX

I can’t find ACS in the BIOS :(

So I have two options:

buying a new board or ACSO patching?

Looks like it…

For Manjaro there are plenty of tutorials on how to do the ACSO kernel patch (there is one on this forum).
For Ubuntu there is a script which I have used before: https://gist.github.com/mdPlusPlus/031ec2dac2295c9aaf1fc0b0e808e21a

If I remember correctly, Manjaro is Arch-based, so:

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Bypassing_the_IOMMU_groups_(ACS_override_patch)

You will need a kernel with the patch applied. The easiest method to acquiring this is through the linux-vfio AUR package.

So it should be as easy as yaourt -S linux-vfio

After you install linux-vfio and boot into the patched kernel, check your IOMMU groups again; they should now be separated.
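
One way to check the result after rebooting is to look up which group the GPU ended up in. This is a rough sketch, assuming bash and the sysfs layout used elsewhere in this thread; iommu_group_of is a hypothetical helper name, and 0000:28:00.0 is the GPU address from the earlier output (it may differ after a slot swap).

```shell
#!/usr/bin/env bash
# Print the IOMMU group number that contains the given PCI device.
# Usage: iommu_group_of 0000:28:00.0 [SYSFS_ROOT]
# Returns 1 if the device is not listed in any group.
iommu_group_of() {
    local addr="$1"
    local root="${2:-/sys/kernel/iommu_groups}"
    local link
    # entries look like <root>/<group>/devices/<address>
    link="$(find "$root" -mindepth 3 -maxdepth 3 -name "$addr" | head -n 1)"
    [ -n "$link" ] || return 1
    link="${link%/devices/*}"      # strip "/devices/<address>"
    printf '%s\n' "${link##*/}"    # what remains ends in the group number
}
```

If iommu_group_of 0000:28:00.0 and iommu_group_of 0000:28:00.1 print a group that contains nothing but the card itself, the split worked.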

Sorry, forgot to mention that it is Arch-based! Yes, I found the AUR package for my kernel :)

The comments already look good.

I’ll give it a try!

So, I compiled the new kernel with the patch, but if I try to boot with this patched kernel I get no output from my RTX 2080 :(

Maybe this part is missing

Where do I put these lines? In GRUB?

I’m not sure about Manjaro, but you can do it at boot time. At the GRUB menu you can hit “e”, add the parameter to the end of the linux line, and hit F10 to boot.

https://wiki.archlinux.org/index.php/Kernel_parameters

It all depends on which bootloader your Manjaro is using.

Manjaro uses GRUB.

Do I just add pcie_acs_override=downstream,multifunction, or the whole

    pcie_acs_override =
            [PCIE] Override missing PCIe ACS support for:
        downstream
            All downstream ports - full ACS capabilities
        multifunction
            All multifunction devices - multifunction ACS subset
        id:nnnn:nnnn
            Specific device - full ACS capabilities
            Specified as vid:did (vendor/device ID) in hex

I would replace the quiet string with pcie_acs_override=downstream.

The quiet string suppresses the full boot output, so you might want to remove it anyway. I would also try with downstream only first. Then don’t forget to run grub-mkconfig -o /boot/grub/grub.cfg
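
On a GRUB setup this edit can be sketched as below, assuming the kernel options live in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (the usual place on Arch-based systems, but check yours). The helper name swap_quiet_for_acs is made up; it only prints the modified file, so the result can be reviewed before anything is overwritten.

```shell
#!/usr/bin/env bash
# Print the given GRUB defaults file with the standalone word "quiet" on the
# GRUB_CMDLINE_LINUX_DEFAULT line replaced by pcie_acs_override=downstream.
# The file itself is not modified. swap_quiet_for_acs is a hypothetical name.
swap_quiet_for_acs() {
    sed -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/(^|[" ])quiet([" ]|$)/\1pcie_acs_override=downstream\2/' "$1"
}

# Possible workflow (as root), assuming the default paths:
#   swap_quiet_for_acs /etc/default/grub > /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new      # review the change
#   cp /tmp/grub.new /etc/default/grub
#   grub-mkconfig -o /boot/grub/grub.cfg
```

Keeping the original file untouched until the diff looks right makes it easy to back out if the new entry does not boot.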

After you boot into it:

  • grep the dmesg output for acs
  • cat /proc/cmdline
  • run the find command above to show the IOMMU groups.
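
These checks could be scripted roughly like this. It is a sketch assuming bash; cmdline_has is a hypothetical helper that looks for a parameter name in /proc/cmdline, and the optional second argument only exists so it can be pointed at another file.

```shell
#!/usr/bin/env bash
# Return 0 if the kernel command line contains the given parameter,
# either bare ("quiet") or with a value ("pcie_acs_override=downstream").
# Usage: cmdline_has PARAM [FILE]   (FILE defaults to /proc/cmdline)
cmdline_has() {
    local param="$1"
    local file="${2:-/proc/cmdline}"
    local line=""
    read -r line < "$file" || [ -n "$line" ] || return 1
    # pad with spaces so only whole words match
    case " $line " in
        *" $param "*|*" $param="*) return 0 ;;
    esac
    return 1
}

# Typical use after booting the patched kernel:
#   cmdline_has pcie_acs_override && echo "override is on the command line"
#   dmesg | grep -i acs                       # may need root
#   find /sys/kernel/iommu_groups/ -type l
```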

I still get no video output from my 2080 with the vfio kernel :(

Maybe the kernel blocks the Nvidia GPU?

Read above; without more log output there’s no way to help.

Also, I thought you were passing through the AMD card.

Yes, I am passing through an AMD card, but I get no output on my host machine :D

That is what I meant by no video output from the RTX 2080. My host machine runs on the 2080. If I select the vfio kernel from GRUB, I get no video output on my host system, so I can’t get to the logs.
