Hi All!
Summary
I’m trying to set up GPU passthrough on a Linux host with KVM so that I can run a Windows guest for gaming. The problem is that some of the devices in the IOMMU group (all belonging to the GPU I want to pass through) are bound to the vfio-pci driver, while others are bound to the xhci_hcd and nvidia-gpu drivers.
Hardware
CPU
- Threadripper 1950X
GPU
- Sapphire RX 480 (AMD, meant for host graphics)
- MSI Duke RTX 2080 (nVidia, meant for guest graphics)
MoBo
- MSI Carbon Gaming Pro AC x399
Problem
When I try to pass through my nVidia GPU, I get the error “Please ensure all devices within the iommu_group are bound to their vfio bus driver.” (see images below).
When I run lspci -k, I find the following records:
0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GV104 [GeForce GTX 1180] [10de:1e87] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3721]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
0b:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f8] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3721]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
0b:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad8] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3721]
Kernel driver in use: xhci_hcd
0b:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad9] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3721]
Kernel driver in use: nvidia-gpu
Kernel modules: i2c_nvidia_gpu
41:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] [1002:67df] (rev c7)
Subsystem: PC Partner Limited / Sapphire Technology Radeon RX 470/480 [174b:e347]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
41:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] [1002:aaf0]
Subsystem: PC Partner Limited / Sapphire Technology Ellesmere [Radeon RX 580] [174b:aaf0]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
This shows that 0b:00.0 and 0b:00.1 are bound to vfio-pci, but 0b:00.2 and 0b:00.3 (which also belong to the RTX 2080) are bound to xhci_hcd and nvidia-gpu respectively.
My understanding is that this is the issue.
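The split can also be read straight from sysfs, which is where the IOMMU grouping itself is exposed. A minimal sketch, assuming the usual layout (/sys/kernel/iommu_groups/<group>/devices/<address>, with a driver symlink on each bound device); the function name list_iommu_drivers and its optional root argument are made up for illustration:

```shell
#!/bin/sh
# list_iommu_drivers [SYSFS_ROOT]
# Prints "group address driver" for every device in every IOMMU group.
# SYSFS_ROOT defaults to /sys; the argument exists only so the function
# can be pointed at a copy or a test tree (an assumption of this sketch).
list_iommu_drivers() {
    root="${1:-/sys}"
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        [ -e "$dev" ] || continue      # glob matched nothing: no groups
        group="${dev%/devices/*}"      # strip "/devices/<address>"
        group="${group##*/}"           # keep only the group number
        addr="${dev##*/}"              # PCI address, e.g. 0000:0b:00.2
        if [ -L "$dev/driver" ]; then
            # The driver symlink points at the bound driver's directory.
            driver="$(basename "$(readlink "$dev/driver")")"
        else
            driver="(none)"
        fi
        printf '%s %s %s\n' "$group" "$addr" "$driver"
    done
}
```

Called with no argument on the host, this should print one line per device. Any line in the GPU's group whose last column is not vfio-pci would be a device the error message is complaining about.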
What I’ve done
I mostly followed a Level1 thread and a blog post by Jack Ford (which I’d link to if I had the user permissions):
Blacklisted Nouveau
In /etc/modprobe.d/blacklist-nouveau.conf:
blacklist nouveau
options nouveau modeset=0
Updated GRUB
In /etc/default/grub:
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on vfio-pci.ids=10de:1e87,10de:10f8,10de:1ad8,10de:1ad9"
GRUB_CMDLINE_LINUX=""
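Edits to /etc/default/grub only take effect after the GRUB configuration is regenerated (sudo update-grub on Debian/Ubuntu) and the machine is rebooted. A small sketch for confirming which IDs the running kernel actually received; vfio_ids_from_cmdline is a hypothetical helper name, and the optional file argument exists only so it can be tried against a saved command line:

```shell
#!/bin/sh
# vfio_ids_from_cmdline [FILE]
# Prints one vendor:device ID per line from the vfio-pci.ids= parameter.
# FILE defaults to /proc/cmdline, the command line of the running kernel.
vfio_ids_from_cmdline() {
    # Split the command line into one parameter per line, keep only the
    # value of vfio-pci.ids (or vfio_pci.ids), then split the value on commas.
    tr ' ' '\n' < "${1:-/proc/cmdline}" \
        | sed -n 's/^vfio[-_]pci\.ids=//p' \
        | tr ',' '\n'
}
```

If this prints nothing, the parameter never reached the kernel, which would point at GRUB rather than at vfio-pci.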
Added kernel modules
In /etc/modules:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
In /etc/modprobe.d/vfio.conf:
options vfio-pci ids=10de:1e87,10de:10f8,10de:1ad8,10de:1ad9
In /etc/modules-load.d/vfio-pci.conf:
vfio-pci
Confirmed IOMMU/VFIO is enabled
$ dmesg | grep -E "DMAR|IOMMU"
[ 0.946794] AMD-Vi: IOMMU performance counters supported
[ 0.946855] AMD-Vi: IOMMU performance counters supported
[ 0.976135] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 0.976139] AMD-Vi: Found IOMMU at 0000:40:00.2 cap 0x40
[ 0.977856] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 0.977873] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[ 1.367168] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
$ dmesg | grep -i vfio
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.0.0-37-generic root=UUID=27ff4f46-db04-4ca0-8981-191a45c365ad ro quiet splash amd_iommu=on vfio-pci.ids=10de:1e87,10de:10f8,10de:1ad8,10de:1ad9 vt.handoff=1
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.0.0-37-generic root=UUID=27ff4f46-db04-4ca0-8981-191a45c365ad ro quiet splash amd_iommu=on vfio-pci.ids=10de:1e87,10de:10f8,10de:1ad8,10de:1ad9 vt.handoff=1
[ 2.667765] VFIO - User Level meta-driver version: 0.3
[ 2.675583] vfio-pci 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[ 2.692215] vfio_pci: add [10de:1e87[ffffffff:ffffffff]] class 0x000000/00000000
[ 2.712193] vfio_pci: add [10de:10f8[ffffffff:ffffffff]] class 0x000000/00000000
[ 2.712199] vfio_pci: add [10de:1ad8[ffffffff:ffffffff]] class 0x000000/00000000
[ 2.712204] vfio_pci: add [10de:1ad9[ffffffff:ffffffff]] class 0x000000/00000000
[ 4.960496] vfio-pci 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
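The vfio_pci: add lines are the useful part of that output, since they list the IDs the driver registered. A sketch that pulls them out of dmesg so they can be compared with the IDs from lspci; the helper name vfio_registered_ids is an assumption, and it only matches the line format shown above:

```shell
#!/bin/sh
# vfio_registered_ids
# Reads kernel log text on stdin and prints each vendor:device ID that
# appears in a "vfio_pci: add [vvvv:dddd[...]]" line, one per line.
vfio_registered_ids() {
    sed -n 's/.*vfio_pci: add \[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\[.*/\1/p'
}

# Typical use on the host:
#   dmesg | vfio_registered_ids
```

With the output above this yields all four IDs, so the IDs are reaching vfio-pci even though 0b:00.2 and 0b:00.3 end up bound to other drivers.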
Conclusion
Thanks so much for helping out… I don’t know what those two other entries are in the nVidia IOMMU group, but they all appear to belong to the nVidia device.