This might help you:
Step 1: Enable IOMMU
Configuring GRUB
Open the GRUB config file:
nano /etc/default/grub
Change the GRUB_CMDLINE_LINUX_DEFAULT line to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
or
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
PT Mode
Both Intel and AMD chips can use the additional parameter iommu=pt, added in the same way as above (on Intel systems, the first parameter is intel_iommu=on rather than amd_iommu=on). This enables the IOMMU translation only when necessary, and can thus improve performance for PCIe devices not used in VMs.
Save the changes, update GRUB and reboot.
update-grub
reboot
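After the reboot, it can be worth confirming that the new parameters actually reached the kernel. The following is a minimal sketch, not part of the original steps; has_param is a hypothetical helper name, and it only does a whole-word match against whatever string it is given.

```shell
# Hypothetical helper: succeed if a kernel command line contains the
# given parameter as a whole word.
# Usage on the host: has_param "$(cat /proc/cmdline)" iommu=pt && echo "iommu=pt is active"
has_param() {
    cmdline=$1
    param=$2
    # Pad with spaces so only complete, space-separated words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
    esac
    return 1
}
```

If the parameter is missing, the usual suspects are a typo in /etc/default/grub or a forgotten update-grub.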
Step 2: VFIO Modules
Add VFIO Kernel Modules
We have to make sure the following modules are loaded. This can be achieved by adding them to /etc/modules.
Open the /etc/modules file:
nano /etc/modules
Add the following content to the /etc/modules file:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
Then save and exit. On kernel 6.2 and later, vfio_virqfd has been merged into the vfio module and can be left out.
Update initramfs
After changing anything module-related, we need to refresh our initramfs. Then reboot to bring the changes into effect.
update-initramfs -u -k all
reboot
Verify that IOMMU is enabled
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
This should display that IOMMU, Directed I/O or Interrupt Remapping is enabled.
Depending on hardware and kernel the exact message can vary.
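To check after the reboot that the VFIO modules were really loaded, the output of lsmod can be compared against the expected names. This is a rough sketch under the assumption that lsmod prints a header line followed by one module per line with the name in the first column; missing_modules is a made-up helper name.

```shell
# Hypothetical helper: read `lsmod` output on stdin and print every
# module name given as an argument that is NOT loaded.
# Usage on the host: lsmod | missing_modules vfio vfio_iommu_type1 vfio_pci
missing_modules() {
    # Skip the header line and keep only the module-name column.
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF "$mod" || printf '%s\n' "$mod"
    done
}
```

Empty output means every listed module is loaded. Modules built directly into the kernel do not show up in lsmod, so a name reported as missing is a hint, not proof.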
Step 3: IOMMU Interrupt Remapping
It is not possible to use PCI passthrough without interrupt remapping.
Systems whose processor and chipset support Intel Virtualization Technology for Directed I/O (VT-d) but not interrupt remapping will report an error when a device is assigned to a VM. Interrupt remapping support is provided in newer processors and chipsets (both AMD and Intel).
Identify Interrupt Remapping support
Identify if the system has support for Interrupt Remapping:
dmesg | grep 'remapping'
If we see one of the following lines then Interrupt Remapping is supported:
AMD-Vi: Interrupt remapping enabled
or
DMAR-IR: Enabled IRQ remapping in x2apic mode
(x2apic can be different on old CPUs, but should still work)
If at some point we encounter one of the following errors, it means that we have an Interrupt Remapping issue:
Failed to assign device "[device name]": Operation not permitted
or
Interrupt Remapping hardware not found, passing devices to unprivileged domains is insecure.
If the system doesn’t support Interrupt Remapping, we can allow unsafe interrupts with:
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
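The check for interrupt remapping can also be scripted, so the unsafe-interrupts option is only considered when it is really needed. The sketch below simply looks for the two log lines quoted above; remapping_enabled is a hypothetical name, and the exact kernel messages may differ on other kernel versions.

```shell
# Hypothetical helper: read kernel log text on stdin and succeed if one
# of the two "remapping enabled" messages quoted in this post appears.
# Usage on the host: dmesg | remapping_enabled && echo "Interrupt remapping is supported"
remapping_enabled() {
    grep -Eq 'AMD-Vi: Interrupt remapping enabled|DMAR-IR: Enabled IRQ remapping'
}
```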
Step 4: Verify IOMMU Isolation
For PCI passthrough to work, we need a dedicated IOMMU group for all PCI devices we want to assign to a VM.
Verify that we have dedicated IOMMU groups by running:
find /sys/kernel/iommu_groups/ -type l
Our output should look something like this:
/sys/kernel/iommu_groups/23/devices/0000:08:00.1
/sys/kernel/iommu_groups/13/devices/0000:00:08.2
/sys/kernel/iommu_groups/31/devices/0000:0c:00.0
/sys/kernel/iommu_groups/31/devices/0000:0c:00.1
/sys/kernel/iommu_groups/3/devices/0000:00:02.0
/sys/kernel/iommu_groups/21/devices/0000:03:05.0
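The raw find output is hard to scan, so here is a small sketch that sorts it by group and points out groups holding more than one device. Both function names are made up for illustration, and the field positions assume paths exactly in the form shown above.

```shell
# Hypothetical helpers; both read the output of
# `find /sys/kernel/iommu_groups/ -type l` on stdin.

# Print "group<TAB>device" sorted by group number.
list_groups() {
    # Split on "/": field 5 is the group number, field 7 the PCI address.
    awk -F/ '{ print $5 "\t" $7 }' | sort -n
}

# Print the numbers of groups that contain more than one device.
shared_groups() {
    awk -F/ '{ n[$5]++ } END { for (g in n) if (n[g] > 1) print g }' | sort -n
}

# Usage on the host:
#   find /sys/kernel/iommu_groups/ -type l | list_groups
#   find /sys/kernel/iommu_groups/ -type l | shared_groups
```

With the sample output above, shared_groups reports group 31, which holds 0000:0c:00.0 and 0000:0c:00.1. That is fine as long as they are functions of the same card and are all passed to the same VM; unrelated devices in the GPU's group are the case to worry about.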
Step 5: Blacklisting Drivers
We don’t want the Proxmox host system utilizing our GPU(s), so we need to blacklist the drivers.
Depending on our GPU card(s) vendor, run the following:
NVIDIA
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidiafb" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
AMD
echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
echo "blacklist amdgpu" >> /etc/modprobe.d/blacklist.conf
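Because the echo commands append with >>, running them twice leaves duplicate lines in blacklist.conf. That is harmless but untidy; a sketch of an append-only-if-missing variant is below. add_blacklist is a hypothetical helper name, not something Proxmox provides.

```shell
# Hypothetical helper: append "blacklist <module>" to a file only when
# that exact line is not already present.
# Usage on the host: add_blacklist /etc/modprobe.d/blacklist.conf nouveau nvidiafb nvidia
add_blacklist() {
    file=$1
    shift
    for mod in "$@"; do
        # -x matches the whole line, -F treats the text literally.
        grep -qxF "blacklist $mod" "$file" 2>/dev/null || echo "blacklist $mod" >> "$file"
    done
}
```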
Step 5 (continued): Determine the PCI Card Address
Locate the PCI card(s) using lspci. The address should be in the form 01:00.0:
lspci -nn
To locate AMD specific cards, use the following command:
lspci -nn | grep Radeon
To locate NVIDIA specific cards, use the following command:
lspci -nn | grep NVIDIA
Our shell window should output a bunch of stuff. Look for the line(s) that show the video card. It’ll look something like this:
0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1)
0b:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
0b:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Controller [10de:1ad6] (rev a1)
0b:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 UCSI Controller [10de:1ad7] (rev a1)
Make note of the first set of numbers (e.g. 0b:00.0 and 0b:00.1). We’ll need them for the next step.
Run the command below. Replace 0b:00 with whatever number was next to your GPU when you ran the previous command:
lspci -n -s 0b:00
Doing this should output your GPU card’s vendor and device IDs, usually one pair for the GPU and one for the audio device. It’ll look a little something like this:
0b:00.0 0300: 10de:1e07 (rev a1)
0b:00.1 0403: 10de:10f7 (rev a1)
0b:00.2 0c03: 10de:1ad6 (rev a1)
0b:00.3 0c80: 10de:1ad7 (rev a1)
What we want to keep are these vendor:device ID pairs: 10de:1e07, 10de:10f7, 10de:1ad6 and 10de:1ad7.
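Copying the IDs by hand is error-prone, so here is a minimal sketch that pulls them out of the lspci -n output and joins them with commas, ready for an ids= option. It assumes the ID pair is the third whitespace-separated field, as in the sample above; ids_csv is a hypothetical name.

```shell
# Hypothetical helper: read `lspci -n -s <slot>` output on stdin and
# print the vendor:device pairs as one comma-separated string.
# Usage on the host: lspci -n -s 0b:00 | ids_csv
ids_csv() {
    # Field 3 of each line is the vendor:device pair, e.g. 10de:1e07.
    awk '{ print $3 }' | paste -sd, -
}
```

For the sample output above this yields 10de:1e07,10de:10f7,10de:1ad6,10de:1ad7.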
Now we add the GPU’s vendor:device IDs to the vfio-pci module options (remember to replace the IDs with your own!):
echo "options vfio-pci ids=1002:67df,1002:aaf0" > /etc/modprobe.d/vfio.conf
Step 6: GPU OVMF PCI Passthrough (recommended)
Using OVMF, we can also add disable_vga=1 to the vfio-pci module options, which tries to opt the devices out of VGA arbitration if possible. In that case, use the line matching your card instead of the one above:
NVIDIA
echo "options vfio-pci ids=10de:1e07,10de:10f7,10de:1ad6,10de:1ad7 disable_vga=1" > /etc/modprobe.d/vfio_nvidia.conf
AMD
echo "options vfio-pci ids=1002:67df,1002:aaf0 disable_vga=1" > /etc/modprobe.d/vfio_amd.conf
NVIDIA Tips
Some Windows applications like GeForce Experience, PassMark PerformanceTest and SiSoftware Sandra can crash the VM. To avoid this we need to add:
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
If we see a lot of warning messages in the dmesg system log, add the following instead:
echo "options kvm ignore_msrs=1 report_ignored_msrs=0" > /etc/modprobe.d/kvm.conf
Step 7: Update and Reboot
update-grub
update-initramfs -u -k all
reboot
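Once the host is back up, a quick way to confirm the result is to look at which kernel driver has claimed the GPU. The sketch below assumes the "Kernel driver in use:" line that lspci -k prints for each device; driver_in_use is a hypothetical helper name, and 0b:00 is the example slot from above.

```shell
# Hypothetical helper: read `lspci -k -s <slot>` output on stdin and
# print the driver name for every device that has one bound.
# Usage on the host: lspci -nnk -s 0b:00 | driver_in_use
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}
```

Every function of the card should report vfio-pci. If nouveau, nvidia, radeon or amdgpu shows up instead, the blacklist or the ids= option has probably not taken effect.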