So, I tried to set up PCI passthrough with QEMU and it led to data corruption. I think I need to understand this a little better before I try again. I managed to copy some binaries from another working system and reinstall all apt packages, so I think everything is good now. At least everything seems to be working.
System:
ASUS PRIME X570-P
AMD Ryzen 9 5900X
RTX 2080 Ti (main GPU, nvidia-driver-535)
RX 580 (passthrough, kernel driver in use: vfio-pci)
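For reference, the "kernel driver in use" can also be read straight from sysfs, which helps confirm that the card is still bound to vfio-pci right before starting the VM. This is only a minimal sketch: the function name pci_driver and the optional second argument (an alternative sysfs root, handy for testing) are my own illustration, not part of any tool.

```shell
#!/usr/bin/env bash
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:05:00.0
# $2: optional sysfs root (hypothetical helper parameter, defaults to the real path)
pci_driver() {
    local addr="$1"
    local root="${2:-/sys/bus/pci/devices}"
    local link="$root/$addr/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink to the driver's sysfs directory.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Running pci_driver 0000:05:00.0 and pci_driver 0000:05:00.1 should print vfio-pci for both functions of the RX 580 if the binding is as described above.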
I have amd_iommu in the kernel parameters in my GRUB config:
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on kvm.ignore_msrs=1 quiet splash"
Although now that I read the dmesg output, I noticed this line:
[ 0.037921] AMD-Vi: Unknown option - 'on'
The kernel documentation doesn’t list an “on” value for amd_iommu, but it is mentioned in many guides, so I don’t know whether that changed at some point:
amd_iommu=  [HW,X86-64]
            Pass parameters to the AMD IOMMU driver in the system.
            Possible values are:
            fullflush - Deprecated, equivalent to iommu.strict=1
            off - do not initialize any AMD IOMMU found in
                  the system
            force_isolation - Force device isolation for all
                              devices. The IOMMU driver is not
                              allowed anymore to lift isolation
                              requirements as needed. This option
                              does not override iommu=pt
            force_enable - Force enable the IOMMU on platforms known
                           to be buggy with IOMMU enabled. Use this
                           option with care.
            pgtbl_v1 - Use v1 page table for DMA-API (Default).
            pgtbl_v2 - Use v2 page table for DMA-API.
            irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
source: The kernel's command-line parameters — The Linux Kernel documentation
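Since the “on” value is apparently ignored, one way to see whether the IOMMU is active anyway is to count the groups the kernel exposes under /sys/kernel/iommu_groups; as far as I understand, that directory is only populated when an IOMMU driver is actually in use. The sketch below assumes that; the function name iommu_group_count and its optional argument (an alternative root directory, only there for testing) are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the number of IOMMU groups found under the given root.
# $1: optional root directory (hypothetical parameter, defaults to the real sysfs path)
iommu_group_count() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local count=0
    local g
    for g in "$root"/*/; do
        # An unmatched glob stays literal, so check that it is a real directory.
        if [ -d "$g" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

A result of 0 from iommu_group_count would suggest the IOMMU is not enabled at all; anything above 0 means groups exist regardless of the unknown “on” option.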
I also found these timeouts, which I should probably resolve before trying this again:
2023-12-06T17:56:30.375476+02:00 badwolf kernel: [49224.144571] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:05:00.0 address=0x10021e600]
2023-12-06T17:56:30.375480+02:00 badwolf kernel: [49224.144577] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0000:05:00.0 address=0x10021e620]
2023-12-06T17:56:30.515501+02:00 badwolf kernel: [49224.285096] AMD-Vi: Completion-Wait loop timed out
2023-12-06T17:56:30.641119+02:00 badwolf kernel: [49224.410727] AMD-Vi: Completion-Wait loop timed out
2023-12-06T17:56:30.766553+02:00 badwolf kernel: [49224.536148] AMD-Vi: Completion-Wait loop timed out
2023-12-06T17:56:30.892141+02:00 badwolf kernel: [49224.661733] AMD-Vi: Completion-Wait loop timed out
2023-12-06T17:56:31.017560+02:00 badwolf kernel: [49224.787154] AMD-Vi: Completion-Wait loop timed out
2023-12-06T17:56:31.143093+02:00 badwolf kernel: [49224.912687] AMD-Vi: Completion-Wait loop timed out
2023-12-06T17:56:31.268516+02:00 badwolf kernel: [49225.038118] AMD-Vi: Completion-Wait loop timed out
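To see which devices these timeouts are logged against (here it is always 0000:05:00.0, the RX 580), the events can be tallied per device. A rough sketch, assuming the log format shown above; count_iotlb_timeouts is just a name I picked, and it reads log text from stdin.

```shell
#!/usr/bin/env bash
# Read kernel log text on stdin and print "<pci-address> <count>" for every
# device that has IOTLB_INV_TIMEOUT events logged against it.
count_iotlb_timeouts() {
    grep -o 'IOTLB_INV_TIMEOUT device=[0-9a-fA-F:.]*' \
        | sed 's/.*device=//' \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}'
}
```

It can be fed from the kernel log, for example journalctl -k | count_iotlb_timeouts, or from a saved log file via input redirection.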
IOMMU group 23 shouldn’t contain anything other than my RX 580, so that shouldn’t be the problem:
IOMMU Group 23:
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7)
05:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
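A listing like the one above can be produced by walking /sys/kernel/iommu_groups. The following is a minimal sketch of that idea, not necessarily the exact script used here: list_iommu_groups is an illustrative name, the optional argument is an alternative root directory for testing, and it prints only the PCI addresses rather than the full lspci descriptions.

```shell
#!/usr/bin/env bash
# Print one "IOMMU Group <n>: <pci-address>" line per device.
# $1: optional root directory (hypothetical parameter, defaults to the real sysfs path)
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local dev group
    for dev in "$root"/*/devices/*; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$dev" ] || continue
        # Path layout is <root>/<group>/devices/<address>.
        group=$(basename "$(dirname "$(dirname "$dev")")")
        printf 'IOMMU Group %s: %s\n' "$group" "$(basename "$dev")"
    done
}
```

Groups come out in glob order (so 10 sorts before 2), and passing each address to lspci -nns would add the device description. Any device sharing a group with 05:00.0 or 05:00.1 would show up under the same group number.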
PCI passthrough kind of works (except when it corrupts my SSD). Also, when I start QEMU, my YouTube video freezes for a moment; it usually recovers, but not always. Sometimes my keyboard freaks out and stops working, and the numpad light just blinks when I press keys. So there is some interference going on, and things are not isolated properly or something like that.
I have tried different settings, but this is the script I’m using now:
qemu-system-x86_64 \
-display default,show-cursor=on \
-cpu host,-hypervisor,kvm=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff \
-smp $(nproc) \
-vga qxl \
-m 8G \
-machine q35,accel=kvm \
-drive if=pflash,format=raw,unit=0,readonly=on,file=/home/henrixd/winvm/OVMF_CODE_4M.secboot.fd \
-drive if=pflash,format=raw,unit=1,file=/home/henrixd/winvm/OVMF_VARS_4M.ms.fd \
-drive file=ssd.img,format=raw,media=disk \
-device vfio-pci,host=05:00.0,x-vga=on \
-device vfio-pci,host=05:00.1 \
-device usb-host,hostbus=bus,hostport=port \
-device usb-host,hostbus=1,hostport=3
Any hints? Am I doing something obviously stupid here that I’m too dumb to understand?