Guest GPU: Gigabyte 7900 XTX Gaming OC (no USB-C port on the card, so only two devices are present: the video one and the audio one)
Host GPU: none
CPU: 5950X (one CCD for host, one for guest)
MoBo: Gigabyte X570 Aorus Master
VM storage device: dedicated passed-through NVMe SSD
Resizable BAR: On
Boot GPU: Any (1st, 2nd or 3rd PCIe slot; does not matter)
Passed-through IOMMU Groups:
GPU video device group
GPU audio device group
Motherboard USB controller group
Motherboard sound device group
Dedicated NVMe SSD for VM group
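For reference, the groups above can be enumerated with the usual sysfs loop from the VFIO guides (paths are standard; nothing here is specific to my board):

```shell
#!/bin/sh
# List every IOMMU group and the PCI devices it contains.
# Falls back to a message if the IOMMU is disabled or unsupported.
if [ -d /sys/kernel/iommu_groups ]; then
    for g in /sys/kernel/iommu_groups/*; do
        echo "IOMMU group ${g##*/}:"
        for d in "$g"/devices/*; do
            echo "    ${d##*/}"
        done
    done
else
    echo "No IOMMU groups found (is IOMMU enabled in UEFI and the kernel?)"
fi
```

Cross-reference the output with lspci to confirm the GPU, USB controller, sound device and NVMe drive each sit in groups you can pass through whole.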
VM config: originally created for passthrough of a GTX1080, which worked without issues. A GT1030 was also used as a host GPU.
Radeon Migration:
The GT1030 has been removed. My initial tests use no host GPU, while another machine allows interaction with the host through SSH.
Initially I only swapped the IDs of the GPU and its sound device in the bootloader entry file and the VM's XML. This resulted in the American Megatrends splash and a couple of lines of systemd output being printed during the host's boot, after which nothing. Starting up the VM removed the previous output and set the monitor to sleep. This was presumably caused by the reset bug.
To overcome this I unplugged the display cable from the GPU, rebooted the host, started the guest again and then plugged the display cable back into the GPU. Each time this brought me to a Windows recovery menu, which in every exit case resulted in a reboot, triggering the reset bug again.
After using a virtual GPU to roll back Windows to a working state and setting Resizable BAR to OFF, doing the display cable unplug/replug trick again brought me to the Windows desktop, where I extracted the GPU's BIOS using GPU-Z.
Adding the VBIOS to the VM's XML fixed my reset issue. I can freely reboot the VM without issues, however…
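For anyone wondering what that looks like in the XML: the hostdev entry for the GPU's video function gets a rom element pointing at the dumped file. The PCI addresses and file path below are placeholders from my setup; substitute your own:

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <!-- Host address of the GPU's video function -->
    <address domain='0x0000' bus='0x0f' slot='0x00' function='0x0'/>
  </source>
  <!-- VBIOS dumped with GPU-Z; path is an example -->
  <rom file='/var/lib/libvirt/vbios/7900xtx.rom'/>
  <!-- Guest-side address assigned by libvirt -->
  <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</hostdev>
```

The rom file must be readable by the QEMU process, so check permissions if the VM refuses to start.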
Single GPU passthrough also works without me setting it up. I boot to the Linux desktop without isolating the GPU, after which I start the VM from virt-manager. My monitor loses signal, then without showing the TianoCore boot screen the VM login screen simply shows up. After shutting down the VM I get dumped back to the tty, where the login prompt awaits. I can proceed to start X again without rebooting the host.
Installation of the drivers inside the VM went without issues. The guest is a bit stuttery, but it's an old install I experimented on, which has been repaired and rolled back countless times. 3D performance seems about right, in the sense that an eyeballed average in CS:GO went from 350 (native) to 300 (VM), which can be explained by the CPU being virtualized. I did not spend much time testing, nor did I try with a fresh install of Windows.
My to-do list consists of proper performance and stability testing with a fresh install and trying to enable Resizable BAR.
Hope this helps.
Correct, this is the AMD reset issue. The host has already initialised the GPU hardware, and the guest BIOS will attempt to do it a second time; as the GPU can't be reset back to its pre-boot state, it's like trying to start your car when it's already running… bad things happen.
Sorry but again, this is far too late; the GPU BIOS executed at your system POST regardless of whether a cable was attached or not.
The easiest way I have found to prevent this happening is to ensure your AMD GPU has only an EFI BIOS (pretty much all do) and boot the system in compatibility mode (PC-BIOS, non-UEFI, or CSM mode), which prevents the GPU's BIOS from being executed, as legacy POST is incompatible with EFI BIOS images.
If you're lucky, some BIOSes will actually allow you to disable POSTing a device entirely based on slot, though you usually only see that on server/workstation motherboards.
Note though that once you do this, the guest can only be started once: if the guest crashes or is shut down, you will likely need to cold boot your system again to get your VM working.
Finally, you could try the vendor-reset project if you have not already, which attempts to reset the GPU using AMD's internal reset mechanisms extracted from the amdgpu driver. It's not 100%, but for many people it is enough.
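If you go that route, note that on recent kernels the module has to be loaded and its reset method explicitly selected before the VM starts. A sketch, with the PCI address as a placeholder for your own card's:

```shell
#!/bin/sh
# Load vendor-reset and select its device-specific reset method for the GPU.
# 0000:0f:00.0 is a placeholder PCI address; substitute your own card's.
DEV=/sys/bus/pci/devices/0000:0f:00.0
modprobe vendor-reset 2>/dev/null || echo "vendor-reset module not installed"
if [ -w "$DEV/reset_method" ]; then
    echo device_specific > "$DEV/reset_method"
    echo "reset_method now: $(cat "$DEV/reset_method")"
else
    echo "device not present or reset_method not writable; nothing to do"
fi
```

Run it as root at boot (or from a libvirt hook) so the method is in place before the first guest start.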
Thanks to all three gents for the replies. I believe I see sufficient 'dmesg' outputs to validate my thought. A quick summary:
Zen 1 does support 'IOMMU AVIC'. Congrats to Zen 1/Zen+ owners.
Zen 4 seems to support 'IOMMU AVIC' as well. Lucky you.
Zen 3 seems not to support 'IOMMU AVIC', unfortunately.
So a conspiracy theory from me: I guess something 'bad' happened in the development cycles of Zen 2 and Zen 3, which seemed to be worked on in parallel, and 'IOMMU AVIC' was disabled in the final shipping products.
Here is a quick trick for determination: if you see 'Virtual APIC enabled', then 'IOMMU AVIC' is supported by your hardware.
For the hardcore: if you see both the 'GA' and 'GA_vAPIC' flags, then 'IOMMU AVIC' is supported by your hardware. Missing either of the two, it's not supported.
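As a concrete check, something like the snippet below works. The feature line is a made-up example for illustration; on a real system run 'dmesg | grep AMD-Vi' and inspect the actual output:

```shell
#!/bin/sh
# Hypothetical AMD-Vi extended-features line (the hex value is invented);
# on a real machine, feed in: dmesg | grep 'AMD-Vi: Extended features'
line='AMD-Vi: Extended features (0x58f77ef22294a5a): PPR NX GT IA GA PC GA_vAPIC'

# grep -w treats the underscore as a word character, so 'GA' does not
# falsely match inside 'GA_vAPIC'.
if echo "$line" | grep -qw GA && echo "$line" | grep -qw GA_vAPIC; then
    echo "IOMMU AVIC supported"
else
    echo "IOMMU AVIC not supported"
fi
```

On the sample line above this prints "IOMMU AVIC supported", since both flags are present.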
What line are you missing? Perhaps paste your 'dmesg | grep AMD-Vi' output to clarify your question.
IOMMUv2 supports PCIe devices that have 'PCI PRI and PASID interface' functionality. I haven't looked into what that exactly means, but if your devices don't support them, then you aren't missing anything.
IOMMUv2 is a kernel build option; most distributions seem to have it enabled by default. So if your hardware supports it, then it'll be used.
All Zen processors support IOMMUv2. You don't see the IOMMUv2 line in my paste above because I had it disabled in my custom kernel early this week for the investigation of this AVIC thing. I'll turn it back on when I do my next build.
IOMMU has to be set to 'Enabled' in UEFI. Otherwise, you can't do VFIO passthrough.
Hard to imagine passthrough performance being any better on my 5950X, but I guess there's always something. Perhaps it would help with nested virtualization, which has been a pain point for me.
Also, don't forget your Zen 3 / 5950X still benefits from 'SVM AVIC'. You should enable AVIC in your QEMU config etc. I believe it still benefits Linux and Windows guests quite a bit.
Edit:
Here is a brief guide [0].
You should ignore the 'cpu_pm=on' and 'preempt=voluntary' bits since they're irrelevant to AVIC enablement.
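The part of the guide that actually matters for AVIC boils down to a module option on kvm_amd. A minimal sketch (the file name is my choice; any .conf under /etc/modprobe.d works):

```
# /etc/modprobe.d/kvm-amd-avic.conf
# Enable SVM AVIC in the kvm_amd module.
options kvm_amd avic=1
```

After a reboot (or reloading kvm_amd), you can confirm it took effect with 'cat /sys/module/kvm_amd/parameters/avic'.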
It looks like X2APIC is not only for servers with more than 256 threads; it also seems to be a requirement for ROCm.
Known Impact
If AMD ROCm is installed, the system may report failures or errors when running workloads such as *bandwidth test*, *clinfo*, and *HelloWorld.cl*. Note, it may also result in a system crash.
* IO PAGE FAULT
* IRQ remapping doesn't support X2APIC mode
* NMI error
In a correct ROCm installation, the system must not encounter the errors mentioned above.
Add 'amdgpu.sg_display=0' to your grub config and it works just fine.
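For anyone unfamiliar, that means appending the parameter to the kernel command line in /etc/default/grub and regenerating the config. The surrounding flags here are examples; keep whatever your line already has:

```
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet amdgpu.sg_display=0"
```

Then regenerate with 'grub-mkconfig -o /boot/grub/grub.cfg' (or 'update-grub' on Debian/Ubuntu) and reboot.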
I use the Zen 4 iGPU for the host and replaced my 6800XT with a 7900XTX. I didn't have to change anything, except that the 7900XTX needs a VBIOS copy via libvirt.
Yes, I've been running with the avic and stimer Hyper-V enlightenments enabled for quite some time. Have you tried Maxim's patch for (EDIT: actually looks like it's been mainlined) kvm_amd.force_avic?
ETA: Hmm, no change here. So that option must force the SVM AVIC, which isn't necessary on my 5950X. I wonder if we could force the IOMMU AVIC as well…
Then you're all set wrt AVIC. With all-core passthrough, I bet you may be able to benchmark a difference in GB6 or CR23, perhaps.
I think all Zen processors can do 'SVM AVIC'. My Zen 2 does it too (which I wasn't aware of until I figured out this mess lately).
What's missing on Zen 2 and Zen 3 is 'IOMMU AVIC'. It cannot be forced on because the hardware feature is missing (as per recent kernel code).
Yes, I understand, but it might be possible to bypass the hardware check and enable it regardless. Assuming the feature is implemented in hardware and not fused off or disabled in firmware.
That's a fairly easy problem to solve, but it could be an expensive solution depending on how you have to do it. First, do you want 10 NVMe or SATA SSDs?
If SATA, you just need to get a tower case that was designed to be useful, not one that is solely made to show off tons of RGB LEDs through glass panels. Yes, there are still some cases like that being made; you just have to shop around a bit more to find them. You might not be able to find any with 10 drive bays, but there are 5.25" bay adapters that let you put 4, 6, or 8 SATA SSDs into one 5.25" bay. I have a 4-in-1 in my system with 4 SSDs; I was lucky to get it on a really good Newegg sale many years ago. The regular prices of these units are expensive, but if you are really going to use it, it will be worth it. Then of course you have the problem of having enough SATA connections. Most mobos top out at 6 these days, with many having 4 or even 2! So you will need one or more PCIe cards that give you the number of SATA ports you need. My mobo has 6 SATA ports, so I only needed to buy one PCIe SATA card with 4 ports.
If you want 10 NVMe SSDs, that's definitely trickier. I'll assume your mobo supports two. Ideally you would want a mobo with two free x16 slots. Of course very few mobos have three x16 slots, since you need the first one for your gfx card; but if this is going to be just a media server and not a gaming PC, then you just freed up an x16 slot. There are PCIe cards that support up to 4 NVMe SSDs and fit into those slots. They should also function in x8 and x4 slots, but you would not get the full speed of the drives if you were using more than one at a time in an x4 slot, or two in an x8 slot. And if you buy Gen 4 SSDs and put them into Gen 3 slots, you will not get the drives' rated speed no matter how few drives are in use at a time. These cards are not cheap either. So there you are.
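To put rough numbers on the slot-speed caveat: per-lane figures below are approximate usable bandwidth after encoding overhead, and the drive count and per-drive speed are assumptions for illustration only.

```shell
#!/bin/sh
# Back-of-envelope PCIe bandwidth check.
# Approximate usable bandwidth per lane: Gen3 ~0.985 GB/s, Gen4 ~1.969 GB/s.
awk 'BEGIN {
    drives = 4; per_drive = 7.0   # assumed Gen4 NVMe, ~7 GB/s each
    printf "demand from %d drives: %.1f GB/s\n", drives, drives * per_drive
    printf "Gen4 x16 slot: %.1f GB/s\n", 16 * 1.969
    printf "Gen4 x4  slot: %.1f GB/s\n",  4 * 1.969
    printf "Gen3 x16 slot: %.1f GB/s\n", 16 * 0.985
}'
```

So four fast Gen4 drives (~28 GB/s aggregate) fit comfortably in a Gen4 x16 slot (~31.5 GB/s) but are badly bottlenecked by a Gen4 x4 slot (~7.9 GB/s), and even a Gen3 x16 slot (~15.8 GB/s) caps them at roughly half speed when all are active.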
It's 100% doable, you just have to pick which of the two paths you want to take and figure out if you can afford to do it. Good luck!
Yeah, sure. I'm not worried about it, just wondering what the 'normal' is. Next reboot I'll try to remember to go into the BIOS, set it to Enabled instead of Auto, and see if anything changes.
My Windows passthru VM already has 20+ days of uptime working flawlessly, so on the other hand, why change something that works…