Hey all, I recently swapped from Intel (a nice old HP Z820) to a Ryzen 3900X, basically porting all the core hardware across. However, I can’t get my passthrough working with Windows, even though it was flawless before on the Intel chip.
So, I’m running:
3900x / Vega 56 (Sapphire) / Asus WS X570 Pro / Nvidia K600 for host
The K600 is in slot 3 and the Vega 56 is in slot 1. The BIOS is set to slot 3 in the only place I can find an option for it.
I get the following error from QEMU on startup, but not every time (?):
Failed to mmap 0000:0d:00.0 BAR 0. Performance may be slow
Windows just hangs on boot (I can see that via VNC), and nothing appears on the screen. No idea how to debug this. So I installed Ubuntu in a VM and added the GPU. It doesn’t crash, but it doesn’t use the GPU either. The card is listed in lspci in the VM. In dmesg in that VM, I get the following:
[ 2.704245] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xd0000000 -> 0xdfffffff
[ 2.704246] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xe0000000 -> 0xe01fffff
[ 2.704246] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfe800000 -> 0xfe87ffff
So, disabling ACPI caused all kinds of problems; the machine didn’t boot properly because the K600 somehow didn’t get an IRQ. Reverted. Now trying to play with video=efifb:off / CSM mode.
Well, finally figured out that this card was being put in a D3 state, which stopped things working. I didn’t apply the gnif patch yet - I’m running 5.4…
I’m using a workaround: removing the GPU and its sound function from the PCI bus, suspending to RAM, then rescanning. It’s kinda annoying… but it “works”.
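For anyone following along, here is a rough sketch of that remove / suspend / rescan sequence using the standard sysfs files. The address 0000:0d:00.0 is taken from the QEMU error above; 0000:0d:00.1 for the audio function is an assumption, so check lspci on your own host. The function names are made up for illustration, and it only prints the steps unless you pass dry_run=False as root.

```python
from pathlib import Path

def reset_plan(functions, sysfs=Path("/sys")):
    """Return the ordered (path, value) sysfs writes: remove, suspend, rescan."""
    # Detach each PCI function (GPU and its audio function) from the bus.
    plan = [(sysfs / "bus/pci/devices" / fn / "remove", "1") for fn in functions]
    # Suspend to RAM; a real write blocks here until the machine resumes.
    plan.append((sysfs / "power/state", "mem"))
    # Ask the kernel to rediscover the removed devices.
    plan.append((sysfs / "bus/pci/rescan", "1"))
    return plan

def run_plan(plan, dry_run=True):
    """Print the writes, or perform them when dry_run is False (needs root)."""
    for path, value in plan:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            path.write_text(value)

if __name__ == "__main__":
    # 0000:0d:00.1 is an assumed address for the Vega's audio function.
    run_plan(reset_plan(["0000:0d:00.0", "0000:0d:00.1"]))
```

Run it once in dry-run mode first and compare the printed paths against what actually exists under /sys/bus/pci/devices.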
Any better ideas? Any idea why I didn’t experience this on Intel?
The “Failed to mmap 0000:0d:00.0 BAR 0” error is the clue here. I hit the same problem with my RTX and GTX cards when they were in the first slot on an Asus board. Basically it’s the framebuffer’s fault. It is mapped to the GPU in your first slot, and you should be seeing the Asus logo on the screen connected to that GPU. Add video=efifb:off to your kernel boot parameters, in GRUB or whatever bootloader you use. After that you shouldn’t see the logo anymore and the card should pass to the VM without a problem. There are also QEMU hooks and scripts you can run to unbind the framebuffer on VM start, if you are interested. This is basically the “passing through the boot GPU” problem.
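In case it helps, here is a minimal sketch of adding that parameter, assuming the host boots with GRUB and uses a Debian/Ubuntu-style /etc/default/grub. The helper name add_kernel_param is made up for illustration; it only edits text, so you still have to write the file back and regenerate the GRUB config (update-grub on Ubuntu) yourself.

```python
import re

def add_kernel_param(grub_text, param="video=efifb:off"):
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present."""
    def patch(match):
        args = match.group(2).split()
        if param not in args:
            args.append(param)
        joined = " ".join(args)
        return f'{match.group(1)}"{joined}"'

    # Only the double-quoted form of the setting is handled in this sketch.
    pattern = r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"'
    return re.sub(pattern, patch, grub_text, flags=re.MULTILINE)

if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample))
```

After a reboot, checking /proc/cmdline confirms whether the parameter actually made it onto the kernel command line.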
Thanks; trying that now. Think swapping a slot would avoid this? A day ago I changed to running the K600 in slot 1 and the Vega in slot 2, and it didn’t change anything for me.
One last suggestion is to dump the vBIOS from the card and provide that to the VM. It would also help if you could post your .xml config. Your dmesg seems to show something about the vBIOS, and it looks like the wrong one, but I could be wrong. Also, I just remembered: did you update your Asus board to the latest BIOS? There was a period when AGESA broke passthrough.
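One way to dump the vBIOS is through the card’s sysfs rom file, sketched below. This assumes the card is at 0000:0d:00.0 (the address from the QEMU error), that you run it as root, and that the card is not in use at the time; on a boot GPU the read can fail or return a shadowed copy, so treat the result with some suspicion. The function name and the output filename are placeholders.

```python
from pathlib import Path

def dump_vbios(device, out_file, sysfs=Path("/sys")):
    """Copy the PCI option ROM exposed in sysfs to out_file; return its size."""
    rom = sysfs / "bus/pci/devices" / device / "rom"
    rom.write_text("1")          # enable reading of the ROM
    try:
        data = rom.read_bytes()
    finally:
        rom.write_text("0")      # always disable ROM reads again
    Path(out_file).write_bytes(data)
    return len(data)

if __name__ == "__main__":
    # Placeholder output name; run as root on the host.
    size = dump_vbios("0000:0d:00.0", "vega56.rom")
    print(f"wrote {size} bytes")
```

The resulting file is what you would then point the VM’s GPU ROM setting at.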
I don’t know. If it actually is D3, the patch could help, but I’m not sure. The way it behaves all sounds very weird. The latest kernel update on my host somehow broke initialization of my Nvidia cards. Now I need to do a PCI rescan and assign the vfio driver on VM start; otherwise the cards don’t even appear in lspci.
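Roughly, the rescan plus vfio assignment looks like the sketch below. The device address is a placeholder (substitute your own from lspci), the function name is made up, and it assumes the vfio-pci module is already loaded. It just builds the list of sysfs writes so you can inspect them before running anything as root.

```python
from pathlib import Path

def vfio_bind_plan(device, sysfs=Path("/sys")):
    """Return ordered (path, value) sysfs writes to rescan and bind to vfio-pci."""
    pci = sysfs / "bus/pci"
    return [
        # Rediscover devices that did not show up at boot.
        (pci / "rescan", "1"),
        # Tell the kernel that only vfio-pci may claim this device.
        (pci / "devices" / device / "driver_override", "vfio-pci"),
        # Trigger a driver probe; this only binds a device that is currently
        # unbound, so unbind any other driver first if one grabbed it.
        (pci / "drivers_probe", device),
    ]

if __name__ == "__main__":
    # "0000:0a:00.0" is a placeholder address, not taken from this thread.
    for path, value in vfio_bind_plan("0000:0a:00.0"):
        print(f"echo {value} > {path}")
```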
Yeah; that’s the only other difference (other than Intel -> AMD): I’m now running 5.4, not 5.3. I’m going to try a downgrade and see if that helps. It seems weird that this bug would be specific to Ryzen and not the GPU.
Yes, maybe downgrading to a known-good Linux build is a good idea. The latest kernel is absolutely bonkers with its issues. I am running the latest Pop!_OS, which introduced weird PCI problems, while the previous version had absolutely zero issues. FYI, I am also running Ryzen (a 3950X in this case) on X570 (Asus Hero 8).