[Solved] 7900XTX GPU passthrough issues

Hey, I had single GPU passthrough working on my setup using a 6800XT. I replaced the GPU with a 7900XTX and now I can’t get passthrough to work properly. I’m able to connect through VNC and see that the GPU is recognized in Windows with all drivers installed, but I can’t see anything on my monitor: I get audio, but just a black screen. In Windows Device Manager my monitor doesn’t show up at all; it’s as if the screen itself isn’t recognized. Also, when I shut down the VM the PC seems to freeze, and I can’t turn it off, even by pressing the power button once or requesting a poweroff through SSH. Everything else works through SSH, though, so I’m not sure what’s going on there.
So far I’ve tried:

  • building a new VM from scratch
  • setting the kernel parameters suggested on the Arch Wiki
  • unplugging my monitor until the VM boots (also suggested there)
  • dumping my ROM and referencing it in my XML file
  • removing my KVM switch and connecting everything directly
  • disabling (and enabling) ReBAR
  • dedicating the 7900XTX exclusively to the VM so I don’t need the setup and release scripts
  • various small changes that I don’t remember and that haven’t yielded any results, except that the Windows VM is now extremely slow, even over VNC; if I unplug the physical cable of my main display it runs smoothly again. Starting a new VM is quick, though, so it’s not an issue.

If anyone has any suggestions: I’m not sure how to debug this any further, but I’m willing to try pretty much anything at this point.
One thing to note is that the GPU itself works: I can play games normally using Proton with flawless performance. The rest of my setup is exactly the same as when it worked; the only change is the GPU. I haven’t formatted the PC because I didn’t think it necessary, but I can if anyone thinks it might help.

If any other logs might help, just let me know and I can upload them. I couldn’t find anything relevant in journalctl, so I didn’t include it, but if there’s a keyword to search for I can post that as well.

Solution: disable the ROM BAR setting on the GPU’s PCI device in the VM configuration.


BIOS settings:
SR-IOV: disabled
CSM: disabled
ReBAR: enabled
ACS override: none
IOMMU: enabled
AER: enabled
AMD-Vi: enabled
BIOS: 1.28 (X670E Steel Legend)


Disable ReBAR? If that fixes it, follow the part of my guide where you manually set the two ReBAR regions. If there’s a mismatch, it won’t work right.
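The guide’s exact steps aren’t reproduced here. As a hedged aid for spotting a mismatch, here is a minimal Python sketch that reads the host’s view of the GPU’s memory regions from the sysfs resource file (three hex fields per line: start, end, flags; lines 0 to 5 are BAR0 to BAR5, line 6 is the expansion ROM). The device address 0000:03:00.0 is an assumption taken from the logs later in this thread; running it once with ReBAR on and once with it off shows what the host actually assigned.

```python
import sys


def bar_sizes(resource_text):
    """Map region index -> size in bytes for each populated line of a sysfs resource file.

    Each line is "start end flags" in hex; lines 0-5 are BAR0-BAR5 and line 6 is
    the expansion ROM. Unused regions are all zeros and are skipped.
    """
    sizes = {}
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        if start or end:
            sizes[index] = end - start + 1
    return sizes


def human(size):
    """Format a byte count with binary units, e.g. 268435456 -> '256 MiB'."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:g} {unit}"
        size /= 1024


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example argument (assumption): 0000:03:00.0, the GPU address seen in this thread.
    with open(f"/sys/bus/pci/devices/{sys.argv[1]}/resource") as f:
        for index, size in bar_sizes(f.read()).items():
            if index == 6:
                name = "ROM"
            elif index < 6:
                name = f"BAR{index}"
            else:
                name = f"region {index}"
            print(f"{name}: {human(size)}")
```

Usage would be something like python3 bars.py 0000:03:00.0 (the script name is hypothetical); it only reports the host side, so comparing against what the guest sees is still a manual step.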


It’s disabled, sorry; I forgot to list that among the things I’ve tried.
I also tried enabling it, with the same result. I tore down my start/end scripts and dedicated the 7900XTX exclusively to the VM, same result. I can hear Windows booting (and can get visuals over VNC), but nothing on my screen.
vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
This message is the only error I could find, but it doesn’t seem to lead anywhere either.

What’s your kernel line? Any ACS override?

IOMMU enabled, AER enabled?
What motherboard?

Any clues in the output of dmesg?

Leave ReBAR disabled in the BIOS for troubleshooting purposes.


Linux 6.5.9-arch2-1 x86_64

Not that I know of, unless the kernel I get from Arch already comes with it.

Yes:

[    0.420132] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    0.476635] AMD-Vi: AMD IOMMUv2 loaded and initialized

Nope, but I can look into it.

X670E Steel Legend

Just a vfio vgaarb log, but it didn’t seem to lead anywhere:

selhar@selhar ~> sudo dmesg | grep vfio
[    0.000000] Command line: initrd=\amd-ucode.img initrd=\initramfs-linux.img root=PARTUUID=15968a6a-9471-4cf8-88e2-0ce79becfd56 zswap.enabled=0 rw rootfstype=ext4 iommu=pt video=efifb:off video=vesafb:off vfio-pci.ids=1002:744c,1002:ab30
[    0.025700] Kernel command line: initrd=\amd-ucode.img initrd=\initramfs-linux.img root=PARTUUID=15968a6a-9471-4cf8-88e2-0ce79becfd56 zswap.enabled=0 rw rootfstype=ext4 iommu=pt video=efifb:off video=vesafb:off vfio-pci.ids=1002:744c,1002:ab30
[    2.735681] vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[    2.735803] vfio_pci: add [1002:744c[ffffffff:ffffffff]] class 0x000000/00000000
[    2.828525] vfio_pci: add [1002:ab30[ffffffff:ffffffff]] class 0x000000/00000000
[    8.949693] vfio-pci 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none

I applied the suggestions on the Arch Wiki for this error, but it didn’t change anything. Not sure if it helps, but here are the full dmesg logs (Pastebin).
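To double-check that the stub IDs on that command line match the card, here is a minimal Python sketch that splits a kernel command line into parameters and compares vfio-pci.ids against the vendor:device pairs lspci reports for this 7900XTX later in the thread (1002:744c for the GPU, 1002:ab30 for its HDMI/DP audio). It assumes plain whitespace splitting is enough, so quoted values containing spaces are not handled.

```python
import os

# vendor:device IDs lspci reports for this card (GPU and HDMI/DP audio).
EXPECTED_IDS = {"1002:744c", "1002:ab30"}


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: [values]}; bare flags map to [None].

    Assumption: whitespace splitting is enough (no quoted values with spaces).
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params.setdefault(name, []).append(value if sep else None)
    return params


def vfio_ids(params):
    """All vendor:device pairs given via vfio-pci.ids=, lowercased, in order."""
    ids = []
    for value in params.get("vfio-pci.ids", []):
        if value:
            ids.extend(part.strip().lower() for part in value.split(",") if part.strip())
    return ids


def missing_ids(params, expected=EXPECTED_IDS):
    """Expected IDs that vfio-pci.ids does not mention."""
    return sorted(set(expected) - set(vfio_ids(params)))


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        params = parse_cmdline(f.read())
    print("vfio-pci.ids:", ", ".join(vfio_ids(params)) or "(not set)")
    print("iommu:", params.get("iommu", ["(not set)"]))
    print("missing from vfio-pci.ids:", missing_ids(params) or "none")
```

Repeated parameters such as the two video= entries are kept as a list, so nothing on the line is silently overwritten.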

Hey, one question: do I have to update the BIOS to use a more recent GPU for passthrough? I always assumed I didn’t, but that’s pretty much all that’s left to debug. I was putting it off so I wouldn’t risk messing up the IOMMU groupings.

Update: I upgraded the BIOS to the most recent non-beta release and, although my IOMMU groupings remain the same, I can’t isolate the GPU anymore. Specifying the ‘vfio-pci.ids’ parameter leaves me stuck at the udev module loading screen. Single GPU passthrough now gives no video either, even through VNC, although I can hear the Windows boot sound and the lag seems to be gone. I’ll keep debugging and post here if I find something.
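One way to confirm that the groupings really are unchanged after a BIOS update is to list them from sysfs, in the same spirit as the IOMMU group script on the Arch Wiki. The Python sketch below is a stand-in for that script, not a copy of it: it assumes the standard /sys/kernel/iommu_groups/N/devices layout, prints the PCI addresses in each group, and optionally reports which group a given device (for example 0000:03:00.0) belongs to.

```python
import os
import sys


def iommu_groups(sysfs="/sys"):
    """Map IOMMU group number -> sorted PCI addresses, read from <sysfs>/kernel/iommu_groups."""
    base = os.path.join(sysfs, "kernel", "iommu_groups")
    groups = {}
    if not os.path.isdir(base):
        return groups  # IOMMU off, or not exposed by this kernel
    for name in os.listdir(base):
        devices = os.path.join(base, name, "devices")
        if name.isdigit() and os.path.isdir(devices):
            groups[int(name)] = sorted(os.listdir(devices))
    return groups


def group_of(groups, address):
    """Return the group number containing address, or None."""
    for number, members in groups.items():
        if address in members:
            return number
    return None


if __name__ == "__main__":
    groups = iommu_groups()
    for number in sorted(groups):
        print(f"Group {number}: {' '.join(groups[number])}")
    if len(sys.argv) > 1:
        # e.g. 0000:03:00.0; shows which other devices share the GPU's group.
        number = group_of(groups, sys.argv[1])
        print(f"{sys.argv[1]} is in group {number}: {groups.get(number, [])}")
```

Saving the output before and after a firmware update and diffing the two files is a quick way to back up the "groupings remain the same" observation.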

Update 2: I got an image back over VNC by starting a fresh VM, but the behavior stays the same: I can see things through VNC, but it’s as if Windows doesn’t recognize my monitor, only the GPU. Not sure what else to try besides formatting my PC, which I really don’t want to do since the Linux host is what I use for work. I’ll keep debugging; if I run out of options completely, I’ll just format everything.

Update 3: On the Arch Wiki I found a section about checking ROM signatures, and mine seems to be incorrect, but the TechPowerUp link doesn’t have anything on the ASRock Phantom Gaming, only the Aqua and Taichi. @wendell, do you know if that might be an issue with the GPU itself? I couldn’t find any successful passthroughs with the 7900XTX Phantom Gaming, only with the Taichi and Aqua.
Here is my output:

Valid ROM signature found @0h, PCIR offset 3b4h
	PCIR: type 0 (x86 PC-AT), vendor: 1002, device: 744c, class: 030000
	PCIR: revision 0, vendor revision: 1601
Valid ROM signature found @ba00h, PCIR offset dd77h
Invalid PCIR signature: "H
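For reference, the checks behind that output can be sketched in Python, assuming the standard PCI option ROM layout: each image starts with the bytes 55 AA, offset 0x18 of the image holds a 16-bit pointer to a PCIR structure, and that structure carries the vendor and device IDs, class code, image length (in 512-byte units), code type (0 = x86 PC-AT, 3 = EFI) and a last-image flag. It also reports whether any valid EFI image was found, which is roughly what the UEFI question to ASRock is about. This is a sketch, not the tool the wiki links to, and the ROM path passed on the command line is whatever file you dumped.

```python
import struct
import sys

# Code type byte (offset 0x14 of the PCIR structure) -> meaning.
CODE_TYPES = {0: "x86 PC-AT", 1: "Open Firmware", 2: "PA-RISC", 3: "EFI"}


def parse_rom(data):
    """Walk the images in a dumped option ROM and return one dict per image."""
    images = []
    offset = 0
    while offset + 0x1A <= len(data) and data[offset:offset + 2] == b"\x55\xaa":
        # ROM header offset 0x18: 16-bit little-endian pointer to the PCIR structure.
        (pcir_ptr,) = struct.unpack_from("<H", data, offset + 0x18)
        pcir = offset + pcir_ptr
        if data[pcir:pcir + 4] != b"PCIR" or pcir + 0x18 > len(data):
            images.append({"offset": offset, "pcir": pcir_ptr, "valid": False})
            break
        vendor, device = struct.unpack_from("<HH", data, pcir + 0x04)
        length, vendor_rev = struct.unpack_from("<HH", data, pcir + 0x10)
        images.append({
            "offset": offset,
            "pcir": pcir_ptr,
            "valid": True,
            "vendor": vendor,
            "device": device,
            "revision": data[pcir + 0x0C],
            # Class code bytes are prog-if, subclass, base class (low to high).
            "class_code": int.from_bytes(data[pcir + 0x0D:pcir + 0x10], "little"),
            "vendor_rev": vendor_rev,
            "code_type": data[pcir + 0x14],
            "last": bool(data[pcir + 0x15] & 0x80),  # bit 7 marks the last image
        })
        if images[-1]["last"] or length == 0:
            break
        offset += length * 512  # image length is stored in 512-byte units
    return images


if __name__ == "__main__" and len(sys.argv) > 1:
    # Path to the dumped vBIOS file (the name is up to you).
    with open(sys.argv[1], "rb") as f:
        images = parse_rom(f.read())
    for img in images:
        if not img["valid"]:
            print(f"Invalid PCIR signature: image @{img['offset']:x}h, PCIR offset {img['pcir']:x}h")
            continue
        kind = CODE_TYPES.get(img["code_type"], "unknown")
        print(f"Image @{img['offset']:x}h: type {img['code_type']} ({kind}), "
              f"vendor {img['vendor']:04x}, device {img['device']:04x}, "
              f"class {img['class_code']:06x}, vendor revision {img['vendor_rev']:x}")
    has_efi = any(img["valid"] and img["code_type"] == 3 for img in images)
    print("EFI image found" if has_efi else "No valid EFI image found")
```

A missing EFI image in a dump would be consistent with a UEFI problem, but an unusual layout in the dumped file can also trip this kind of check, so treat the result as a hint rather than a verdict.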

I sent an email to ASRock to ask whether the Phantom Gaming supports UEFI. I might be at my wit’s end for today, but I’ll keep posting updates in case it helps someone else.

Have you tried a driver other than vfio-pci? You could try pci-stub’ing them first, then hand them over to vfio-pci after a successful boot to desktop.

Edit: it also sounds like your BIOS update may have reset your passthrough-related settings, so do double-check things such as AER, ARI, Above 4G Decoding and whatever your vendor calls VT-x/VT-d.
E2: oh, and for debugging purposes, just disable SAM/ReBAR until you’ve ruled that out.
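The pci-stub-then-vfio-pci handover suggested above can be done through the kernel’s driver_override mechanism in sysfs. Below is a minimal Python sketch, assuming it runs as root, that the vfio-pci module is already loaded, and that the addresses are the GPU and its audio function from the logs (0000:03:00.0 and 0000:03:00.1); the sysfs parameter exists only so it can be tried against a scratch directory.

```python
import os
import sys


def rebind(address, driver, sysfs="/sys"):
    """Hand a PCI device to another driver via driver_override (needs root on a real system)."""
    dev = os.path.join(sysfs, "bus", "pci", "devices", address)
    # 1. From now on, only the named driver may claim this device.
    with open(os.path.join(dev, "driver_override"), "w") as f:
        f.write(driver)
    # 2. Release it from whatever driver currently holds it (pci-stub, amdgpu, ...).
    unbind = os.path.join(dev, "driver", "unbind")
    if os.path.exists(unbind):
        with open(unbind, "w") as f:
            f.write(address)
    # 3. Ask the PCI core to re-probe the device, which now matches only `driver`.
    with open(os.path.join(sysfs, "bus", "pci", "drivers_probe"), "w") as f:
        f.write(address)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 rebind.py 0000:03:00.0 0000:03:00.1  (GPU + its audio function)
    for address in sys.argv[1:]:
        rebind(address, "vfio-pci")
```

Afterwards, lspci -k should show "Kernel driver in use: vfio-pci" for both functions, as in the output later in this thread.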


Nope. Should that make a difference between GPUs? It was working with the 6800XT.

Yep, already tried that. After updating the BIOS it stopped working: the OS hangs during boot at early module initialization, and I couldn’t figure out why since it gives no errors. Before the BIOS update it did work, but it didn’t change anything (still a black screen).

Update: it was a configuration error on my part. Stubs are working now, but that didn’t help much; I’m still getting the black screen. I’ve also plugged in a second keyboard and mouse so I can debug it further, though not much luck there.

Yeah, I double-checked the settings after the BIOS update; everything is configured, with ReBAR off. The GPU behavior stays the same. At this point I suspect I might have an issue with the vBIOS. My partner bought the same model, so I might try swapping theirs with mine later this week.


After many tweaks, still a black screen. Posting the current state of my setup in case anyone has a suggestion that might help:

BIOS settings:
SR-IOV: disabled (also tried enabling)
CSM: disabled (also tried enabling)
ReBAR: disabled (also tried enabling)
ACS override: none
IOMMU: enabled
AER: enabled
AMD-Vi: enabled
BIOS: 1.28 (X670E Steel Legend)

I’m using stubs for the driver and they’re working flawlessly; I can’t find any errors in dmesg or journalctl. Everything seems like it should be working. Windows boots, I can hear it and access it through VNC, but all I get on my DP connection is a black screen. I also plugged in a second keyboard and mouse so I can debug without hard rebooting.

Stubs working:

03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev c8)
	Subsystem: ASRock Incorporation Navi 31 [Radeon RX 7900 XT/7900 XTX] [1849:5304]
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel

Which specific variant? I only have the X670 Taichi Lite at this point, but I can do some testing with it.

The Phantom Gaming, sadly. I couldn’t find any passthroughs with it online.

Random question here: if you successfully pass through an AMD or NVIDIA GPU to a Windows VM, CAN you then use HDR10+ in that Windows VM, or does it still require HDR support on the Linux or VM manager side?

I’ve seen a few posts warning against it, so I’ve never tried. Not sure why, though.

I fixed it! I found a random post on /r/vfio from a trillion years ago with similar issues that were solved by disabling the “ROM bar” option in the VM’s XML. I did the same here and it just worked. For anyone having the same issues, my BIOS settings remain the same as I last posted.

One thing to note: if you stub your GPU without setting the integrated GPU as the primary display in your BIOS settings, you will get stuck on boot (at least on the X670E Steel Legend).
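To check which GPU the firmware treats as primary before stubbing the discrete card, sysfs exposes a boot_vga attribute on VGA-class PCI devices (1 for the device used as the boot display). Below is a minimal Python sketch that lists them and flags any address passed on the command line that is still the boot VGA device. Treating that as the cause of a boot hang is an assumption based on the experience above, not a guarantee.

```python
import os
import sys


def boot_vga_devices(sysfs="/sys"):
    """Map PCI address -> True/False for every device exposing a boot_vga attribute."""
    base = os.path.join(sysfs, "bus", "pci", "devices")
    if not os.path.isdir(base):
        return {}
    result = {}
    for address in sorted(os.listdir(base)):
        path = os.path.join(base, address, "boot_vga")
        if os.path.isfile(path):
            with open(path) as f:
                # "1" marks the VGA device the firmware used as its boot display.
                result[address] = f.read().strip() == "1"
    return result


if __name__ == "__main__":
    devices = boot_vga_devices()
    for address, primary in devices.items():
        print(f"{address}: {'boot VGA' if primary else 'secondary'}")
    # Addresses you intend to stub, e.g. 0000:03:00.0 (example invocation).
    for address in sys.argv[1:]:
        if devices.get(address):
            print(f"warning: {address} is still the firmware's boot display")
```

If the discrete card still shows up as boot VGA after changing the primary display option, the BIOS setting probably didn’t take effect.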


Yes, HDR works. If you pass through a device, it is no longer available to the host, which means the host has nothing to do with what you do with the device in your VM.

Are you using virsh? If so, could you post your VM’s configuration? I’m assuming that’s what you changed when you mentioned disabling the “ROM bar” option.

I’m using virt-manager’s GUI for everything, so if you need something else, just tell me the command and I can send it later today. The “ROM bar” option adds this to the XML: <rom bar="off"/>. As for what it does, no idea.
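As far as I know, libvirt documents the bar attribute of a hostdev’s rom element as controlling whether the device’s ROM is visible in the guest’s memory map, so bar="off" hides it. Since virt-manager’s checkbox only writes that one element, the same change can be scripted against a dumped domain XML. Below is a minimal Python sketch, assuming the GPU is attached as a PCI hostdev on bus 0x03 as in the logs above: it adds rom bar="off" to each matching hostdev and leaves any existing rom file= attribute (for a dumped vBIOS) in place. Python’s ElementTree drops XML comments, so review the output before redefining the VM.

```python
import sys
import xml.etree.ElementTree as ET


def disable_rom_bar(domain_xml, bus=None):
    """Return the domain XML with <rom bar="off"/> on each PCI <hostdev>.

    If bus is given (e.g. 0x03), only hostdevs whose source address is on that bus change.
    """
    root = ET.fromstring(domain_xml)
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        address = hostdev.find("./source/address")
        if bus is not None and (address is None or int(address.get("bus", "0"), 16) != bus):
            continue
        rom = hostdev.find("rom")
        if rom is None:
            rom = ET.SubElement(hostdev, "rom")
        rom.set("bar", "off")  # keeps any existing file="..." attribute
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        xml_text = f.read()
    # Optional second argument: PCI bus in hex, e.g. 0x03 (assumed GPU bus).
    bus = int(sys.argv[2], 16) if len(sys.argv) > 2 else None
    print(disable_rom_bar(xml_text, bus))
```

A possible workflow: virsh dumpxml win11 > vm.xml, then python3 rombar.py vm.xml 0x03 > vm-new.xml, then virsh define vm-new.xml (the VM name win11 and the script name rombar.py are hypothetical).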


Enabling HDR with Windows on native hardware was a horrible and buggy experience for me. It also skews your performance, and it will be the last thing you think to disable.

Thanks! That was what I was looking for.


Just an update: I was able to enable ReBAR without an issue. Settings are the same as before; the only exception is that I enabled ReBAR in the BIOS, and then everything worked out of the box. The first boot of the VM took slightly longer (a minute or so), but other than that, no issues.