7950X3D + 4090 + ProArt X670E - semi-broke iGPU when testing passthrough?

I decided to test whether I could run Linux as the host OS and Windows as a guest on this machine. I first tried the idea out on a spare machine to see whether it was viable at all, and then moved on to this specific hardware.

First, the goal was just to get Windows in a VM with my 4090 passed through to it, leaving the iGPU to the host for Linux desktop use. This part was accomplished pretty easily. Second, I needed some USB for VR etc. in the VM, so I had bought a PCIe USB controller card. On the other hardware it worked flawlessly, but on this board it ends up in a massive IOMMU group with a dozen other devices, so I simply cannot pass it through.
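
(For reference, this is the usual way to check the grouping - the standard loop from the VFIO guides, nothing specific to my board:)

# list every IOMMU group and the devices in it
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU Group ${g##*/}:"
  for d in "$g"/devices/*; do
    echo -e "\t$(lspci -nns "${d##*/}")"
  done
done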

However, I identified a number of USB controllers that sit in their own small IOMMU groups, worked out which back and front panel ports they are hooked to, and decided this looked like a viable alternative. I tried to pass them through to a VM, and the moment I hit start, the entire system locked up with a black screen.
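
(For anyone following along, passing a controller like this is just the standard libvirt detach/attach dance; the address below is only an example of the form, not my exact device:)

# detach the controller from the host (libvirt does this itself for managed hostdevs at VM start)
virsh nodedev-detach pci_0000_6d_00_3
# ...run the VM with the controller added as a PCI host device...
virsh nodedev-reattach pci_0000_6d_00_3   # hand it back to the host afterwards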

Multiple reboots - the system does not POST.

When I clear the CMOS it POSTs, but as soon as I apply my basic BIOS settings again (EXPO, IOMMU, SR-IOV, maybe one or two other such basic things) there’s no display out on the iGPU. Multiple attempts at all kinds of fixes haven’t resolved the issue.

It now seems that if I have a display plugged into the iGPU there will be no POST of any kind. I’ve got displays connected to the HDMI out on the iGPU, the HDMI out on the 4090, and a DisplayPort on the 4090; none of them get any output while the iGPU is connected, except in a fully cleared CMOS state.

I’ve tried setting the iGPU to “Force” in the BIOS, I’ve tried cutting all power to the system, pulling the battery, holding the CMOS reset and power buttons for a while, and leaving it sitting on its own for ~15 min, and I’m pretty low on ideas of what I can even try next.

If I just leave the displays connected to the 4090, it works fine, but this is not a viable configuration for Linux so I’m stuck with using a Windows install for now.

Any clues what happened to the iGPU, and how I can reset it or something?

My remaining ideas are pretty limited: I’ll try re-flashing the BIOS on the board and hope that resets something. If it doesn’t, I’ll clear the CMOS, reboot without changing any BIOS settings, and then apply one setting change per reboot until it fails, to figure out the root cause.

All the semi-relevant details:

  • CPU: 7950X3D
  • Mobo: ASUS ProArt X670E
  • GPU: Gigabyte 4090 Gaming OC
  • Linux: Elementary OS - based on Ubuntu 22.04, with ZFSBootMenu, root on encrypted ZFS
  • 3 NVMe SSDs installed
  • USB card is sold by a random brand but hardware is ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]
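
(If anyone wants to check how their own board groups this card, the device ID makes it easy to find; the address in the second command is just an example:)

lspci -nnk -d 1b21:2142                                   # find the card and the driver bound to it
readlink /sys/bus/pci/devices/0000:03:00.0/iommu_group    # shows which IOMMU group it landed in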

You mentioned iGPU “force”, do you also have iGPU as the primary one?

If you keep everything default except the iGPU setting, what happens?

PS: I have the same board and found out that some of the USB controllers cause reboots, but two of them I can actually pass through. I commented on that in this thread.

PPS: I assume you are using the latest BIOS?

Yep, I did indeed try setting the iGPU as primary, it’s one of the settings I forgot to mention.

I was using a BIOS from a couple of weeks ago, but decided to install the latest one when I noticed the absolutely massive IOMMU group.

I’ll test the BIOS settings further in a few moments.

So, after the issue started I booted into Windows, downloaded the BIOS, and flashed it again. I plugged separate screens into the iGPU HDMI out, the 4090 HDMI out, and the 4090 DP out, then went through every setting I configure, one by one.

  • Ai Overclock Tuner → EXPO I = Boots
  • SR-IOV Support → Enabled = Boots
  • ErP Ready → Enable(S4+S5) = Boots
  • LED lighting to Stealth mode = Boots
  • IOMMU → Enabled = Boots
  • Download and install ASUS malware → Disabled, Setup mode → Advanced = Boots
  • Primary Video Device → IGFX Video = Boots
  • Integrated Graphics → Force = BOOTS!

Seems like either booting to Windows, or flashing the BIOS, fixed it for this step at least. Next up I guess I’ll need to check which of those USB controllers crash the system.
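
(A quick helper sketch for that, in case it’s useful to anyone - it lists each USB controller with the IOMMU group it sits in, nothing board-specific:)

# list every USB controller with its PCI address and IOMMU group
for d in $(lspci -Dnn | grep -i 'usb controller' | cut -d' ' -f1); do
  echo "$d -> group $(basename "$(readlink "/sys/bus/pci/devices/$d/iommu_group")")"
done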


Don’t forget to tell ASUS of your findings, and you may also report them here too… :wink:

Good luck

On my system I get crashes when passing

IOMMU Group 27 6d:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b8]

The others worked, though for one of them I couldn’t find any associated ports, so I guess it has the (unused by me) USB 2 headers.
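
(In case it helps with the mapping: plugging a device into each physical port and checking which PCI function its bus hangs off works, roughly like this:)

lsusb -t   # tree view: which bus/port the plugged-in device shows up on
for b in /sys/bus/usb/devices/usb*; do
  echo "bus ${b##*usb} -> $(readlink -f "$b")"   # resolved path includes the PCI address of the controller
done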


On mine passing this crashes:

IOMMU Group 27 6e:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b8]

These are passed through successfully:

IOMMU Group 18 6a:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f7] (rev 01)
IOMMU Group 24 6d:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b6]
IOMMU Group 25 6d:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b7]

With these passed through, on the rear panel I get 3 physical USB-A ports in the VM, and the 2x USB-A on my front (top) panel are passed through as well; none of the USB-C ports, front or back, are passed through.

I believe that means the USB 3.0 header on the mobo.

And these rear ports seem to be passed through:

(image: X670E rear panel ports)


Passing this one crashes the system:

IOMMU Group 27 6e:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b8]

The crash, however, seems to have brought back the earlier problem: no display out on the iGPU on my main display. I brought in an extra monitor so that every output was constantly hooked to a display I could see, and when I swapped the HDMI cables between the 4090 and the iGPU, the other monitor got iGPU output just fine.

Now if only I knew how to restore the iGPU output to my main display :smile:

Also, now I can’t seem to pass the 4090 to the VM again, but I imagine that is an easier problem to solve than all the others I’ve run into so far. If I try, all the display outputs just stay on a black screen, including the Spice video out.
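
(For what it’s worth, the places I know to look for clues are the qemu log and the journal; the VM name below is made up:)

sudo tail -n 50 /var/log/libvirt/qemu/win11.log   # hypothetical VM name
journalctl -b | grep -i vfio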

One more piece of the puzzle: if I boot into Windows, all 3 outputs (iGPU HDMI, 4090 HDMI, 4090 DisplayPort) work fine. So booting into Windows seems to be the step I need to restore iGPU output on my normal main display.

The next step in debugging tells me that I can fix the iGPU output by booting into any OS with my main display plugged into the iGPU. It seems the crash leaves the iGPU in a strange state and it needs a driver to restore something.
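
(One thing I might try instead of a full reboot is a PCI remove/rescan of the iGPU from the host - no idea yet whether it actually helps, and the address is only an example:)

echo 1 | sudo tee /sys/bus/pci/devices/0000:10:00.0/remove   # example address for the iGPU
echo 1 | sudo tee /sys/bus/pci/rescan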

Also, it seems my 4090’s PCI IDs changed and I didn’t notice; probably the BIOS update caused this. Updating the vfio IDs so nouveau etc. wouldn’t bind to it wasn’t enough to get past the black screen, though.
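
(For anyone else hitting this, a quick way to see what the card currently enumerates as and which driver is bound to it:)

lspci -nnk -d 10de:   # lists the GPU and its audio function with [vendor:device] IDs and the kernel driver in use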


That is a strange issue, I don’t think I encountered that, but I might just not have noticed? Happy you have everything working now!

Well, I wouldn’t say everything is working - I’ve tried a few dozen things and I just can’t get the VM to boot with the 4090 attached. It gets stuck on a black screen on all outputs.

Which Kernel are you using? Do you have rebar enabled? Could you try with rebar disabled?

I had some issues with rebar enabled on kernel 6.1, but they seem to be solved now. I’m not sure whether it was a kernel update or something else that fixed it.

This is based on Ubuntu 22.04, so 5.19.0-38-generic. Rebar is enabled; I could try without it, I guess.
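
(A host-side way to double-check whether rebar is actually in effect, in case it’s useful - the address is wherever the 4090 sits:)

sudo lspci -vvs 01:00.0 | grep -A4 -i 'resizable bar'   # shows current and supported BAR sizes if the capability is present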

So, disabling rebar makes the VM bootable - thanks for the suggestion. However, it seems that either with rebar off or with the GPU passed through, a new issue appears: two of the USB controllers that I could previously pass through safely now cause a reboot:

IOMMU Group 24 6d:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b6]
IOMMU Group 25 6d:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b7]

Oddly though, I only lost access to one port, so by digging up a USB hub I managed to get everything connected; it seems I’m now able to run the machine, at least.

Rebar, however, would be very valuable to get running - any ideas on what the issue with it is?

I’m not 100% sure what’s going on with rebar; support depends on which kernel and qemu versions you have. It seems kernel 6.1 added support, but to get rebar on a passed-through GPU you need qemu support too. There might also be an address-space issue if you have a lot of VRAM. On my system (Fedora 37, now with kernel 6.2) I have rebar support in the guest, but on 6.1 I needed to set the BAR size manually, as described in the Arch wiki. I’m not sure when it started working; it could have been a qemu update or a kernel update…
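
(From memory, the manual method is the sysfs BAR-resize interface; double-check the wiki for the exact numbers - the address and the size value below are only examples, and the GPU generally needs to be unbound from its driver first:)

cat /sys/bus/pci/devices/0000:01:00.0/resource1_resize                  # bitmask of supported BAR1 sizes
echo 14 | sudo tee /sys/bus/pci/devices/0000:01:00.0/resource1_resize   # request 2^14 MiB = 16 GiB, if supported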


Ok, well, I’m on an Ubuntu 22.04 based distro, so kernel 5.19; if 6.x ever comes to it, that will take quite some time still.