AMD 7800X3D iGPU (Host) + Nvidia 4090 (VM) Pass-through challenge. :|

SADLY L1TECH Forums can’t handle links with brackets in them.

So I’ve been slowly researching this again, and while I know I probably won’t get the results I want, it’d be nice to at least get this 4090 working once in pass-through mode so I can say I’ve done it.

Target:
Pass through an Nvidia 4090 to a Win11 virtual machine using the VFIO/IOMMU method, and keep using the 7800X3D iGPU for the Plasma (Wayland?) desktop on the Linux host while the VM takes over the 4090 and its two connected displays. One display is connected directly to the iGPU.

A GUIDE I followed that didn’t work out is https://gitlab.com/risingprismtv/single-gpu-passthrough/-/wikis/1)-Preparations . However, it seems clear it isn’t meant for the dual-GPU (iGPU + dGPU) setup I’m using; it appears to be mostly for booting Linux purely to launch the VM into Windows, with no Linux desktop use at any point.

I got some interesting error outputs from that attempt, I’ll list them at the end of this post.

Basic rundown of things done.
BIOS: enabled SR-IOV, IOMMU, NX Mode, SVM Mode
GRUB kernel command line has amd_iommu=on, video=efifb:off and iommu=pt
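
A quick way to confirm the GRUB parameters above actually reached the running kernel is to read /proc/cmdline. This is only a minimal sketch: the helper names `parse_cmdline` and `missing_params` are made up for illustration, and the parsing is naive (it splits on whitespace, and a key that appears twice keeps its last value).

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into {parameter: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        # Naive: a repeated key (e.g. several video= entries) keeps the last value.
        params[key] = value if sep else None
    return params


def missing_params(text, expected):
    """Return the expected 'key=value' entries that are not on the command line."""
    params = parse_cmdline(text)
    return [f"{key}={value}" for key, value in expected.items()
            if params.get(key) != value]


if __name__ == "__main__":
    # The three parameters listed in the post.
    expected = {"amd_iommu": "on", "iommu": "pt", "video": "efifb:off"}
    path = Path("/proc/cmdline")
    cmdline = path.read_text() if path.exists() else ""
    for item in missing_params(cmdline, expected):
        print("not active:", item)
```

If it prints nothing, all three parameters are on the live command line. That says nothing about whether the kernel honours them, only that GRUB passed them along.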

The IOMMU group with the dGPU components:

IOMMU Group 0:
        00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
        00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
        00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
        00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
        01:00.0 Non-Volatile memory controller [0108]: Silicon Motion, Inc. SM2262/SM2262EN SSD Controller [126f:2262] (rev 03)
        02:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102 [GeForce RTX 4090] [10de:2684] (rev a1)
        02:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
        03:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. RTS5765DL NVMe SSD Controller (DRAM-less) [10ec:5765] (rev 01)
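
For anyone wanting to reproduce a listing like the one above without lspci, the groups can be read straight from sysfs. A minimal sketch, assuming the usual /sys/kernel/iommu_groups layout; the function names `iommu_groups` and `shares_group` are hypothetical helpers, not part of any tool.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as exposed by sysfs."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled or not a Linux host: nothing to report.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


def shares_group(groups, address):
    """Return the other devices that sit in the same group as `address`."""
    for members in groups.values():
        if address in members:
            return [m for m in members if m != address]
    return []


if __name__ == "__main__":
    found = iommu_groups()
    for number in sorted(found):
        print(f"IOMMU Group {number}: {' '.join(found[number])}")
    # 0000:02:00.0 is the 4090's address in the listing above.
    print("shares a group with the GPU:", shares_group(found, "0000:02:00.0"))
```

In the listing above, `shares_group` would return the bridges, both NVMe controllers and the GPU's audio function, which is exactly the problem: everything in a group has to move to the VM together.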

Libvirt stuff:
Configured /etc/libvirt/libvirtd.conf with unix_sock_group = "libvirt" and unix_sock_rw_perms = "0770", enabled logging. Added user to libvirt group, started/enabled libvirtd service…

Qemu stuff:
Configured /etc/libvirt/qemu.conf with user = "myuser", group = "myuser"

Enabled the default libvirt network via virsh, all good and working.

Configured Virtual Machine Manager with a functional Win11 VM (tested with emulated GPU). All good so far.

Dumped the 4090 ROM file with GPU-Z under native Windows and it appeared valid, then did the hex edit shown in the guide.

Installed these scripts https://gitlab.com/risingprismtv/single-gpu-passthrough/-/wikis/7)-scripts-&-logfiles

Hooked up the PCI device in the VMM software, tried to run it within Plasma, no luck. Tried to run it within TTY (no plasma active) and it went to some odd old framebuffer capture of the boot screen and froze.

I have included these log files for reference. There are some errors worth noting.
(uploading soon)

This is all done on CachyOS (ARCH). Log files atm.

custom_hooks.txt (2.3 KB)
libvirtd.txt (2.3 KB)
win11.txt (7.1 KB)

Messages worth noting which appear important.

custom_hooks.txt
/usr/local/bin/vfio-startup: line 140: echo: write error: No such device
modprobe: FATAL: Module nvidia_drm is in use.
modprobe: FATAL: Module nvidia_modeset is in use.
modprobe: FATAL: Module nvidia is in use.
modprobe: FATAL: Module drm_kms_helper is builtin.
modprobe: FATAL: Module drm is builtin.
libvirtd.txt
Lots of Udev property errors, probably unimportant. 
win11.txt (this is the win11 image)
qemu-system-x86_64: warning: This family of AMD CPU doesn't support hyperthreading(2)
Please configure -smp options properly or try enabling topoext feature.
qemu-system-x86_64: -device {"driver":"vfio-pci","host":"0000:09:00.0","id":"hostdev0","bus":"pci.3","addr":"0x0"}: vfio 0000:09:00.0: group 1 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

The two errors above seem to be a trend; however, according to the guide it’s set up correctly, so I’m a little confused here.
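
The "group 1 is not viable" message asks for every device in the group to be on the vfio driver, so it helps to see which driver each one is actually bound to. A rough sketch using the sysfs `driver` symlinks; `drivers_in_group` and `not_on_vfio` are illustrative names only. One assumption to flag: as far as I understand it, PCI bridges in the group are normally left on their own driver and not passed through, so they will show up here without being the culprit.

```python
import os
from pathlib import Path


def drivers_in_group(group, root="/sys/kernel/iommu_groups"):
    """Return {pci_address: driver_name or None} for one IOMMU group."""
    devices_dir = Path(root) / str(group) / "devices"
    result = {}
    if not devices_dir.is_dir():
        return result
    for dev in sorted(devices_dir.iterdir()):
        driver_link = dev / "driver"
        if driver_link.exists():
            # The symlink points at /sys/bus/pci/drivers/<name>.
            result[dev.name] = os.path.basename(os.path.realpath(driver_link))
        else:
            result[dev.name] = None  # no driver bound
    return result


def not_on_vfio(drivers):
    """Devices bound to something other than vfio-pci (unbound ones are skipped)."""
    return {addr: drv for addr, drv in drivers.items()
            if drv not in (None, "vfio-pci")}


if __name__ == "__main__":
    # Group 1 is the group named in the QEMU error above.
    for addr, drv in not_on_vfio(drivers_in_group(1)).items():
        print(f"{addr} is still bound to {drv}")
```

If the GPU shows up as bound to nvidia here, that matches the "Module nvidia is in use" lines from the hook log: the host driver never let go of the card.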

I don’t expect to get this whole VM thing running anytime soon. It eats up time like crazy and it seems EVERY SINGLE VM configuration needs its own customized guide because nobody’s hardware is the same. (frustrating indeed)

You might be overly complicating things…

  • there should be no need anymore to dump the VBIOS since at least the 3000 series cards
  • single GPU passthrough is way more complicated and you do not need it since you have the iGPU
  • start with getting the VM running without passthrough before adding passthrough
  • there are a lot of scripts/repos out there that are outdated or for different distros or different gpu models and won’t work without modifications and such
  • Don’t forget to set the iGPU as the default in the bios.

I found the Arch wiki very helpful to get going. Start with getting the Windows VM running, then follow the Arch wiki step by step, that’s my recommendation…

Also make sure to enable IOMMU on bios, your groups are not looking good!

Which board are you using and in which slot is the 4090? You might have it in the wrong slot since the cpu-connected pcie should always have its own isolated IOMMU group! At least on am5…

Why not?

Anything can be URL encoded.

Markup source:
Why not?
* [Link with \[](https://www.eample.com/#[)
* [Link with \]](https://www.eample.com/#])
* [Link with \(](https://www.eample.com/#%28)
* [Link with \)](https://www.eample.com/#\))

Anything can be URL encoded.
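
As a small illustration of the encoding point, here is a sketch that percent-encodes the brackets in a URL before pasting it into a post. The function name `encode_for_link` and the choice of which characters to leave alone are my own assumptions, not anything the forum software does.

```python
from urllib.parse import quote


def encode_for_link(url):
    """Percent-encode brackets and parentheses while keeping URL delimiters.

    '%' is treated as safe so already-encoded sequences are not encoded twice;
    the flip side is that a literal '%' in the input is left untouched.
    """
    return quote(url, safe=":/?#&=%+,;@")


if __name__ == "__main__":
    print(encode_for_link(
        "https://gitlab.com/risingprismtv/single-gpu-passthrough/-/wikis/1)-Preparations"
    ))
```

The closing parenthesis in the wiki link comes out as %29, which should survive the hyperlink dialog as well as raw markup.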

Yeah I dunno, but I just used the usual make a hyperlink function. Perhaps that is the problem vs just using markup.

Managed to get this VM working.

The missing link was a few things! One was that the scripts were not tailored to my system, so they caused more trouble than they fixed. I will need to write my own using them as reference.

Second, I hadn’t bound the GPU to VFIO at boot. Doing that prevents Linux from taking control of the GPU.
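
For reference, binding at boot comes down to handing the GPU's vendor:device IDs to vfio-pci, either in a modprobe.d file or on the kernel command line. The sketch below only builds those strings from the IDs in the lspci output earlier in the thread (10de:2684 and 10de:22ba). The helper names are hypothetical, and the `softdep` line assumes the proprietary nvidia module is the driver that would otherwise claim the card.

```python
import re

# vendor:device in lowercase hex, e.g. 10de:2684
ID_PATTERN = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{4}$")


def vfio_ids(ids):
    """Validate the IDs and join them into the comma-separated form vfio-pci expects."""
    cleaned = [item.lower() for item in ids]
    for item in cleaned:
        if not ID_PATTERN.match(item):
            raise ValueError(f"not a vendor:device ID: {item!r}")
    return ",".join(cleaned)


def modprobe_conf(ids):
    """Contents for a modprobe.d file such as /etc/modprobe.d/vfio.conf."""
    return (
        f"options vfio-pci ids={vfio_ids(ids)}\n"
        # Assumption: ask modprobe to load vfio-pci before the nvidia module.
        "softdep nvidia pre: vfio-pci\n"
    )


def kernel_param(ids):
    """The equivalent kernel command-line parameter."""
    return f"vfio-pci.ids={vfio_ids(ids)}"


if __name__ == "__main__":
    gpu_ids = ["10de:2684", "10de:22ba"]  # 4090 VGA + HDMI audio functions
    print(modprobe_conf(gpu_ids), end="")
    print(kernel_param(gpu_ids))
```

Either form works on its own; the initramfs usually has to be regenerated afterwards so vfio-pci is available early. The Arch wiki covers the exact steps per distro.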

After this I got the VM working, but I DID need to use the ACS patch, all of which is covered in the Arch guide/wiki.

Still to get working: 5.1 sound over the jacks, instead of passing through my USB soundcard, which is not ideal as Linux then has no sound device.

AND THEN figure out a clean way to pass the 4090 back to Linux with minimal reset/reboot requirement. The proprietary Nvidia drivers are there already.

Another issue my system has is the 7800X3D has a bug where it will reset once or twice on first use after several minutes before becoming stable. This happens in Windows and Linux, it is a weird issue as it isn’t persistent and only a cold boot issue.

I got scream working with 5.1 sound in the end but I had to move back to windows 10 for it to happen.

Note the date had to be set to 2021 to install scream driver (reset afterwards).

It seems Windows 11 with Secure Boot/TPM and all that is too much of a blocker for Scream, which is not being updated. You can install it on Win11, but no sound will happen because the driver service can’t broadcast (or is blocked, whatever the cause).

Got my drives mounted correctly without passing through PCI Host, they now show in both native and vm windows plus Linux as NTFS or whatever.

Below is an example of the XML section for one of my drives. If the partition doesn’t show up, it means the existing partition was created wrong and needs to be remade INSIDE the VM. I had this issue with at least one of my drives.

Additionally, when copying data back and forth some files were damaged, and one of my modded games was affected in such a way that GOG Verify data did NOT work and I had to redownload the game (cp77).
Hopefully that is a random rare issue.

<disk type="block" device="disk">
  <driver name="qemu" type="raw" cache="none" io="native" discard="unmap"/>
  <source dev="/dev/nvme3n1"/>
  <target dev="vdb" bus="virtio"/>
  <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
</disk>
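
On the damaged files mentioned above: one way to catch that early is to compare checksums of the source and the copy after each big transfer. A minimal sketch using SHA-256 from the standard library; `sha256_of` and `mismatched_files` are hypothetical names, and both directories in the usage line are placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large game archives don't fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(source_dir, copy_dir):
    """Return relative paths that are missing from the copy or differ in content."""
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    bad = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = copy_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            bad.append(rel.as_posix())
    return bad


if __name__ == "__main__":
    # Placeholder paths: point these at the original and the copied folder.
    for name in mismatched_files("/path/to/original", "/path/to/copy"):
        print("differs or missing:", name)
```

It only tells you that a file changed, not why, but it beats finding out from a game that refuses to verify.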

Another thing that is becoming clear. My 7800X3D iGPU is somewhat unstable and often resets 1 or 2 times a session (day). I read up on this issue and apparently faulty iGPU in AMD’s APU’s is not rare and these sorts of reset issues are a byproduct of that. It happens in Windows native also so certainly isn’t related to software.

Was told only way to fix that is RMA. But I can’t afford to go without a CPU, so I dunno, maybe I’ll need to talk to retailer but generally they just tell you to buy another CPU in meanwhile or wait the months RMA time. Maybe I should just sell it and get a 12core or something.

I would like to set up clipboard sharing if it’s even possible, and from the research I’ve done so far I found something called spice-vdagent.

However I can’t find any confirmation if this is the best solution for a Win10 VM Guest with GPU passthrough or not. My host is running Plasma Wayland btw.

If anyone has any input, please share.

It would be nice if Synergy/Barrier/Input-x? supported Wayland, but they are X11 exclusive atm, which is a damn shame given how long Wayland has been around for now.

SO I had to completely rebuild my HOST and GUEST yesterday because I was messing with some performance tweaks and updates and discovered something about BTRFS.

BTRFS has the WORST recovery/repair tools/solutions possible. After a system crash my host btrfs partition died, soon after another btrfs partition had died (which was mostly empty anyway). LONG story short, after loads of btrfs research: it’s terrible for recovery. Sure, it has btrfs check and even repair tools (which they warn suck), but ultimately the partitions become unrecoverable very easily.

I have since moved the host to EXT4 which can repair and recover any sort of data corruption easily on boot.

Basically I was having some stability issues with my iGPU on my 7800X3D but after a dust cleaning and a m/b bios update, it hasn’t crashed. It’s still possible my X3D has a faulty iGPU but I’ll cross that bridge when I come to it. Still a few things I can do to 100% confirm its the iGPU on the X3D chip at fault such as memory checks, reseating the cooler etc…

X3D APU issues are not uncommon if you research it online, many people have had these issues. (the crashing of iGPU only happens if you USE it, which majority of people probably don’t)

PS. Yes the iGPU specifically crashed a lot under native Windows11, but I ignored it up until now. The AMD drivers will tell you the GPU crashed which is how I know.

So the BIOS update has for the most part resolved the 7800X3D APU iGPU crashing but introduced an issue with my DDR5 memory: I started getting lots of memory errors according to memtest86, which reported 32 errors in a 6hr period.

What I have done to resolve this is have a look at the XMP profile and noticed some of the voltage values were a bit below what they should be and have raised VRAM voltage in a few places by a very very small percentage.

Things have since been more stable and hopefully resolved memtest86 errors which I’ve yet to test again (forgot to run it during off-peak time)

The memory errors were causing applications to silently crash, including games. And it was very random which made it frustrating.

OTHER than that issue the virtual machine is running quite well. I even eked out a bit more performance from it: according to Time Spy testing, it gets 2k more points in the 3D department compared to Windows 11 native, so that’s nice.

Might need to play with settings further for anti-cheat purposes if I run into issues with that down the line as Win10 does say YES next to virtual machine section which might cause issues. I should test tarkov, a game I almost never play.

I’ve had really good luck with audio passthrough using a standard Intel HDA audio device (ich9). Did that not work for you?

Yes that works, but its basic 2 channels. I want my full 5.1 channel setup.

There are a few ways of doing it but atm Scream works fine and I have VoiceMeeter piping audio to my HDMI2.1 as well for capture purposes.

Overall pretty happy. I even found a way to get better auto-hdr from special-k.

Overall this setup will last until Linux gets significantly better HDR and DLSS support. (likely 6months or more away)

I still haven’t messed with the pass-back-to-host for the GPU, but from what I understand that shouldn’t be a huge issue but does require a Plasma desktop restart.

Plasma is set to X11 atm because I’m using Barrier for clipboard and kb+m unification. I wish it worked under Wayland but like MANY applications, WL support is EXTREMELY limited.

Posted a bug report here but I don’t have any hopes in solving it.

https://bugzilla.kernel.org/show_bug.cgi?id=218943

I have since installed a DUAL BOOT setup and have moved to Windows for OBS until this bug can be figured out. Happens on AM4 and AM5 systems for me.

I know this capture card does 10Gbps under Linux as I’ve seen a video showing exactly that. All attempts to work around this issue have been tried, nothing can be done atm.

UPDATE: Turned off ACS patch on my HOST. Looks like i don’t need it.

I was having an issue with 4090 graphics corruption on screen which looked like dying VRAM but didn’t happen under native Windows, so I turned off ACS and removed the pci=acpi kernel option, and that seems to have resolved it.
It was probably just ACS patch doing it, and I had the ACPI thing in attempt to fix USB write speeds. (Linux copies in slow bursts via UI and its just a bit annoying, dunno why it can’t async)

The OBS server however is STILL running Windows10 due to 5Gbps bug with my capture card (usb) under Linux. No solution for that atm.

All sound works via SCREAM servers without any issue.

It looks like your RTX 4090 is not in its own IOMMU group. The group is shared with other devices such as the NVMe drives. As someone mentioned, make sure the RTX 4090 is slotted in the first PCIe x16 slot.
What motherboard do you use?

I don’t think it’s just x3d. My 7900 was crashing and giving me artifacts on glmark2, got it RMA’d.

The new one is just crashing sometimes. No blue screen, no logs, nothing. Disabled the igpu and the issue went away. Idk how to trigger it and troubleshooting takes forever since there are so many configuration permutations to try out.

ASROCK PG RIPTIDE 650M

But I have since resolved that original issue and don’t use ACS patch anymore.

AMD is offering me an RMA but I don’t know if I should accept it, since it will mean no CPU for 6 weeks or more.

Might get a 9000X3D when those appear and just sell this one off, or RMA it if that is still an option.

The fact the iGPU can become completely 100% stable after these crashes means it could be fixed in firmware for the CPU…