Issue with IOMMU GPU Passthrough

Hello, I’m trying to set up a Windows VM on my Dell Precision T5820 running Linux Mint 20 with the distribution-provided 5.4.0 kernel. All virtualization settings are enabled in the BIOS. I’m using a Quadro K2000 as my display output and a GTX 680 for passthrough (I will probably upgrade to a 1660 Super at some point), both with the Nvidia 440 driver. However, when I follow this guide: (link removed: [Beginner friendly guide to windows virtual machines with GPU passthrough on Ubuntu 18.04]) and get to the part where I run update-initramfs and reboot after adding my hardware IDs to /etc/modules and /etc/initramfs/modules, I get these messages on bootup and the GTX 680 still appears in the Nvidia X Server Settings:

[0.158931] DMAR: DRHD: handling fault status reg 2
[0.158936] DMAR: [DMA Read] Request device [00:17.0] PASID ffffffff fault addr 40ce1000 [fault reason 06] PTE Read access is not set

There’s also this error that started appearing at some point before I started messing with all this. It didn’t seem to affect anything at the time so I ignored it, but maybe it’s related somehow?

[0.546152] Initramfs unpacking failed: Decoding failed

I did some Google searches but the solution to all of them was to turn IOMMU off in the grub configuration file, which is obviously not going to work for me. I have some intermediate Linux experience, but this is my first time messing around with KVM and VFIO. Any help is appreciated.

Hello! I’m guessing the GPUs are physically located as follows: Quadro in PCI-E port 1, GTX 680 in PCI-E port 2 (or 3, depending on the size of GPU 1).

After fast googlefu - The guide you are following seemed really comprehensive.
Anyway :grin:

Here’s a good guide for GPU passthrough:
Archwiki - PCI_passthrough_via_OVMF. I was able to get GPU passthrough working on my setup by following it :slight_smile:

What are your kernel parameters? (in /etc/default/grub) Have you remembered to run update-grub after modifying the grub file?
Could you post what’s inside

/etc/modules
/etc/initramfs/modules

Could you post the output of the dmesg command and IOMMU-group script found at: Archwiki - Ensuring_that_the_groups_are_valid
Save the script as iommu.sh (anything really - just make it executable with chmod +x)

I want to make sure that you have IOMMU enabled and sane IOMMU groups. Ideally every device should (not all mobos are the same) have its own IOMMU group.
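For anyone without the wiki page at hand, here is a minimal sketch of such a script. It is not the Arch wiki script verbatim: the function name list_iommu_groups and its optional directory argument are assumptions made here for illustration. It walks /sys/kernel/iommu_groups and prints the devices in each group, using lspci for the description when it is installed.

```shell
#!/bin/bash
# Sketch: list PCI devices per IOMMU group.
# list_iommu_groups is a hypothetical helper name; the optional argument
# points it at another directory (it defaults to the real sysfs path).
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev
    # sort -V orders the groups numerically (2 before 10)
    for group in $(find "$root" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | sort -V); do
        echo "IOMMU Group ${group##*/}:"
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            if command -v lspci >/dev/null 2>&1; then
                # -nn adds the [vendor:device] IDs, -s selects one device by address
                printf '\t%s\n' "$(lspci -nns "${dev##*/}")"
            else
                # no lspci available: fall back to the bare PCI address
                printf '\t%s\n' "${dev##*/}"
            fi
        done
    done
}

list_iommu_groups
```

If the script prints nothing at all, the IOMMU is most likely not enabled, since the groups directory is then empty or missing.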


Thanks for the reply! Your assumption about the GPUs is correct: the K2000 is in slot 1 and the GTX 680 is in slot 4. I set up the BIOS to use the GPU in slot 1 as the primary video output, if that makes a difference. I know for a fact that the slot the GTX 680 is in is controlled by the CPU; for the K2000 I’m not sure (Dell’s spec sheet doesn’t specify it for that one slot, but I don’t think that should matter because I’m not passing that card through).

Here are my kernel parameters, and I have run update-grub since adding them:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"

Here is the output of that script. The GTX 680’s VGA and audio devices are in their own IOMMU group and are not sharing it with any other devices on the PCIe bus: (I was going to post a pastebin link but I can’t post links, so just add 1gEZnTLJ at the end of the pastebin URL; I don’t want to clutter up this thread)

Also, the output of dmesg | grep IOMMU is:

[ 0.071387] DMAR: IOMMU enabled
[ 0.158740] DMAR-IR: IOAPIC id 12 under DRHD base 0xfbffc000 IOMMU 2
[ 0.158741] DMAR-IR: IOAPIC id 11 under DRHD base 0xd8ffc000 IOMMU 1
[ 0.158741] DMAR-IR: IOAPIC id 10 under DRHD base 0xb5ffc000 IOMMU 0
[ 0.158742] DMAR-IR: IOAPIC id 8 under DRHD base 0x92ffc000 IOMMU 3
[ 0.158742] DMAR-IR: IOAPIC id 9 under DRHD base 0x92ffc000 IOMMU 3

One last thing: Is there a possibility that Mint 20 would have something to do with it? It has been much buggier than Mint 19.3 for me at least and the guide I was originally following was for Ubuntu 18.04 based distros, and Mint 20 is based on Ubuntu 20.04. I may try going to Mint 19 anyways, if only to increase system stability elsewhere, if I can’t figure out whats wrong with my current setup.

Hey no probs - We’re here to help! :slightly_smiling_face:

You should also add iommu=pt to your kernel parameters as described in archwiki: This will prevent Linux from touching devices which cannot be passed through.
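To confirm after a reboot that the parameters actually reached the running kernel, you can look at /proc/cmdline. A small sketch of that check follows. The helper name has_kernel_param and its optional second argument (a command-line string to test against instead of /proc/cmdline) are made up here for illustration.

```shell
# Sketch: check whether a parameter is present on the kernel command line.
# has_kernel_param is a hypothetical helper; by default it reads /proc/cmdline.
has_kernel_param() {
    local param="$1"
    local cmdline="${2:-$(cat /proc/cmdline 2>/dev/null)}"
    local word
    # compare whole words so that iommu=pt does not match inside another parameter
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

for p in intel_iommu=on iommu=pt; do
    if has_kernel_param "$p"; then
        echo "$p: present"
    else
        echo "$p: missing (re-check /etc/default/grub and re-run update-grub)"
    fi
done
```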

Your IOMMU groups seem a-ok.

Did you run dmesg | grep -i -e DMAR -e IOMMU? The output should also have something like this: “IOMMU: Setting identity map for device”

You forgot to post the /etc/modules and /etc/initramfs/modules output but I’m guessing they contain the following: vfio_pci vfio vfio_iommu_type1 vfio_virqfd vfio-pci.ids=10de:1180,10de:0e0a

I really can’t say if Mint 20 has anything to do with it but if I were to guess I’d say probably no. From what I’ve heard Ubuntu 20.04 should be a good release.
I’d advise against going “down” on a release. Always use the newest release for security purposes :slight_smile: Is there something that really bugs you - You mentioned system instability?
You can run

systemctl status

To check if systemd services are running ok. (I’ve had to troubleshoot mine once or twice - some services were degraded)

journalctl -b

To check kernel messages. (Red messages are errors)

Is there a reason you’re using Mint 20? (Cinnamon or MATE -DE possibly?) I’ve had really good luck with kubuntu for a couple of years now. (Then again, both kubuntu and mint are based on ubuntu/debian so I’d guess mint is as robust as kubuntu has been for me.)

Good luck!


I tried adding iommu=pt to my grub arguments, no change :frowning:

Here is the output of dmesg | grep -i -e DMAR -e IOMMU: (another pastebin link: KW0RbPrf). I don’t see anything like “IOMMU: Setting identity map for device” unfortunately (unless I missed it, I’m half awake right now).

Yeah, my bad. I completely forgot to post what’s in /etc/modules, but that’s exactly what’s in there: vfio_pci vfio vfio_iommu_type1 vfio_virqfd vfio-pci.ids=10de:1180,10de:0e0a. However, I don’t have an “/etc/initramfs/” dir, but I do have an /etc/initramfs-tools/modules file, which has the same contents as /etc/modules.

I probably shouldn’t have used the word “instability” when describing my issues with Mint 20. It’s really just a few quirks with the Cinnamon desktop; in fact, I have no idea whether it’s the fault of Cinnamon or the distro. Sometimes windows appear in the window list that I have closed out a while ago and are definitely not running, the usual audio quirks, some graphical artifacting when resizing certain windows (this may be an issue with the Nvidia driver), that kind of stuff… These are pretty small issues and they don’t happen consistently, but they happen often enough to be annoying, and I have never had issues like these on 19.3 on other machines. And also, isn’t 19.3 an LTS release? I thought that meant I still get updates through the package manager for about three more years… but maybe I’m mistaken. If that’s the case, then by then there would be Mint 20.whatever or maybe even 21, which probably would have fixed the issues that I and others are having, so I can just upgrade then. But anyway, I’m not having apps crash or things like that, but thanks for the troubleshooting tips!

The main reason I’m using Mint is partly Cinnamon, but also that 19 has been the most “plug-and-play” distro I have ever personally used, requiring minimal tweaking to get things running the way I want. I came from Kubuntu, which was pretty good, but I ended up liking Cinnamon more for its greater simplicity without sacrificing much customization. And it’s based on Debian/Ubuntu, which I have found to be the most reliable for desktop use and which I’m more familiar with.

Hmm… I looked at the pastebin and it seemed ok to me. I was just referencing archwiki on the “IOMMU: Setting identity map for device” comment - I’m just an archwiki bozo trying to help :grin:

I looked at the error Request device [00:17.0] PASID ffffffff fault addr 40ce1000 [fault reason 06] PTE Read access is not set and the [00:17.0] -ID points to a raid bus controller in your IOMMU group. So it should not be the problem.

Quick googling of the

[0.546152] Initramfs unpacking failed: Decoding failed

error led me to askubuntu, which suggested the following:

Undo your previous attempts, and do sudo update-initramfs -c -k $(uname -r)

I’m running a bit out of my depth here - You seem to be taking all of the right steps… So it should work.
You could try using just kernel parameters. Clear /etc/modules and /etc/initramfs-tools/modules of vfio-pci stuff, run sudo update-initramfs -c -k $(uname -r). Then add vfio_pci vfio vfio_iommu_type1 vfio_virqfd vfio-pci.ids=10de:1180,10de:0e0a to your grub (in /etc/default/grub) as kernel parameters and do update-grub. Reboot and see if it fixes it.
If that fails - Try adding vfio-pci.ids=10de:1180,10de:0e0a as a kernel parameter and vfio_pci vfio vfio_iommu_type1 vfio_virqfd to /etc/modules and /etc/initramfs-tools/modules. Regenerate initramfs and grub. (So the kernel parameters have intel_iommu=on iommu=pt vfio-pci.ids=10de:1180,10de:0e0a and the modules files have the rest of the vfio stuff)
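The grub side of that second variant could be sketched roughly like this. Treat it as an illustration under assumptions: add_grub_params is a hypothetical helper, it expects the GRUB_CMDLINE_LINUX_DEFAULT line to use straight double quotes, it relies on GNU sed for -i, and it edits whatever file you hand it so you can try it on a copy first. The device IDs are the ones from this thread.

```shell
# Sketch: append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT unless already present.
# add_grub_params FILE PARAM... (hypothetical helper, edits FILE in place)
add_grub_params() {
    local file="$1"; shift
    local current param
    # extract the text between the quotes of the existing line
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$file")
    for param in "$@"; do
        case " $current " in
            *" $param "*) ;;                  # already there, skip it
            *) current="$current $param" ;;   # otherwise append
        esac
    done
    current="${current# }"
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$current\"|" "$file"
}

# Try it on a copy and review the result before touching the real file:
#   cp /etc/default/grub /tmp/grub.test
#   add_grub_params /tmp/grub.test intel_iommu=on iommu=pt vfio-pci.ids=10de:1180,10de:0e0a
#   cat /tmp/grub.test
# After making the same change in /etc/default/grub, run sudo update-grub and reboot.
```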

Every distro has its quirks… So far I’ve used ~5 and every one of them has had something small… (It’s the opensource blessing/curse - Everyone wants to do things their way)
You are right, the Mint 19.3 release is LTS. I might have misspoken - Security/critical updates go to every supported LTS release. It’s just that newer releases have newer programs (DE for example - My two computers both use KDE - but kubuntu lags a bit behind on versions compared to opensuse tumbleweed)
You can always install Mint 19 if it works better :slight_smile: or jump in the opensource ship and contribute by issuing bug reports :stuck_out_tongue:

Sometimes windows appear in the window list that I have closed out a while ago and are definitely not running,

This for example seems like a bug.

In my experience what users look for in distros differs widely - I do not mind tweaking so the “plug-and-play” experience is not really my thing :smiley: You are right that debian/ubuntu seems to be the most reliable desktop! (I’ve never tried fedora but from what I’ve heard that is actually the real deal in terms of rock-solid stability) Then again, no linux I’ve used has ever tilted the way windows has (the oh so famous BSOD)

Good luck!


Yep, removing my vfio stuff from both modules config files and putting it in my kernel parameters has fixed the issue! I no longer see the 680 in Nvidia X Server Settings and I am able to use virt-manager to pass it through to a VM! Thank you so much!!!
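Another way to confirm the binding, without opening the Nvidia settings panel, is to ask lspci which kernel driver owns the card. A hedged sketch, assuming the device IDs quoted earlier in the thread. The name driver_in_use is a hypothetical helper that only extracts the “Kernel driver in use” line from lspci -k output.

```shell
# Sketch: show which kernel driver is bound to the passthrough GPU.
# driver_in_use (hypothetical helper) reads `lspci -k` style text on stdin
# and prints whatever follows "Kernel driver in use:".
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

if command -v lspci >/dev/null 2>&1; then
    # 10de:1180 and 10de:0e0a are the GTX 680 video and audio functions from this thread
    for id in 10de:1180 10de:0e0a; do
        echo "$id -> $(lspci -nnk -d "$id" | driver_in_use)"
    done
fi
```

With passthrough set up as described, both functions should report vfio-pci rather than the Nvidia or audio driver.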

However, I do have to wonder why I have to put them in my kernel parameters in order for this to work. Since it seems to be working now, I don’t really need the answer to this, but I’m still curious. Also, I figured out that all those messages go away after the first reboot after running update-grub or update-initramfs, so I’m not really worried about those anymore. But anyway, it’s working!

As for what I’m going to do going forward, I think I’m going to go back to mint 19 for now on this pc, but I do have some other machines that I use frequently as dedicated test machines for trying out different OSs and software that I’m going to run Mint 20 on for a while and try to isolate some of the issues I’m having and see if they go away with time/updates, and report the ones I think aren’t limited to just me.

But anyways, thank you for all the suggestions and support!

Good to hear that it works!

As to why kernel parameters work - I really don’t know :smiley: If I were to guess - it’s probably because vfio has been in the kernel as a module for some time now so calling it as a kernel parameter is possible…

From my understanding the kernel modules and parameters are pretty close to each other anyway. When you change the modules files (adding stuff into them and regenerating initramfs) they (the modules) get baked into the initramfs for the next boot and do stuff accordingly. (bind vfio-pci for example)

Kernel parameters act the same way. At boot the kernel checks the parameters and does stuff according to them. (Load and bind vfio-pci, enable IOMMU and such)

I think the modules stuff could be outdated (in the case of vfio) because vfio already exists as a module inside the kernel, it just needs to be loaded (as a kernel parameter in this case)
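One possible contributing factor, offered as a guess that nobody tested in this thread: /etc/modules and /etc/initramfs-tools/modules expect a module name optionally followed by its parameters (vfio-pci ids=...), whereas vfio-pci.ids=... is the kernel command-line spelling of the same option. A small sketch of the conversion, where to_modules_line is a hypothetical helper name:

```shell
# Sketch: convert a kernel command-line module option (module.param=value)
# into the "module param=value" form used in /etc/modules-style files.
# to_modules_line is a hypothetical helper for illustration only.
to_modules_line() {
    local token="$1"
    local module="${token%%.*}"   # everything before the first dot
    local option="${token#*.}"    # everything after the first dot
    echo "$module $option"
}

to_modules_line "vfio-pci.ids=10de:1180,10de:0e0a"
# prints: vfio-pci ids=10de:1180,10de:0e0a
```

Whether a line in that form would have made the modules-file approach work on this machine is untested here.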

Somebody correct me if I’m wrong - that’s what the internet is for haha. Anyway I’m rambling on. It works!! Fingers crossed the pesky code 43 won’t be showing up.

Quick update: I have gone back to Mint 19 and updated to kernel 5.4.0, and I went straight to the above method without even checking whether /etc/modules would work now. However, I am getting the code 43 error in the VM when trying to install drivers for the 680, even after editing the XML configuration to set the vendor ID and set KVM to hidden. Unfortunately, it looks like my 680 has a BIOS that is not fully compatible with UEFI (which explains why its video output does not show the Dell logo and BIOS screen when booting up), and it turns out that you can’t use non-UEFI cards with the OVMF UEFI BIOS in QEMU/KVM, or at least not this one. I could try SeaBIOS, which apparently does work, or I could just wait for the 1660 Super that I ordered recently to arrive, which shouldn’t have this issue.

However, the above solution to PCIe passthrough is working with no issues other than that, thanks again!
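For anyone double-checking the two XML settings mentioned above (vendor ID and KVM hidden), here is a rough sketch that looks for them in a saved domain definition. The helper name check_code43_settings is made up for illustration, and it only checks that the elements are present, not that they cure code 43 on a given card.

```shell
# Sketch: check a libvirt domain XML file for the KVM-hidden and vendor-ID settings.
# check_code43_settings is a hypothetical helper; give it a file saved with
# something like: virsh dumpxml <your-vm-name> > vm.xml
check_code43_settings() {
    local xml="$1"
    local status=0
    # <kvm><hidden state='on'/></kvm> hides the KVM signature from the guest
    if grep -Eq "<hidden state=['\"]on['\"]" "$xml"; then
        echo "kvm hidden: set"
    else
        echo "kvm hidden: not found"; status=1
    fi
    # <hyperv><vendor_id state='on' value='...'/></hyperv> sets a custom vendor ID
    if grep -Eq "<vendor_id state=['\"]on['\"]" "$xml"; then
        echo "hyperv vendor_id: set"
    else
        echo "hyperv vendor_id: not found"; status=1
    fi
    return "$status"
}
```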
