Ryzen IOMMU: PCIe Passthrough works, BUT... | Level One Techs

Do you think I could get PCIe passthrough working with two Nvidia cards on an ASRock Fatal1ty X370 Gaming K4? I wanted to get a Ryzen 5 CPU with that board, get a reasonably cheap card like a 1050 for Linux, and use my 980 Ti for a Windows VM, but I'm not sure if I should wait until I have some kind of confirmation that it will work.

Hello everybody!
I have also been dreaming about running 2 VMs on one machine.
I can now finally say, I was successful with Ryzen 7 + unRAID!
I am now able to run 2 or even 3 VMs with dedicated GPUs.

My setup:
Ryzen 7 1700 @stock (planning OC)
ASUS ROG Crosshair VI Hero
16 GB Corsair 3200 MHz DDR4
Samsung 960 EVO 250GB (cache)
Seagate Barracuda 3TB (planning to get one or two more)
Radeon HD5770, R5 230 (guests); Nvidia GTS 450 (host, guest also possible)
- I am using these low-end GPUs because I had them lying around; once everything is tested, I'm going to get a pair of RX 480/580 cards.
Linux 4.9.19-unRAID x86_64
ACS Override Enabled (downstream,multifunction)
- note that the multifunction option was MANDATORY for my setup; without it, the override only worked for the top 2 PCIe x16 slots.

Syslinux config:
cat /boot/syslinux/syslinux.cfg
default /syslinux/menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append pcie_acs_override=downstream,multifunction vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append pcie_acs_override=downstream initrd=/bzroot,/bzroot-gui
label unRAID OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append pcie_acs_override=downstream initrd=/bzroot unraidsafemode
label unRAID OS GUI Safe Mode (no plugins)
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui unraidsafemode
label Memtest86+
  kernel /memtest
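
To confirm which of the append options actually reached the running kernel, the command line can be read back from /proc/cmdline. The following is only a minimal Python sketch; the function names parse_cmdline and acs_override_options are made up for illustration and are not part of unRAID or any tool.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as 'unraidsafemode' map to an empty string.
    """
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def acs_override_options(cmdline):
    """Return the comma-separated pcie_acs_override options as a list.

    An empty list means the option is not present on the command line.
    """
    value = parse_cmdline(cmdline).get("pcie_acs_override", "")
    return [opt for opt in value.split(",") if opt]
```

On the host, calling acs_override_options(open("/proc/cmdline").read()) should return ['downstream', 'multifunction'] when the default "unRAID OS" entry above was booted.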

IOMMU groups - these turned out amazingly well! After all the hate the grouping has received, I did not expect the devices to separate so cleanly. See for yourself:

IOMMU group 0
	[1022:1452] 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 1
	[1022:1453] 00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
IOMMU group 2
	[1022:1453] 00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
IOMMU group 3
	[1022:1452] 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 4
	[1022:1452] 00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 5
	[1022:1453] 00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
IOMMU group 6
	[1022:1453] 00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
IOMMU group 7
	[1022:1452] 00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 8
	[1022:1452] 00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 9
	[1022:1454] 00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
IOMMU group 10
	[1022:1452] 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
IOMMU group 11
	[1022:1454] 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1454
IOMMU group 12
	[1022:790b] 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 59)
	[1022:790e] 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
IOMMU group 13
	[1022:1460] 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1460
	[1022:1461] 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1461
	[1022:1462] 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1462
	[1022:1463] 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1463
	[1022:1464] 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1464
	[1022:1465] 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1465
	[1022:1466] 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1466
	[1022:1467] 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1467
IOMMU group 14
	[144d:a804] 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
IOMMU group 15
	[1022:43b9] 03:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 43b9 (rev 02)
IOMMU group 16
	[1022:43b5] 03:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43b5 (rev 02)
IOMMU group 17
	[1022:43b0] 03:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b0 (rev 02)
IOMMU group 18
	[1022:43b4] 1d:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
IOMMU group 19
	[1022:43b4] 1d:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
IOMMU group 20
	[1022:43b4] 1d:03.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
IOMMU group 21
	[1022:43b4] 1d:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 43b4 (rev 02)
IOMMU group 22
	[1b21:1343] 21:00.0 USB controller: ASMedia Technology Inc. Device 1343
IOMMU group 23
	[8086:1539] 23:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
IOMMU group 24
	[10de:0dc4] 25:00.0 VGA compatible controller: NVIDIA Corporation GF106 [GeForce GTS 450] (rev a1)
IOMMU group 25
	[10de:0be9] 25:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)
IOMMU group 26
	[1002:677b] 26:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos PRO [Radeon HD 7450]
IOMMU group 27
	[1002:aa98] 26:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6450 / 7450/8450/8490 OEM / R5 230/235/235X OEM]
IOMMU group 28
	[1002:68b8] 27:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 5770]
IOMMU group 29
	[1002:aa58] 27:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Juniper HDMI Audio [Radeon HD 5700 Series]
IOMMU group 30
	[1022:145a] 28:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
IOMMU group 31
	[1022:1456] 28:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1456
IOMMU group 32
	[1022:145c] 28:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 145c
IOMMU group 33
	[1022:1455] 29:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 1455
IOMMU group 34
	[1022:7901] 29:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
IOMMU group 35
	[1022:1457] 29:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Device 1457
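
A listing like the one above can be rebuilt from sysfs, where Linux exposes each group as a directory under /sys/kernel/iommu_groups. This is a minimal Python sketch, not the script unRAID itself uses; the function names are hypothetical, and the root argument exists only so the code can be pointed at a test directory.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [PCI addresses]} read from sysfs.

    Returns an empty dict when the directory does not exist,
    for example when the IOMMU is disabled in the firmware.
    """
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def shared_groups(groups):
    """Keep only the groups that contain more than one device.

    Devices sharing a group generally cannot be split between
    the host and a guest.
    """
    return {g: devs for g, devs in groups.items() if len(devs) > 1}
```

Applied to the listing above, shared_groups would report only groups 12 and 13, which contain chipset and host-bridge functions rather than GPUs.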

VMs:
8 CPUs (0-7 and 8-15)
Machine i440fx-2.7
BIOS: OVMF
Hyper-V is OFF
Virtio drivers: virtio-win-0.1.126-2.iso
keyboard and mouse passed through
Audio from USB double jack adapter
- I had to set sound quality to at least 48kHz, or there was crackling in the audio
or
Audio passed through from MB ([1022:1457] 29:00.3)

With this setup, I am running both Radeon cards passed through to VMs. Unfortunately, I was unable to pass through the Nvidia card due to a constant BSOD (System Thread Exception).
However, with a newer Nvidia card or another AMD GPU, it should be possible to run even 3 separate machines on this setup!
- unRAID does not mind at all that I am taking a GPU away from it, and it assigns the card to the VM correctly. The VM even boots; the BSOD appears only after installing the Nvidia drivers.

I hope this helps at least somebody trying to achieve the same thing.

Hmm, I had similar luck using an older Radeon HD + RX 480, but mixing Fury + RX 480 or RX 480 + RX 460 was much more unstable.
It would be interesting to test.

What are the symptoms of the instability?
I have 2 R9 290s in the main slots and am passing one of them to a VM.
I have had no crashes, but some games suffer a performance penalty.
I also tried passing both of them through, but that did not resolve the problem.

Heaven seems to just hang, but can be restarted. Sometimes GTA V just crashes back to the desktop. It's more annoying than anything.

I'm looking for a new build with Ryzen that will focus on virtualization including GPU passthrough. I was wondering if anyone could help me out with a few questions. I am currently looking at using either the Asus PRIME B350-PLUS or the Gigabyte GA-AB350-GAMING 3. Does anyone have any information on how the IOMMU groups look on either motherboard? Is the ACS override patch needed for either one? Does anyone happen to have any information on how stable they are with the GPU passed through?

Second, does anyone have any information on whether it is possible to do GPU passthrough while using an NVMe SSD? Is the NVMe slot in a different IOMMU group?

Thank you very much in advance for the information!

Hi all! I went down the road of building a Ryzen system using an ASRock X370 Taichi and a 1800X, hoping for a two-GPU setup where one of the GPUs would be passed through to a Windows 10 VM. Unfortunately, it seems that both PCIe GPUs show up in group 2, as @wendell had mentioned.

I submitted a ticket to ASRock late last Friday to see if I could get a response from them, but haven't heard anything just yet. If/when I hear anything I will report back!

MB: ASRock Taichi x370
PS: Ryzen 1800x
OS: Ubuntu 17.04
Kernel: 4.10.0-19-generic

You can use the ACS patch if the cards are different enough?

Wendell,

How exactly do you use the ACS patch? Do we need to install Linux to install the ACS patch?
Sorry for the basic questions. Yes I am a noob and proud of it :).

So the ACS patch seems to be working, as it split my GPUs into separate groups, but I am running into a Code 43 in my VM, which has proved quite frustrating.

My two GPUs (which are a bit too similar I believe) are:

EVGA GTX 1070 (primary)
EVGA GTX 750 Ti (passthrough)

While testing, I blacklisted both nvidia and nouveau, and I am using vfio-pci. All the drivers look fine at startup, but that Code 43 is blocking me from the 'magic'. When I first start the VM, I also see vfio-intx interrupts for the GTX 750 Ti.
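
One way to verify that a card really ended up on vfio-pci, rather than nvidia or nouveau, is to look at the driver symlink that sysfs keeps for every PCI device. A minimal Python sketch follows; bound_driver is a made-up helper name, and the PCI address in the usage note is a placeholder because the poster's addresses were not given.

```python
import os


def bound_driver(pci_address, root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device.

    Returns None when no driver is bound or the device is not found.
    The root argument is only there so the function can be tested
    against a fake directory tree.
    """
    link = os.path.join(root, pci_address, "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at the driver directory; its last path
    # component is the driver name, e.g. 'vfio-pci'.
    return os.path.basename(os.readlink(link))
```

For example, bound_driver("0000:01:00.0") (placeholder address) should return "vfio-pci" for the passthrough card before the VM is started.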

UPDATE:
I had an Nvidia Quadro 2000 card lying around, so I swapped it in for the 750 Ti, and PCIe passthrough worked like a charm. So it seems that with the ACS patch all is well; unfortunately, I'll still have to figure out how to get past the Code 43 error with the 750 Ti.

Code 43 is well known: it is the Nvidia driver detecting that it is running under a hypervisor. You have to set up your VM to hide the fact that it is a VM. This is why I am reluctant to recommend Nvidia for virtualization.
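
For libvirt-managed VMs, hiding the hypervisor is usually done in the domain XML with the hidden element under kvm, often combined with a custom Hyper-V vendor_id. The thread itself does not say which settings were used, so treat this as an assumption-laden sketch: hide_hypervisor is a hypothetical helper, the default vendor string is arbitrary, and whether this is enough depends on the driver version.

```python
import xml.etree.ElementTree as ET


def _child(parent, tag):
    """Return the existing child element with this tag, or create it."""
    node = parent.find(tag)
    if node is None:
        node = ET.SubElement(parent, tag)
    return node


def hide_hypervisor(domain_xml, vendor_id="0123456789ab"):
    """Add the common 'hide KVM' settings to a libvirt domain XML string.

    vendor_id is an arbitrary placeholder of at most 12 characters.
    """
    root = ET.fromstring(domain_xml)
    features = _child(root, "features")
    # <kvm><hidden state='on'/></kvm> hides the KVM signature from the guest.
    _child(_child(features, "kvm"), "hidden").set("state", "on")
    # <hyperv><vendor_id state='on' value='...'/></hyperv> replaces the
    # Hyper-V vendor string that the guest sees.
    vendor = _child(_child(features, "hyperv"), "vendor_id")
    vendor.set("state", "on")
    vendor.set("value", vendor_id)
    return ET.tostring(root, encoding="unicode")
```

The function edits a copy of the XML as text; the result would still have to be loaded back with the usual libvirt tooling, and it is safe to run twice because existing elements are reused.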

PCPer has an article up right now mentioning an update that improves support for virtual machines and multiple GPUs. Fingers crossed this means full IOMMU support going forward? Or is this an in-between measure that solves a different issue?

"Beyond the memory improvements AMD is also adding support for PCI Express Access Control Services which will improve virtualization support and allow users with multiple graphics cards to dedicate a card to the host and another card to the virtual machine."

https://www.pcper.com/news/General-Tech/AMD-AGESA-Update-1006-Will-Support-Configurable-Memory-Sub-Timings-And-Clockspeeds

Beta BIOSes for Gigabyte boards with IOMMU and other improvements have been released.

I'm on a GA-AB350 Gaming and updated the BIOS (F5c beta version). My Arch Linux installation still doesn't boot properly with IOMMU enabled in the BIOS.

Anyone else with the same problem?

Welp. My problem was all the other settings. After enabling all possible UEFI settings, I can boot properly with IOMMU enabled on Gigabyte GA-AB350 Gaming using Arch Linux kernel 4.11 something.

Has anyone tried this in a multiseat configuration, i.e. with 2 VMs, each having their own GPU on the same system?

This may seem like an odd question, but attempting this on any of the three X58 chipset boards I have available results in the GPU driver crashing in both VMs.

(This is the case if and only if the VMs have attached disks which live on a physical disk other than the root disk! I’ve tried everything I can think to try, and asked on the vfio mailing list, but no luck.)

Sorry to bring up a dead thread, but starting a new topic has not resulted in any answers yet:

How is the grouping on the Gigabyte GA-AX370 nowadays?
I can’t seem to find any good answers while searching.

I know this is months later, but I am getting the same kernel panic on an ASRock AB350M Pro4. From my testing, it has something to do with GCN cards specifically (RX 580). It only happens with a GCN card in a specific slot (the x4 slot) and only with IOMMU enabled.

Yeah, I’m guessing they haven’t looked at fixing this stuff on the B350 at all. I need to look through the forums, find an X370 board that works well with IOMMU, and just upgrade the board. I can reuse this B350 for my file server with a lower-end Ryzen.

I wound up finding a solution and have a working VM now: iommu=pt as a kernel parameter. I still have some weird instability with IOMMU on that causes what appears to be a display driver crash in the host, but it's very rare and the cause is hard to pin down.