Asus ROG Zenith Extreme X399 Threadripper IOMMU Details

As soon as the new AGESA for Threadripper lands, let us know the updated groupings. When the chipset groupings get fixed, that’s when I’ll pay serious attention to Threadripper.

My BIOS is dated late September, and my groups look better than mike’s:

# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/17/devices/0000:0b:00.2
/sys/kernel/iommu_groups/17/devices/0000:0b:00.0
/sys/kernel/iommu_groups/17/devices/0000:0b:00.3
/sys/kernel/iommu_groups/35/devices/0000:45:00.3
/sys/kernel/iommu_groups/35/devices/0000:45:00.2
/sys/kernel/iommu_groups/35/devices/0000:45:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:07.0
/sys/kernel/iommu_groups/25/devices/0000:40:03.1
/sys/kernel/iommu_groups/15/devices/0000:09:00.0
/sys/kernel/iommu_groups/33/devices/0000:43:00.0
/sys/kernel/iommu_groups/33/devices/0000:43:00.1
/sys/kernel/iommu_groups/5/devices/0000:00:03.1
/sys/kernel/iommu_groups/23/devices/0000:40:02.0
/sys/kernel/iommu_groups/13/devices/0000:00:19.0
/sys/kernel/iommu_groups/13/devices/0000:00:19.7
/sys/kernel/iommu_groups/13/devices/0000:00:19.5
/sys/kernel/iommu_groups/13/devices/0000:00:19.3
/sys/kernel/iommu_groups/13/devices/0000:00:19.1
/sys/kernel/iommu_groups/13/devices/0000:00:19.6
/sys/kernel/iommu_groups/13/devices/0000:00:19.4
/sys/kernel/iommu_groups/13/devices/0000:00:19.2
/sys/kernel/iommu_groups/31/devices/0000:41:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:02.0
/sys/kernel/iommu_groups/21/devices/0000:40:01.2
/sys/kernel/iommu_groups/11/devices/0000:00:14.0
/sys/kernel/iommu_groups/11/devices/0000:00:14.3
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/28/devices/0000:40:07.1
/sys/kernel/iommu_groups/18/devices/0000:0c:00.3
/sys/kernel/iommu_groups/18/devices/0000:0c:00.2
/sys/kernel/iommu_groups/18/devices/0000:0c:00.0
/sys/kernel/iommu_groups/36/devices/0000:46:00.2
/sys/kernel/iommu_groups/36/devices/0000:46:00.0
/sys/kernel/iommu_groups/8/devices/0000:00:07.1
/sys/kernel/iommu_groups/26/devices/0000:40:04.0
/sys/kernel/iommu_groups/16/devices/0000:0a:00.0
/sys/kernel/iommu_groups/16/devices/0000:0a:00.1
/sys/kernel/iommu_groups/34/devices/0000:44:00.1
/sys/kernel/iommu_groups/34/devices/0000:44:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:04.0
/sys/kernel/iommu_groups/24/devices/0000:40:03.0
/sys/kernel/iommu_groups/14/devices/0000:02:09.0
/sys/kernel/iommu_groups/14/devices/0000:02:02.0
/sys/kernel/iommu_groups/14/devices/0000:05:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.1
/sys/kernel/iommu_groups/14/devices/0000:02:01.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
/sys/kernel/iommu_groups/14/devices/0000:02:04.0
/sys/kernel/iommu_groups/14/devices/0000:03:00.0
/sys/kernel/iommu_groups/14/devices/0000:02:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.2
/sys/kernel/iommu_groups/14/devices/0000:02:03.0
/sys/kernel/iommu_groups/14/devices/0000:08:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.0
/sys/kernel/iommu_groups/32/devices/0000:42:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:03.0
/sys/kernel/iommu_groups/22/devices/0000:40:01.3
/sys/kernel/iommu_groups/12/devices/0000:00:18.6
/sys/kernel/iommu_groups/12/devices/0000:00:18.4
/sys/kernel/iommu_groups/12/devices/0000:00:18.2
/sys/kernel/iommu_groups/12/devices/0000:00:18.0
/sys/kernel/iommu_groups/12/devices/0000:00:18.7
/sys/kernel/iommu_groups/12/devices/0000:00:18.5
/sys/kernel/iommu_groups/12/devices/0000:00:18.3
/sys/kernel/iommu_groups/12/devices/0000:00:18.1
/sys/kernel/iommu_groups/30/devices/0000:40:08.1
/sys/kernel/iommu_groups/2/devices/0000:00:01.3
/sys/kernel/iommu_groups/20/devices/0000:40:01.1
/sys/kernel/iommu_groups/10/devices/0000:00:08.1
/sys/kernel/iommu_groups/29/devices/0000:40:08.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/19/devices/0000:40:01.0
/sys/kernel/iommu_groups/9/devices/0000:00:08.0
/sys/kernel/iommu_groups/27/devices/0000:40:07.0

GPUs are 0a:00, 43:00, and 44:00, and each has its own group.
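For anyone who wants to check this on their own board without eyeballing the list, here is a minimal Python sketch that regroups `find` output like the above by group number and tests whether a given slot shares its group with anything else. The helper names `parse_iommu_groups` and `is_isolated` are made up for illustration, and the sample is just a few lines copied from the listing.

```python
import re
from collections import defaultdict

# Matches paths such as /sys/kernel/iommu_groups/33/devices/0000:43:00.0
LINK_RE = re.compile(r"/sys/kernel/iommu_groups/(\d+)/devices/([0-9a-fA-F:.]+)")


def parse_iommu_groups(find_output):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = defaultdict(list)
    for line in find_output.splitlines():
        match = LINK_RE.search(line)
        if match:
            groups[int(match.group(1))].append(match.group(2))
    return {group: sorted(devices) for group, devices in groups.items()}


def is_isolated(groups, slot):
    """True if the group holding `slot` contains only functions of that slot."""
    prefix = slot + "."
    for devices in groups.values():
        if any(dev.startswith(prefix) for dev in devices):
            return all(dev.startswith(prefix) for dev in devices)
    return False  # slot does not appear in any group


# A few lines taken from the listing above (hypothetical helper input).
sample = """\
/sys/kernel/iommu_groups/33/devices/0000:43:00.0
/sys/kernel/iommu_groups/33/devices/0000:43:00.1
/sys/kernel/iommu_groups/14/devices/0000:05:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.1
"""

groups = parse_iommu_groups(sample)
print(is_isolated(groups, "0000:43:00"))  # GPU alone in group 33 -> True
print(is_isolated(groups, "0000:05:00"))  # shares group 14 -> False
```

On a live system the input would be the text printed by `find /sys/kernel/iommu_groups/ -type l`.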

I do notice that many of my devices tend to be ‘behind’ my Host Controller:

# lspci
0c:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0c:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller
40:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
40:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
40:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
40:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
40:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
40:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453

<snip>

41:00.0 Non-Volatile memory controller: Device 1987:5007 (rev 01)
42:00.0 Non-Volatile memory controller: Device 1987:5007 (rev 01)
43:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
43:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
44:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
44:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
45:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
45:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
...

I didn’t realize how broken this is compared to my 8-year-old LGA1366 boards. I’m getting the D3 error like almost everyone else, and the accounts I’m finding where it works require either ESXi or Vega cards, neither of which I’m willing to use for reasons of practicality.

The reddit thread where AMD was interacting with users has gone cold:

And what’s worse, the AMD rep disappeared with it:

https://www.reddit.com/user/AMD_Robert

I had AMD chips from the late 90s until the late 2000s. I had a fondness for them, and I switched to Intel reluctantly. Now I have the opposite sort of feeling. I’m not sure if I can convince my vendor to take this stuff back, but I guess I can liquidate it on eBay.

Bad news: I found UEFI 804 Beta, released only a few days ago, but it still contains microcode 0x8001129. Presumably this is part of AGESA 1.0.0.3.
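If you want to confirm which microcode revision is actually loaded after flashing, a rough Python sketch follows. It assumes an x86 Linux kernel, which reports a `microcode` field per logical CPU in `/proc/cpuinfo`; the function name `microcode_revisions` is hypothetical, and the sample text is a trimmed stand-in for the real file.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions reported across all logical CPUs."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # The kernel prints the revision as hex, e.g. 0x8001129
            revisions.add(int(value.strip(), 16))
    return revisions


# Trimmed stand-in for /proc/cpuinfo contents (assumed layout).
sample = (
    "processor\t: 0\n"
    "microcode\t: 0x8001129\n"
    "processor\t: 1\n"
    "microcode\t: 0x8001129\n"
)

print([hex(rev) for rev in sorted(microcode_revisions(sample))])
```

On a real machine you would pass in `open("/proc/cpuinfo").read()` instead of the sample. More than one value in the result would mean the cores disagree on the loaded revision.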

In the thread, another user mentioned that other boards are already running 1.0.0.4 in production. What concerns me is not only that our beta is behind other boards’ production versions, but also that I haven’t seen any reports in recent days of IOMMU/passthrough being fixed on boards that got the new microcode, like one can find with Ryzen.

If this is the case, who’s to say it won’t be 1.0.0.6 or .7 before this is addressed (if at all; no one guarantees us anything)? And if current trends continue, Asus boards could still be a point release behind.

It’s unfortunate, because on paper the platform would be great for this. I wonder if we aren’t seeing the result of AMD being a budget chip maker for the last decade-plus. Intel likely has orders of magnitude more experience with this, and AMD hasn’t been much to speak of in the server space since Opteron, which predates virtualization instruction sets in processors. That a virtualization-related bug with a significant effect on performance was only just discovered after almost ten years seems to speak to this.

Anyway, I don’t wanna come off as a downer: it is what it is, money is for spending, and the journey is half the point. That said, after never being able to get multiseat working with X58 (single seat worked fine, but the platform was young and had issues with interrupts), I’m about ready for something that works out of the box now.

Aye, it’s due to this level of frustration that I returned one 1950X to Amazon (waiting on the refund) and ordered an i9-7920X, which is running Fedora 27/v4.13 at the moment. I got XenServer to boot (with @FurryJackman helping out!) but haven’t had time to fully toy with that yet.

Will post my IOMMU findings from that build in the other thread asap…

Looks like a fix is finally at hand.

I already had an RMA# for mine, plus 20% restocking, and have ordered an Asus X299 MB (a used Rampage Extreme), which I’ll be sending back, it seems. Was just about to make an eBay listing.

Nice, still got the second Zenith Extreme in the box - once the fix lands, I’ll order another 1950X.

I can confirm this gets me past the D3 error, but I’m still hanging at the UEFI splash. I left a comment on the reddit thread.

You mentioned (on reddit) having a GTX1070 on the host, how did you get the NVIDIA drivers to work with the 4.15-rc1 kernel?

Thanks!

This might be a bit of an old thread to bring back up, but it was a prominent result when I googled this board and IOMMU configuration for an unRAID setup I’m working on, so I thought I’d let other passers-by know: the latest BIOS release for this motherboard (X399 Zenith Extreme), version 1402, has improved the IOMMU grouping significantly. A couple of USB controllers have separate IOMMU groups now, and the built-in sound card and several other devices now have their own IOMMU groups as well. That makes things much easier for those of us experimenting with server-plus-VM setups with native performance and functionality. With the new Threadrippers coming out, all of this is highly relevant again, because this board is still one of the best and most appropriate boards on the market for the new chips.

Excellent. Thank you for sharing!