Asus ROG Zenith Extreme X399 Threadripper IOMMU Details

Ah, that’s enough for me. I was also looking at the ASRock board. Any experience with it? I think Wendell made a video; I’ll have to check the channel.

I’m not on 10G yet, but I will probably get there at some point. My peripherals are really slow (still lots of spinning drives and consumer SATA SSDs), so the performance I’d gain isn’t really worth it for me.

I’m also moving in the next year or so, so it’s not worth running Cat7 in the house.

FYI, the groupings are only nasty on the Gigabyte board, at least from what I’ve seen so far.

Yeah, I can’t recall exactly, but I think he was happy with the ASRock and MSI boards; double-check his reviews first, though. I don’t recall him covering the Zenith Extreme. @wendell does far more thorough NVMe testing (as he has so many FREE toys to play with!!).

I’m hoping to order a Vega 64 by mid-December; only then will I be able to play with GPU passthrough.

For my 10G link, I plan to go SFP (where possible), as the non-copper 10G switches are cheaper, especially this one: Ubiquiti Networks ES-16-XG EdgeSwitch 16 10G 16-Port Managed Aggregation Switch.

I want to go SFP to at least make sure my network link is no longer an IOPS bottleneck for iSCSI, as I’m running quite a few simultaneous shares off the FreeNAS box, and will most likely have at least one block-level share for each VM that runs on the final XenServer.

Wow… I had a hard time sourcing anything decent beyond Cat5E, but then again my primary use case was outdoor runs for PoE cameras (from Ubiquiti), so I ended up ordering a 1000 ft roll of Ubiquiti Networks 1000’ TOUGHCableCARRIER Outdoor Shielded Cat 5e Ethernet Cable (Level 2) from B&H (their DHL rates to Colombo are the lowest). Since I have a fair bit left over, I’ve been running it indoors as well.

Cat5E is about as good as you’ll get for outdoor runs, from what I can tell.

Technically, for 10G, you only need Cat6A, but I figure “go big or go home” is an appropriate model for this solution.


Haha, that’s been my recent model too :wink: In any case, I expect these expansions to last a minimum of 5 years, so even the ‘over the top’ hardware in the TR boxes is in line with that plan…


Yep, and when all else fails, X399 is supposed to support the next couple of generations of TR CPUs, if I remember correctly.


Another reason for ditching the Gigabyte board: it doesn’t have an onboard Intel NIC. I tried installing XenServer, but the installer bombs out because it has no drivers for the ‘Killer NIC’.

On the plus side, the Zenith Extreme has a good ol’ Intel Gigabit NIC, so I’m hoping the install will work with that.

Mmm, I believe so…

I had more IOMMU groups on the Zenith. Unfortunately, the system stopped POSTing before I could test it. I’m not sure, but the cause could have been the “enumerate IOMMU for IVR” setting (or whatever it is called exactly).

Maybe Asus support can think of something I haven’t and get it to POST. If so, I should be able to test my setup tomorrow.

Can you reset to defaults in your UEFI and try posting again? Which UEFI do you have on the Zenith?

The problem is that I can’t enable PCIE_ARI while my PNY M.2 card is installed under the southbridge. It fails to POST with code 00, and I can’t even reset the BIOS. I left the battery out overnight and it still would not boot; maybe I needed to leave it out longer.

Asus says it will work with a stick from ‘the list’. I have my doubts, but we’ll see.

I put in place the settings that worked with my old X58 chipset, but having referenced your post and a couple of others, I know there are a few more settings I have to pin down. I’ll update the thread with my results.

As soon as the new AGESA for Threadripper lands, let us know the updated groupings. When the chipset groupings get fixed, that’s when I’ll pay serious attention to Threadripper.

My BIOS is dated late September, and my groups look better than mike’s:

```
# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/17/devices/0000:0b:00.2
/sys/kernel/iommu_groups/17/devices/0000:0b:00.0
/sys/kernel/iommu_groups/17/devices/0000:0b:00.3
/sys/kernel/iommu_groups/35/devices/0000:45:00.3
/sys/kernel/iommu_groups/35/devices/0000:45:00.2
/sys/kernel/iommu_groups/35/devices/0000:45:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:07.0
/sys/kernel/iommu_groups/25/devices/0000:40:03.1
/sys/kernel/iommu_groups/15/devices/0000:09:00.0
/sys/kernel/iommu_groups/33/devices/0000:43:00.0
/sys/kernel/iommu_groups/33/devices/0000:43:00.1
/sys/kernel/iommu_groups/5/devices/0000:00:03.1
/sys/kernel/iommu_groups/23/devices/0000:40:02.0
/sys/kernel/iommu_groups/13/devices/0000:00:19.0
/sys/kernel/iommu_groups/13/devices/0000:00:19.7
/sys/kernel/iommu_groups/13/devices/0000:00:19.5
/sys/kernel/iommu_groups/13/devices/0000:00:19.3
/sys/kernel/iommu_groups/13/devices/0000:00:19.1
/sys/kernel/iommu_groups/13/devices/0000:00:19.6
/sys/kernel/iommu_groups/13/devices/0000:00:19.4
/sys/kernel/iommu_groups/13/devices/0000:00:19.2
/sys/kernel/iommu_groups/31/devices/0000:41:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:02.0
/sys/kernel/iommu_groups/21/devices/0000:40:01.2
/sys/kernel/iommu_groups/11/devices/0000:00:14.0
/sys/kernel/iommu_groups/11/devices/0000:00:14.3
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/28/devices/0000:40:07.1
/sys/kernel/iommu_groups/18/devices/0000:0c:00.3
/sys/kernel/iommu_groups/18/devices/0000:0c:00.2
/sys/kernel/iommu_groups/18/devices/0000:0c:00.0
/sys/kernel/iommu_groups/36/devices/0000:46:00.2
/sys/kernel/iommu_groups/36/devices/0000:46:00.0
/sys/kernel/iommu_groups/8/devices/0000:00:07.1
/sys/kernel/iommu_groups/26/devices/0000:40:04.0
/sys/kernel/iommu_groups/16/devices/0000:0a:00.0
/sys/kernel/iommu_groups/16/devices/0000:0a:00.1
/sys/kernel/iommu_groups/34/devices/0000:44:00.1
/sys/kernel/iommu_groups/34/devices/0000:44:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:04.0
/sys/kernel/iommu_groups/24/devices/0000:40:03.0
/sys/kernel/iommu_groups/14/devices/0000:02:09.0
/sys/kernel/iommu_groups/14/devices/0000:02:02.0
/sys/kernel/iommu_groups/14/devices/0000:05:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.1
/sys/kernel/iommu_groups/14/devices/0000:02:01.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
/sys/kernel/iommu_groups/14/devices/0000:02:04.0
/sys/kernel/iommu_groups/14/devices/0000:03:00.0
/sys/kernel/iommu_groups/14/devices/0000:02:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.2
/sys/kernel/iommu_groups/14/devices/0000:02:03.0
/sys/kernel/iommu_groups/14/devices/0000:08:00.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.0
/sys/kernel/iommu_groups/32/devices/0000:42:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:03.0
/sys/kernel/iommu_groups/22/devices/0000:40:01.3
/sys/kernel/iommu_groups/12/devices/0000:00:18.6
/sys/kernel/iommu_groups/12/devices/0000:00:18.4
/sys/kernel/iommu_groups/12/devices/0000:00:18.2
/sys/kernel/iommu_groups/12/devices/0000:00:18.0
/sys/kernel/iommu_groups/12/devices/0000:00:18.7
/sys/kernel/iommu_groups/12/devices/0000:00:18.5
/sys/kernel/iommu_groups/12/devices/0000:00:18.3
/sys/kernel/iommu_groups/12/devices/0000:00:18.1
/sys/kernel/iommu_groups/30/devices/0000:40:08.1
/sys/kernel/iommu_groups/2/devices/0000:00:01.3
/sys/kernel/iommu_groups/20/devices/0000:40:01.1
/sys/kernel/iommu_groups/10/devices/0000:00:08.1
/sys/kernel/iommu_groups/29/devices/0000:40:08.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/19/devices/0000:40:01.0
/sys/kernel/iommu_groups/9/devices/0000:00:08.0
/sys/kernel/iommu_groups/27/devices/0000:40:07.0
```

The GPUs are 0a:00, 43:00, and 44:00, and each has its own group.
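If anyone wants to check their own board the same way: the raw `find` output comes back in arbitrary order and only shows addresses. The sketch below (the function names are made up for this post) prints the groups in numeric order, adds the `lspci -nns` description when `lspci` knows the device, and checks whether a card like the GPU at `0000:0a:00.0` shares its group only with its own functions. It assumes the usual sysfs layout (`/sys/kernel/iommu_groups/<n>/devices/` plus the `iommu_group` link under `/sys/bus/pci/devices/<addr>/`); the optional path arguments are only there so it can be tested against a copy of that tree.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for this post, not part of any distro tooling.
# They assume the standard Linux sysfs layout; the optional path argument
# exists only so the functions can be pointed at a copy of that tree.

# Print each IOMMU group in numeric order, one device per line.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local groups group dev addr desc
    # find lists groups in arbitrary order; keep numeric names and sort them
    groups=$(cd "$root" 2>/dev/null && printf '%s\n' * | grep -E '^[0-9]+$' | sort -n) || return 1
    for group in $groups; do
        echo "IOMMU group $group:"
        for dev in "$root/$group/devices"/*; do
            [ -e "$dev" ] || continue
            addr="${dev##*/}"
            desc=""
            # Prefer lspci's description when lspci exists and knows the address
            if command -v lspci >/dev/null 2>&1; then
                desc=$(lspci -nns "$addr" 2>/dev/null) || desc=""
            fi
            echo "    ${desc:-$addr}"
        done
    done
}

# Return 0 if every device in ADDR's IOMMU group shares ADDR's
# domain:bus:device (only that card's own functions, e.g. GPU + HDMI audio),
# 1 if anything else shares the group, 2 if ADDR has no iommu_group link.
iommu_group_isolated() {
    local addr="$1" sysfs="${2:-/sys}"
    local link group dev member
    link=$(readlink "$sysfs/bus/pci/devices/$addr/iommu_group") || return 2
    group="${link##*/}"
    for dev in "$sysfs/kernel/iommu_groups/$group/devices"/*; do
        [ -e "$dev" ] || continue
        member="${dev##*/}"
        # Compare everything before the ".function" suffix
        [ "${member%.*}" = "${addr%.*}" ] || return 1
    done
    return 0
}
```

Matching on bus:device is a rough heuristic rather than a kernel rule: a group that also holds a bridge is reported as not isolated, which errs on the safe side for passthrough. On the listing above, `iommu_group_isolated 0000:43:00.0 && echo ok` should print `ok` for each of the three GPUs, while a chipset device in group 14 would not pass.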

I do notice that many of my devices tend to be ‘behind’ my Host Controller:

```
# lspci
0c:00.2 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
0c:00.3 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller
40:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex
40:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit
40:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
40:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
40:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453
40:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453

<snip>

41:00.0 Non-Volatile memory controller: Device 1987:5007 (rev 01)
42:00.0 Non-Volatile memory controller: Device 1987:5007 (rev 01)
43:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
43:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
44:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
44:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
45:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
45:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
...
```
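To see what a given device actually sits "behind", resolving its sysfs entry helps: on Linux `/sys/bus/pci/devices/<addr>` is a symlink into `/sys/devices/pci<domain>:<bus>/...`, with one path component per bridge. The sketch below (hypothetical function name; the optional sysfs argument is only for testing) prints that path as a chain. On this board one would expect the NVMe drives and GPUs on buses 41 through 44 to resolve under `pci0000:40`, the root complex at 40:00.0 in the listing above, but check your own output. `lspci -tv` shows the same topology as a tree for the whole system.

```bash
#!/usr/bin/env bash
# Hypothetical helper for this post. On Linux, /sys/bus/pci/devices/<addr>
# is a symlink into /sys/devices/pci<domain>:<bus>/..., one directory per
# bridge between the root complex and the device.
pci_path() {
    local addr="$1" sysfs="${2:-/sys}"
    local full
    full=$(readlink -f "$sysfs/bus/pci/devices/$addr") || return 1
    # GNU readlink -f can succeed when only the final component is missing,
    # so make sure the resolved device directory really exists
    [ -e "$full" ] || return 1
    # Keep what follows the first "/devices/" and print it as a chain
    full="${full#*/devices/}"
    printf '%s\n' "$full" | sed 's#/# -> #g'
}
```

For example, `pci_path 0000:43:00.0` prints something of the form `pci0000:40 -> <bridge> -> 0000:43:00.0`, where the bridge address depends on the slot the card is in.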

I didn’t realize how broken this is compared to my 8-year-old LGA1366 boards. I’m getting the D3 error like almost everyone else, and the accounts I’m finding where it works either require ESXi or Vega cards, neither of which I’m willing to use for practical reasons.

The reddit thread where AMD was interacting with users has gone cold:

And what’s worse, the AMD rep disappeared with it:

https://www.reddit.com/user/AMD_Robert

I had AMD chips from the late 90s until the late 2000s. I had a fondness for them, and I switched to Intel reluctantly. Now I have the opposite sort of feeling. I’m not sure if I can convince my vendor to take this stuff back, but I guess I can liquidate it on eBay.

Bad news: I found UEFI 804 Beta, released only a few days ago, but it still contains microcode 0x8001129. Presumably this is part of AGESA 1.0.0.3.
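For anyone else checking what a new UEFI build actually ships: on x86 Linux the running microcode revision is listed per CPU in `/proc/cpuinfo` (the `microcode` field). A minimal sketch (hypothetical function names) that reads the first entry and compares it with 0x8001129. Which AGESA version a revision belongs to is only the presumption from the post above; the script can tell you whether the revision changed, nothing more.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for this post. On x86 Linux, /proc/cpuinfo has one
# "microcode : 0x..." line per logical CPU; these read the first one.

# Print the microcode revision reported for the first CPU.
microcode_rev() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Split "microcode<TAB>: 0x8001129" on the colon and print the value
    awk -F': *' '/^microcode/ { print $2; exit }' "$cpuinfo"
}

# Succeed if the running microcode is exactly REV (default: 0x8001129,
# the revision reported in UEFI 804 Beta in this thread).
on_microcode() {
    local rev="${1:-0x8001129}" cpuinfo="${2:-/proc/cpuinfo}"
    [ "$(microcode_rev "$cpuinfo")" = "$rev" ]
}
```

After flashing a new build, `microcode_rev` shows the revision the board handed over, and `on_microcode || echo "microcode changed"` flags when it differs from 0x8001129.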

In the thread, another user mentioned that other boards are already running 1.0.0.4 in production. What concerns me is not only that our beta is behind other boards’ production version, but also that I haven’t seen any reports in recent days of IOMMU/passthrough being fixed on boards that got the new microcode, the way one can find for Ryzen.

If that’s the case, who’s to say this won’t be addressed until 1.0.0.6 or .7 (if at all; no one guarantees us anything)? And if current trends continue, Asus boards could still be a point release behind.

It’s unfortunate, because on paper the platform would be great for this. I wonder if we’re seeing the result of AMD being a budget chip maker for the last decade plus. Intel likely has orders of magnitude more experience with this, and AMD hasn’t been much to speak of in the server space since the heyday of Opteron, whose first generations predate hardware virtualization extensions (AMD-V arrived in 2006). That a virtualization-related bug with a significant effect on performance was only just discovered, after almost ten years, seems to speak to this.

Anyway, I don’t wanna come off as a downer: it is what it is, money is for spending, and the journey is half the point. That said, after never being able to get multiseat working with X58 (single seat worked fine, but the platform was young and had issues with interrupts), I’m about ready for something that works out of the box.


Aye, it’s due to this level of frustration that I returned one 1950X to Amazon (waiting on the refund) and ordered an i9-7920X, which is running Fedora 27 with kernel 4.13 at the moment. I got XenServer to boot (with @FurryJackman helping out!) but haven’t had time to fully play with it yet.

Will post my IOMMU findings from that build in the other thread asap…

Looks like a fix is finally at hand.

I already had an RMA# for mine (with a 20% restocking fee) and have ordered an Asus X299 motherboard (a used Rampage Extreme), which it seems I’ll now be sending back. I was just about to make an eBay listing.


Nice, I’ve still got the second Zenith Extreme in the box; once the fix lands, I’ll order another 1950X.

I can confirm this gets me past the D3 error, but I’m still hanging at the UEFI splash screen. I left a comment in the Reddit thread.

You mentioned (on reddit) having a GTX1070 on the host, how did you get the NVIDIA drivers to work with the 4.15-rc1 kernel?

Thanks!

This might be a bit of an old thread to bring back up, but it was a prominent result when I googled this board and IOMMU configuration for an unRAID setup I’m working on, so I thought I’d let any other passersby know: the latest BIOS release for this motherboard (X399 Zenith Extreme), version 1402, has improved the IOMMU grouping significantly. A couple of USB controllers now have separate IOMMU groups, as do the built-in sound card and several other devices. That makes things much easier for those of us experimenting with server + VM setups with native performance and functionality. And with the new Threadrippers coming out, all of this is highly relevant again, because this board is still one of the best and most appropriate boards on the market for the new chips.


Excellent. Thank you for sharing!