[project log / WIP] Epyc-Rome Gaming, Windows 10 KVM, RTX 3090 / PCIe pass-through, performance issues

Hi!

I’m finally able to build my fancy new big rig. I thought it’d be a great idea to have ONE machine that runs everything, virtualised. A mobo with a lot of PCIe sockets, gen4, with bifurcation, and stuff.

Current spec:

  1. ASRockRack ROMED8-2T mobo
  2. AMD Epyc 7402P, 24 core, 2.8GHz (3.35GHz max)
  3. plenty of octa-channel ECC-RAM, 3200
  4. Zotac Trinity RTX 3090
  5. ASRock RX 5500 XT Challenger ITX
  6. ASMedia ASM1142 USB 3.0 card, SilverStone ECU05
  7. NEC Renesas uPD720201 USB 3.0 card, SilverStone EC04-E
  8. ATEN US234-AT USB 3.0 sharing switch - 1 set of KB+Mouse, switching between 2 PCs
  9. PCIe bifurcation card, gen3 x16 -> x4x4x4x4, via 2 SlimSAS 8i SFF-8654 cables
  10. 1600 Watts no-worries kind of power supply, modular, plenty of options
  11. a bunch of extra goodies, a Dell PERC H310/LSI2008 8-port HBA, plenty of storage, fast SATA SSDs, spinning rust bulkers, NVMe R/W caches for the ZFS pools

Issues:

  1. I think my mobo’s UEFI (BIOS) setup does not do a good job regarding the PCIe link spec - it lets me choose, but the options are [Auto, gen3, gen2, gen1]! One of the selling points was PCIe gen4 support, and I’m not seeing any of it. I’m not quite disappointed yet, but I’m one phone call away from invoking my fancy European consumer protections. Comments? Which memo did I miss, since I’m not a big SI/system integrator? (I’m just bumbling around in my home lab.)

  2. Passing through the ASMedia USB card (spec 6.) (edit: to the Windows 10 Gaming VM) results in recurring HID drop-outs - I guess the USB controller resets for a second, then continues working(?). Really annoying when you’re trying to play a first-person game. Update/solved: this issue couldn’t be reproduced with the NEC/Renesas USB card (spec 7.). I’ll try the ASMedia card again when I set up my Linux daily-driver VM.

  3. I’ve struggled with the IPMI “KVM” (not the virtualisation KVM - the actual KbVidMouse remote control). There are a bunch of options in UEFI/BIOS setup; I think setting “Video OptionROM” to “Legacy only” fixed the undesired behaviour of it using any GPU other than the built-in Aspeed BMC’s. (#TODO: boot into setup and edit this post with exact settings). Either way, I’ve figured out how to have POST & OS boot-up ignore both GPUs (3090 & 5500).

  4. My Windows VM feels kinda sluggish! I expected the upgrade from a Haswell i7-4790 to a fancy Zen2 Epyc to show up as a perceptible performance gain - the IPC gains alone should’ve made up for the lack of core clocks! Which memo did I miss - is it my CPU/thread-pinning config? I’ve allocated the last 3 CCXs to that VM: 18 vCPUs, taking core-thread siblings into account, i.e. 9 cores and their SMT siblings. (My 24-core Epyc 7402P has 3 cores per CCX; virsh capabilities told me who the siblings are.) A sketch of that pinning layout follows after this list.

  5. Unigine Superposition benchmarked at 11005, which is at the bottom of the 3090 results, even surpassed by some 3080s - I’m running everything stock, though. That’s a sample size of 1, and I didn’t record much or tune much. No water cooling yet.
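
For reference, here’s a minimal sketch of what the pinning from issue 4 could look like via virsh, assuming the domain is called win10 and the last 3 CCXs map to host cores 15-23 with SMT siblings 39-47 - both the name and the numbers are placeholders, read the real sibling pairs out of virsh capabilities or lscpu -e first:

    # pin guest vCPU pairs (0,1), (2,3), ... onto one host core and its
    # SMT sibling each, so the guest sees proper hyper-thread pairs
    VM=win10                        # hypothetical domain name
    for i in $(seq 0 8); do
      core=$((15 + i))              # host cores 15-23 (placeholder)
      virsh vcpupin "$VM" $((2*i))     "$core"           --config
      virsh vcpupin "$VM" $((2*i + 1)) "$((core + 24))"  --config
    done

(This just writes the equivalent <vcpupin> entries into the persistent domain XML; hand-editing via virsh edit works too.)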

I chose Epyc because that’s how I get A LOT of PCIe sockets for all my VMs. I’d actually prefer a high-clocking Threadripper, despite it only having half the RAM channels, but there aren’t any mobos with enough PCIe sockets. It was 4 vs 7: 4 slots, some of them degraded to x8, versus 7 full x16 gen4 sockets. Bifurcation is a big deal as well - if you really need to bifurcate the frell out of all of your x16 sockets, no worries!

That’s it for now. It’s a WIP/work-in-progress; I might update/edit/reply later. I’m not done yet - I’m just a bit tired.

Happy new beer! :beers:

1 Like

Are you sure that Auto is not the same thing as gen4? I have worked on a motherboard that is PCIe gen3, and its options for link speed are [Auto, gen2, gen1]. On that board Auto = gen3, so for your board I’d expect Auto = gen4. Perhaps check in the OS what link speed the 3090 actually negotiated?

It is a STUPID naming convention, especially when they don’t explain any further in the manual.
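
For what it’s worth, the negotiated link is easy to read from Linux with something like this (10de is just the NVIDIA vendor ID, so it catches the 3090 wherever it sits):

    # LnkCap = what the device/slot can do, LnkSta = what actually got negotiated
    sudo lspci -vv -d 10de: | grep -E 'LnkCap:|LnkSta:'
    # a Gen4 x16 link shows up as "Speed 16GT/s, Width x16"
    # (8GT/s = Gen3, 5GT/s = Gen2; GPUs also downclock the link at idle,
    #  so check while the card is under load)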

What/where does it feel sluggish? If it’s general tasks/boot-up speed/etc., have you checked your storage performance? Disk images definitely take a performance hit compared to direct disk access.
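
If it helps, a quick way to compare is to run the same short fio job against the backing storage on the host and then again inside the guest - the file path and numbers here are just placeholders:

    # 4k random reads for 10 seconds - compare IOPS/latency host vs. guest
    fio --name=randread --filename=/tank/test/fio.tmp --size=1G \
        --rw=randread --bs=4k --iodepth=16 --ioengine=libaio \
        --runtime=10 --time_based --direct=1
    # drop --direct=1 if the filesystem refuses O_DIRECT (e.g. older ZFS);
    # inside the Windows guest, swap --ioengine=libaio for --ioengine=windowsaio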

The Windows desktop experience isn’t particularly snappy. Sometimes window animations stutter - it looks like frame skipping/dropping; the animation seems to finish in time, but some smoothness gets lost along the way. My old Haswell handled this more smoothly - I’m running a heavily multi-threading-optimized CPU now, which isn’t a particularly high clocker.

I’ll play with the KVM/QEMU config; maybe relinquishing the vCPU core pinning helps - let the wisdom of much smarter developers handle scheduling and load distribution automatically. :sweat_smile:

May I ask what you will be populating the x4x4x4x4 with? Will it be storage, NICs, USB host cards, video capture cards, etc.? I’m curious whether, when you bifurcate a PCIe slot, you can pass the devices on that same bifurcated slot through to different VMs. I really want to know more about bifurcation and your use case.

Yes, you may ask, that’s why I’m logging it here. :wink:

Currently, I have the x4x4x4x4 fully populated with:

  1. Intel i350-T4 quad GbE NIC - for the router VM (will be pfSense or OPNsense)

  2. This fancy USB 3.0 card - I just integrated it today; not tested with actual VMs yet, but it looks promising. It’s an ASMedia ASM1806 PCIe switch (Gen2 x2 upstream to 4× Gen2 x1 downstream, shows up as PCI bridges) with 4 NEC/Renesas µPD720202 USB 3.0 controllers behind it - each controller lands in its own IOMMU group (listing below; a small script to reproduce it is at the end of this post):

IOMMU Group 18 c4:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU Group 19 c5:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU Group 20 c5:02.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU Group 21 c5:06.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU Group 22 c5:0e.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1806] (rev 01)
IOMMU Group 23 c6:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller [1912:0015] (rev 02)
IOMMU Group 24 c7:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller [1912:0015] (rev 02)
IOMMU Group 25 c8:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller [1912:0015] (rev 02)
IOMMU Group 26 c9:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller [1912:0015] (rev 02)
  3. the ASMedia (SilverStone ECU05) USB 3.0 card - it gave me trouble in Windows before; I’ll test it again in the Linux daily-driver VM

  4. the other Renesas (SilverStone EC04-E) USB 3.0 card - currently running in the Windows 10 VM; seems fine, no issues

The whole x16 socket is manually downgraded to PCIe Gen2, because everything downstream is Gen2 anyway. All the other ports/sockets are on Auto/Gen4. Each device downstream of this bifurcated socket will be passed through to a different VM. I haven’t actually tested that yet, but I don’t expect complications: they’re all in their own IOMMU groups, which AFAIK satisfies the pass-through requirements.
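
For completeness, the listing above can be reproduced with the usual sysfs walk:

    #!/bin/bash
    # list every IOMMU group and the PCI devices it contains
    shopt -s nullglob
    for g in /sys/kernel/iommu_groups/*; do
      echo "IOMMU Group ${g##*/}:"
      for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
      done
    done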

2 Likes

Awesome, thanks for the info. And yes, please update us on your success with pass-through and the performance issues you’re working out. Oh, and I’d love to see what kind of case you’re putting all of this hardware into.

Looking forward to seeing more from this project.

Cheers!

1 Like

So it looks like a few people are having performance issues in Win 10 under virtualization?

It’s a Thermaltake Core W200

1 Like

I’m honestly not quite sure what to expect. Today, while tending to the HID drop-out issue with the ASMedia USB card, I ran Metro Exodus for a few minutes at max settings in DX12, and 3D performance was just fine. I can’t give you FPS figures because none of my tools seem to cope with DX12 - I hope MSI Afterburner works.

Well, there have been a few threads over the past few weeks, on different generations of hardware, with similar issues. I wonder if we’re all missing some key component or setting?

Maybe an @wendell can solve it all.

I have an Epyc 7302 and a Tyan S8030 motherboard. I haven’t really deep-dived into performance, but I have an “it works” Proxmox host with a Fedora VM as my workstation now.

Make sure to browse through these: https://developer.amd.com/resources/epyc-resources/epyc-tuning-guides/

One thing that stood out in High Performance Computing: Tuning Guide for AMD EPYC™ 7002 Series Processors

4.2.6 Memory Frequency, Infinity Fabric Frequency, and coupled vs uncoupled mode

The Memory clock and the Infinity Fabric clock can either run at synchronous frequencies, called coupled mode, or at asynchronous frequencies, called uncoupled mode.

AMD EPYC supports DDR4 frequencies up to 3200 MT/s, however the fabric clock can be synchronous to a maximum speed of 2933 MT/s (or 2667 MT/s, for lower-power Group B infrastructure parts).

If your memory is clocked at or lower than 2933 MT/s, the memory and fabric will always run in coupled mode which will provide the lowest memory latency.

If you are running DDR4 memory at 3200 MT/s, the memory and fabric clocks will run in uncoupled mode. This provides slightly higher bandwidth at the cost of increased memory latency.

If your system supports 3200 MT/s memory, you can experiment with coupled mode at 2933 MT/s and uncoupled mode at 3200 MT/s to determine which is best for your workload.

In the BIOS, set your memory frequency to the desired speed and make sure APBDIS is set to 1 and fixed SOC Pstate is set to P0.

Games tend to be more latency sensitive, so give that a try.
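
Before experimenting with coupled vs. uncoupled mode, it’s worth confirming what the DIMMs are actually running at; on Linux you can read it out of the SMBIOS tables:

    # rated "Speed" vs. the "Configured Memory Speed" the BIOS actually set
    # (older dmidecode versions call the latter "Configured Clock Speed")
    sudo dmidecode -t memory | grep -i speed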

There’s also this post, which details how isolcpus is often ignored.

I fought with DPC latency issues off and on for years, and always apparently in the same places as you: nvidia and network drivers. I was similarly confused when pinning the VM to isolated threads didn’t make a difference. Then I discovered that systemd ignores the isolcpus kernel option and binds processes to all processors by default. Since it’s PID 1, basically everything was allowed to use my supposedly isolated CPUs. Setting CPUAffinity to the inverse of isolcpus in /etc/systemd/system.conf took care of most of that.

Still some processes remained on my isolated CPUs. Turns out Docker, too, ignores isolcpus. You need to specify the --cpuset-cpus docker argument as in systemd.

And even still, there was one remaining process on my isolated CPUs: the qemu emulator itself. Setting the libvirt emulatorpin option to the non-isolated CPUs finally left my isolated CPUs completely idle.

Now with my VM truly not having to compete for CPU time, the DPC latency issues disappeared. They didn’t just get better–they’re gone. No more audio clicks and pops.
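
Condensed into one place - the CPU numbers and the win10 domain name are just placeholders, substitute your own isolated set:

    # kernel cmdline - keep the host scheduler off the VM's CPUs:
    #   isolcpus=15-23,39-47
    # /etc/systemd/system.conf - pin PID 1 (and everything it spawns) elsewhere:
    #   CPUAffinity=0-14,24-38
    # docker has to be told per container:
    docker run --rm --cpuset-cpus="0-14,24-38" alpine nproc
    # and libvirt's emulator threads need the same treatment:
    virsh emulatorpin win10 0-14,24-38 --config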

3 Likes

Isolcpus and maybe the NUMA options. You can achieve near-bare-metal perf with tuning. So install Windows bare metal and see what that baseline is.
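
For the NUMA part, the quick check/knob (win10 again being a placeholder domain name) would be something like:

    # how many NUMA nodes the host actually exposes (depends on the NPS BIOS setting)
    numactl --hardware
    # keep the guest's memory on the node its pinned cores live on
    virsh numatune win10 --mode strict --nodeset 0 --config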

Threadripper pro was built for you though

2 Likes

AFAIK, with Windows 10, the desktop is at least partially GPU accelerated.

How are you accessing the VM? Monitor directly connected to the passed-through GPU? Looking Glass? Virt-Manager/Virt-Viewer?

If you are accessing the VM through something other than a monitor connected directly to the passed-through GPU or Looking Glass, my first suspect would be the emulated GPU.

I’m passing through a 3090 and I’ve removed all the virtio/SPICE VGA things from the VM. That’s why I mentioned it - I wouldn’t expect great performance from software rendering.
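
For anyone following along, a quick way to verify nothing emulated is left rendering the desktop (win10 being a placeholder domain name):

    # any leftover emulated display devices show up here
    virsh dumpxml win10 | grep -A2 -E '<graphics |<video>'
    # ideally this returns nothing, or only <model type='none'/>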

There just hasn’t been a Threadripper board with a sufficient number of PCIe sockets. That changes soon, though - the announced Gigabyte WRX80-SU8-IPMI tickles all my fancies :nerd_face: :star_struck: :smiling_face_with_three_hearts: :heart_eyes: edit: SOO much storage connectivity - 3 SlimSAS ports, each capable of SATA3 x4 (or PCIe gen4 x4) - I wouldn’t even need the Dell PERC H310 anymore.

Can’t wait to see TR Pro 5000 CPUs on WRX80 - they’ll be such monsters.

1 Like