Patch NPT on Ryzen for Better Performance | Level One Techs

The golden age of Ryzen for gaming virtualization is upon us. If the reader is not familiar with this technology, it allows one to run a virtual machine with direct access to a secondary graphics card, which gives workloads running inside the VM full GPU acceleration. Outside of servers and the enterprise, the most common use cases are running legacy applications and games that require direct access to a GPU. If everything is working correctly, running a gaming VM carries a negligible performance hit compared with running on "bare metal."


This is a companion discussion topic for the original entry at https://level1techs.com/article/patch-npt-ryzen-better-performance

The patch worked out for me :slight_smile:

Configuration

Hardware

Asus Strix B350
16GB Trident Z 3200MHz
R7 1700
GTX 1070 (GPU passthrough)
Radeon 7770 (host GPU)

Software

I'm on Ubuntu 17.04 with kernel 4.13.7+ (with the NPT patch linked in the forum post).

Before, with npt=0, I could run games like Overwatch and L4D2 at 150+ fps; however, PUBG or any game with a more CPU-intensive workload would stutter hard.

PUBG Framerates

Before Patch:

npt=1, ~20fps
npt=0, ~60fps (But very hard stuttering, felt like ~30 honestly)

After Patch:

npt=1, ~70fps

As far as I can tell, this is pretty close to my bare-metal numbers. It only just started working, so I haven't given it much time to test, but I am very happy about this :slight_smile:

Damn, I wish Raven Ridge were out by now. Even so, won't I need an additional dGPU?

You always need 2 GPUs, regardless of whether both are dedicated or one is an iGPU and the other a dGPU. The reason is that consumer GPUs currently cannot be bound to 2 drivers at the same time.

That look is like "Mom, look what the cat did."

for those on Arch: if you want to use it in a PKGBUILD, download the original patch from this URL:
https://patchwork.kernel.org/patch/10027525/raw/
and save it as something like npt.patch

then download a snapshot of the kernel package you want to use from the AUR, drop the patch into the untarred snapshot's folder, and edit the PKGBUILD.

it was easier for me to apply it this way. I'm testing it with amd-staging-drm-next-git on Vega. AUR package here:
https://aur.archlinux.org/packages/linux-amd-staging-drm-next-git/

edit the PKGBUILD by adding the patch to the source array and a 'SKIP' entry to sha256sums (or an actual sha256 if you really want to), then add or edit the cd src-name-here line inside prepare() like this:

cd "${_srcname}" && patch -p1 -i …/npt.patch || exit

in our case it was on line 41 of the PKGBUILD.

then simply run makepkg -i, or makepkg followed by sudo pacman -U the-new-package-name

easy peasy. thanks Wendell!

Does your kernel use Arch's ACS override patch?
Because the AUR linux-vfio package has a typo in its patch.

Next things that need fixing:

  • AVIC
  • QEMU does not set topology for AMD CPUs

1:16:15: Wendell, have you seen the QEMU patch that fixed the latency issues with PulseAudio?

I'm not the maintainer of the package I linked; I only posted it as an example. If linux-vfio has a typo, it should be simple enough to download that snapshot, correct the typo in the patch, and compile it, or even add the patch to a different kernel AUR package's PKGBUILD.

what happens to be the typo in the patch?

There is a ; that should be a :

     p += strcspn(p, ":");
    -if (*p != ';') {
    +if (*p != ':') {
         pr_warn("PCIe ACS invalid ID\n");
thanks! works. the last chunk wasn't at the proper lines for amd-staging-drm-next-git, so I edited that in the patch too.

 /*
  * Following are device-specific reset methods which can be used to
  * reset a single function if other methods (e.g. FLR, PM D0->D3) are
@@ -4487,6 +4587,7 @@ static const struct pci_dev_acs_enabled {
 	{ 0x10df, 0x720, pci_quirk_mf_endpoint_acs }, /* Emulex Skyhawk-R */
 	/* Cavium ThunderX */
 	{ PCI_VENDOR_ID_CAVIUM, PCI_ANY_ID, pci_quirk_cavium_acs },
+	{ PCI_ANY_ID, PCI_ANY_ID, pcie_acs_overrides },
 	/* APM X-Gene */
 	{ PCI_VENDOR_ID_AMCC, 0xE004, pci_quirk_xgene_acs },
 	{ 0 }

@wendell Did you have to deal with configuring NUMA on Threadripper, or even Ryzen?

What does NPT stand for?

Nested Page Tables.

Normally, the OS swaps page tables when it "context switches" between processes as it multitasks; sometimes it needs to do this just to answer a syscall or to use a driver to talk to hardware.
The tables map virtual memory addresses, grouped into pages of memory as seen by a process, to physical memory addresses.

Without NPT, a virtualized OS can't be allowed to do that directly, because it can't be given access to all of memory. To keep things working safely, when the guest OS tries to do one of these things, it generates an exception / protection fault / …, and the host OS has to catch that and perform the operation on behalf of the virtualized OS in a safe manner, pretending that the operation succeeded.

NPT allows the host OS to assign a set of pages to the guest OS in advance, as pages owned by that guest, and lets the guest OS work within this set of pages without going through the fault / interrupt / exception / … mechanism every time it needs to context switch from one process to another.

It's similar to how a process on a system normally doesn't have direct access to all of memory, but instead runs in a virtual address space that maps to physical memory via page tables. It's "nested" because it's just one more level of that.

Running without NPT sucks most for "compiling" in the guest, because lots of processes are created all the time, each with its own address space, and every one of them hits the expensive path of the guest/host interaction.
It still sucks, but a bit less, for gaming, because usually you'd run your game and only a couple of other idle things in the guest OS.

But you may notice it more during gaming because it's interactive, whereas when compiling you usually press Enter, take a sip of coffee or a walk around the apartment or office, and it may or may not be done by the time you're back.

edited: because apparently I suck at explaining and tend to presume a lot of prior knowledge.

Thanks for this write-up Wendell, it's been super useful. Applying the patch seems to have improved performance for me, but it's still behaving quite weirdly. For example, CrystalDiskMark reports something like 3.5GB/s write speeds and crashes before the tests finish. 3DMark Time Spy now also crashes on opening, or locks up the VM, more often than not. I played a few minutes of GTA V earlier with frame rates between 45 and 90, occasional drops to 10, and persistent micro-stuttering. This is at 1080p with a GTX 1080 passed through, so I'd still expect a bit better really. Has anyone else noticed more guest crashes since applying this patch?

Fedora, kernel 4.13.9-200. R9 380 host, Strix 1080 guest, Gigabyte AX370 Gaming K5. Host on NVMe, guest on SATA SSD.

try pinning your QEMU processes to particular cores, either manually or in the KVM configuration, and report back? That should help with the micro-stuttering. Keep an eye on htop and see whether load shifts between cores when there is a micro-stutter, bearing in mind that htop lags by about 1 second.

As per your last guide, I have the virtual cores pinned to physical cores 0 through 7 in the VM's XML. The stutters are small and frequent. Might it be because I'm using emulated SATA for the storage device rather than VirtIO? That's the next thing I'll try. I'm still getting crashes, mostly when loading 3DMark and other large applications. Perhaps the virtualized storage interface is the issue there too? Thanks again for all your assistance my man, I've learned a lot doing this. Compiling my first kernel today felt like a rite of passage!

So far, 3DMark Time Spy scores around 6200 in the VM and 7200 on bare metal, with something like 95% of the score difference due to having fewer CPU cores. GPU performance is damn near identical. I'll do a bunch more comparison benchmarks when I'm done fiddling with it.

You have the right process. VirtIO drivers are a great idea. Maybe also enable hugepages if you haven't already.

Thanks again for your wisdom. I'll get some sleep and return with some interesting comparison numbers soon.