Patch NPT on Ryzen for Better Performance | Level One Techs

Unless I’m missing something, the build-log doesn’t seem to have any useful information. Here it is. It does, however, point to this code. The warning about unexpanded macros seems to be harmless.
And, just in case, here is my current ryzen.patch file. If anyone wants any more information, I would be happy to provide it.

Alright, here are some quick and dirty benchmarks. The difference this patch made was night and day.

In Total War: Warhammer II’s campaign benchmark at 1080p
Before: Min 10, Max 25, Average 16.9 FPS
After: Min 43, Max 86, Average 61.6 FPS

In Total War: Warhammer II’s battle benchmark at 1080p
Before: Min 6, Max 17, Average 10.8 FPS
After: Min 43, Max 61, Average 54.3 FPS

In Civ 6’s graphics benchmark at 3440x1440
Before: 99th percentile 65.95, Average 47.66 (frame times in ms)
After: 99th percentile 24.16, Average 18.68 (frame times in ms)

Civ 6’s AI benchmark at 3440x1440
Before: Average 33.2 second turns
After: Average 26.5 second turns

Skyrim (just eyeballing it) outside
Before: Min 1, Max 35, Average 20 FPS
After: Min 30, Max 50, Average 40 FPS

Skyrim (just eyeballing it) inside
Before: Min 30, Max 55, Average 40 FPS
After: Min 60, Max 60, Average 60 FPS
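To put the averages above on one scale, here is a small sketch that turns each before/after pair into a speedup factor. The numbers are copied from the list above; the labels and the `speedup` helper are just names made up for this example. For FPS higher is better, while for frame times and turn times lower is better, so the ratio is flipped for those. By this measure the Total War averages come out at roughly 3.6x (campaign) and 5x (battle).

```python
# Averages copied from the benchmark list above.
# Each entry: (label, before, after, higher_is_better)
RESULTS = [
    ("TWW2 campaign, avg FPS", 16.9, 61.6, True),
    ("TWW2 battle, avg FPS", 10.8, 54.3, True),
    ("Civ 6 graphics, avg frame time (ms)", 47.66, 18.68, False),
    ("Civ 6 AI, avg turn time (s)", 33.2, 26.5, False),
]


def speedup(before, after, higher_is_better):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return after / before if higher_is_better else before / after


def summarize(results=RESULTS):
    """Map each benchmark label to its speedup, rounded to two decimals."""
    return {label: round(speedup(b, a, hib), 2) for label, b, a, hib in results}


if __name__ == "__main__":
    for label, factor in summarize().items():
        print(f"{label}: {factor}x")
```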

I am using my old GTS 450 for passthrough to my Ubuntu VM, but this card suffers from a reset bug. I have tried detaching the card from the command line before shutting down the VM, and I have tried dumping the ROM and supplying it in the config; nothing helps. Every time I shut down or reboot the VM, I get a black screen and have to reboot the host PC.
Is there any way to power cycle the PCIe slot from the host command line so the card can be initialized more than once?
BTW, I am on a Ryzen 1600 and an Asus X370-Pro with the latest BIOS (1001).
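One thing that can be tried from the host is the kernel's PCI remove/rescan interface in sysfs. To be clear, this is not a true power cycle: it only makes the kernel forget the device and re-enumerate the bus, so it may well not clear a reset bug. A minimal sketch, assuming it is run as root while the VM is shut down; the `sysfs_root` parameter is something added for this example so the function can be pointed at a scratch directory, and the address in the usage note is only a placeholder.

```python
from pathlib import Path


def remove_and_rescan(address, sysfs_root="/sys"):
    """Drop a PCI device from the kernel, then ask the PCI bus to rescan.

    address: full PCI address, e.g. "0000:29:00.0" (placeholder value).
    sysfs_root: normally "/sys"; overridable so this can be dry-run elsewhere.
    """
    bus = Path(sysfs_root) / "bus" / "pci"
    remove_node = bus / "devices" / address / "remove"
    if not remove_node.exists():
        raise FileNotFoundError(f"no such PCI device: {address}")

    # Writing "1" here makes the kernel unbind the driver and remove the device.
    remove_node.write_text("1")
    # Writing "1" here triggers a rescan, which should rediscover the device.
    (bus / "rescan").write_text("1")
```

Usage would look like `remove_and_rescan("0000:29:00.0")`, with the address taken from `lspci -D` on the host.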

This is a hardware bug that can’t be fixed. Some GPUs with a dual BIOS (my Fury Nitro, for example) can be reset by switching to the other BIOS.

OK and what about that patch that Wendell is talking about in the stream starting @11:35? Kernel 4.15 patch with power cycle to the PCIe?

Basically this (as I understand at least):

I’m not super up to date on it, because I’m not using Vega or any other GPUs that suffer from this bug (at least, not for passthrough)

I’m going to defer to @mihawk90 and Wendell since they’re clearly following it more closely.

actually not really, just reading a bit :stuck_out_tongue:

Still following it more closely than I am. I just stay away from GPUs with the issue.

So, for my issue, the options would be either to reboot or shut down the Ubuntu VM with some kind of script rather than sudo shutdown, or to power cycle the card from the host system.
I have tried to run this from host OS:

virsh detach-device Ubuntu /mnt/user/system/gpudev.xml
virsh detach-device Ubuntu /mnt/user/system/audiodev.xml
virsh destroy Ubuntu

where xmls are:

GPU
gpudev.xml

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x000' bus='0x29' slot='0x00' function='0x0'/>
    </source>
  </hostdev>

HDMI audio
audiodev.xml

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x000' bus='0x15' slot='0x00' function='0x1'/>
    </source>
  </hostdev>

After that, it says that the device was successfully detached and the VM was destroyed, but I still cannot use the GPU a second time without rebooting the host OS.
Any ideas?
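One cheap thing to check first is whether the kernel thinks it has any way to reset these functions at all. A device's sysfs directory only gets a `reset` file when the kernel knows a reset method for it; its presence does not guarantee the reset actually works on a buggy card. Here is a rough sketch that converts a `<hostdev>` snippet like the ones above into a sysfs-style address and looks for that file. The helper names and the `sysfs_root` parameter are made up for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def hostdev_to_pci_address(xml_text):
    """Turn a libvirt <hostdev> snippet into an address like '0000:29:00.0'."""
    addr = ET.fromstring(xml_text).find("./source/address")
    if addr is None:
        raise ValueError("no <source><address> element found")
    # libvirt writes the fields as hex strings such as '0x29'.
    domain, bus, slot, function = (
        int(addr.get(name), 16) for name in ("domain", "bus", "slot", "function")
    )
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


def has_reset_method(address, sysfs_root="/sys"):
    """True if the kernel exposes a 'reset' attribute for this PCI function."""
    return (Path(sysfs_root) / "bus/pci/devices" / address / "reset").exists()


if __name__ == "__main__":
    gpu_xml = """
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <source>
          <address domain='0x000' bus='0x29' slot='0x00' function='0x0'/>
        </source>
      </hostdev>
    """
    address = hostdev_to_pci_address(gpu_xml)
    print(address, "reset attribute present:", has_reset_method(address))
```

If the file is missing, the kernel has no reset method for that function, and detaching the device in libvirt cannot reinitialize the card on its own.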

I have been trying to get some support from AMD on this and so far nothing, not a peep, which is a shame because fixing it would only benefit them. If people are willing to donate the hardware required to reproduce this problem, I would be willing to spend some time on it and try to resolve it. To be honest, I do not agree that it is a hardware bug; there is much we can do to poke at PCI devices on the software side that may yield a fix.

Also, in my case, if I hibernate the host PC and bring it back, I can use the GTS 450 for the VM again. Is there a way to power cycle the PCIe slot from the command line, or do you think it is also related to the PC power supply (i.e. the power supply cycles the voltages when the host is hibernated or rebooted)?

That is honestly sad to hear :disappointed:

If you have a funding campaign opened, I would be willing to donate some of my money toward the cause.

I think I remember hearing somewhere that ESXi does not suffer from this issue, but I am not sure that is true.

How have you been trying to contact them? Maybe we should raise hell on Twitter in your name.

We’ve been finding that out. Lots of people are ejecting the GPU, killing the power to the PCIe slot, flipping the BIOS selector switch on the physical card and it’s solving the problem. I think this can definitely be solved in the VFIO driver.

Directly via their support system and Reddit, and I believe Wendell has also tried to get them to come to the table.

Exactly, which is why I am fairly confident that with a bit of time and some hardware this could be resolved.

I’ll try to contact them through my business partnership. We’re running Intel products in our datacenters. I want to use EPYC and TR, but only if these problems on the kernel are fixed, so money may motivate them.

Thanks mate, that would be awesome!

The NPT issue is resolved, and there are patches for qemu and kvm to pass SMT through on Ryzen.

Now if only the PCIe “!!! Unknown header type 7f” issue were resolved.
When passing through the CPU-integrated audio, I need to use the ACS patch to separate out the unused SATA controller and keep it bound to the ahci driver.
This prevents qemu from resetting the device (maybe the whole bus). If I don’t do this, all the devices on that bus end up with a malformed PCI header.
Same for Vega. The only difference is that Vega ends up like that after the host reboots or shuts down.
Funny thing is that if I don’t pass through the GPU audio, I can do one reboot/power cycle of the VM before Vega stops booting.
And from what I’ve read, the reset problem seems more like a sloppy ROM than a hardware problem. At least one person reported that he could reboot the VM with a Sapphire Vega 56.
Same problems with Polaris. Cards from some vendors can be reset, and cards from others can’t.
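For reference, that lspci message shows up when reads of a device's config space come back as all ones (0xFF): the header type byte sits at offset 0x0E, and lspci masks off bit 7 (the multi-function flag) before printing it, which leaves 7f. Here is a rough sketch of the same check straight from sysfs, so it can be scripted after a VM shutdown. The function names and the `sysfs_root` parameter are invented for this example.

```python
from pathlib import Path

HEADER_TYPE_OFFSET = 0x0E  # header type byte in standard PCI config space


def read_header_type(address, sysfs_root="/sys"):
    """Return the header type of a PCI function with the multi-function bit masked."""
    config = (Path(sysfs_root) / "bus/pci/devices" / address / "config").read_bytes()
    return config[HEADER_TYPE_OFFSET] & 0x7F


def looks_dead(address, sysfs_root="/sys"):
    """True if the function reports header type 7f, i.e. config reads return 0xFF."""
    return read_header_type(address, sysfs_root) == 0x7F


def find_dead_devices(sysfs_root="/sys"):
    """List every PCI function that currently shows the 7f header type."""
    devices = Path(sysfs_root) / "bus/pci/devices"
    return sorted(d.name for d in devices.iterdir() if looks_dead(d.name, sysfs_root))


if __name__ == "__main__":
    for address in find_dead_devices():
        print(f"{address}: header type 7f, device is not responding")
```

Valid header types are 0, 1 and 2, so a healthy device never reports 7f.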

In the video, Wendell talks about a patch for Linux 4.15 that power cycles the PCI device. Does anyone have a link to the patch or its discussion thread (or mailing list)?

There is some work on this entering the kernel, specifically for Vega, but AFAIK it still doesn’t fix the problem. I have been informed that the new AGESA may have fixed this problem.
