@SonWon The RX580 does not suffer from the Vega reset bug. It’s the card I use for passthrough and I have zero issues with it. Highly recommend.
Okay, so here I was trying things from this thread, not realizing my Q35 machine type was still on 2.11. So none of the XML edits I was making would have had the bleeding-edge code in them, as I was running stable…
I did, however, figure out that disabling ASPM helped when the guest was detecting 3.0 x16 while the host link was actually 2.0 x16. That's something we also need: a way to disable ASPM for all KVM VMs via a kernel argument, or per VM in libvirt/QEMU.
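For what it's worth, ASPM can already be disabled host-wide via a standard kernel parameter; that's a sledgehammer compared to the per-VM switch described above, but it covers the passthrough case. A sketch of the GRUB change, assuming a typical GRUB-based distro:

```shell
# /etc/default/grub -- disable PCIe Active State Power Management host-wide.
# 'pcie_aspm=off' is a standard Linux kernel parameter. After editing, run
# 'update-grub' (Debian/Ubuntu) or 'grub2-mkconfig -o /boot/grub2/grub.cfg'
# (Fedora and friends) and reboot.
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"
```

You can verify the result after reboot with `lspci -vv | grep ASPM`, which shows the advertised and currently enabled ASPM states per device.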
Which brand card do you have? Are you using Windows 7 or 10?
It shouldn't matter which brand, as it's still an RX 580 underneath. That said, it's an ASUS DUAL-RX580-O8G with a Windows 10 VM.
I found three versions of the card. I think this one is likely what you have.
ASUS Dual Radeon RX 580 OC, DUAL-RX580-O8G, 8GB GDDR5, DVI, 2x HDMI, 2x DP (90YV0AQ1-M0NA00)
The only difference I could find between the versions was this code, 90YV0AQ1, where the last two characters change: QB, Q1, and Q2. The Q2 had a slightly lower factory OC. The QB was slightly less expensive. The Q1 had retailers that I deemed more trustworthy.
Thank you for the reply and I hope you have a great day!
I got my Vega 56 this weekend, set up my Q35 VM, and everything seems to be working well. Comparing Superposition benchmarks to other Vega users, it looks like I'm getting bare-metal performance. My own bare-metal tests seemed to be skewed; I think my RX 580 is being used for CrossFire automatically, because I was getting scores way higher than the leaderboard suggests I should.
I would like to know if there's a better workaround for the reset bug than suspend-to-RAM, but if not, that's fine; my system sleeps and wakes quickly.
Check this thread out, although I've heard it's broken on the newest Windows 10 updates (not confirmed).
I'm using LTSC, so I might be able to do this. Thanks.
Can we please stop going off topic here, this thread is not about the AMD reset bug.
This is what AMD have said directly on the matter:
Most of the IPs on the GPU provide a lightweight soft reset mechanism to reset that specific IP. Depending on the type of hang, a soft reset may or may not be able to recover the IP. If it can't, you have to do a full adapter reset, which resets the entire GPU. The PSP and SMU do not support soft reset; they cannot be reset once started without a full adapter reset. On older ASICs, adapter reset was done by writing a special sequence to PCI config space; internally this reset was handled by the SMU. For vega10 and newer, full adapter reset is handled by the PSP (mode1 reset). Soft reset is not currently implemented for any of the IPs on vega10, but it works similarly to older IPs.
With that said, any further off topic posts to this thread will be deleted.
+1 on the patch set. I built qemu-git 5:v3.1.0.rc2 with it; GPU-Z previously reported PCIe x16 3.0 @ x8 3.0 (x8 is correct for my config with two GPUs running x8/x8), but the Nvidia control panel said it was still PCIe 1. After adding the options to configure the link speed and width per the patch 6/7 notes, it now reports PCI Express x8 Gen3 in the Nvidia control panel as well as GPU-Z. I'll try to get some benchmarks in to compare; I saved my i440FX config, so I can quickly redefine with the old config and test.
It took me a handful of tries to make this work in libvirt, but I finally have it working on QEMU 3.0 with the modified PA driver patch and a modified version of the patchwork PCIe link negotiation files.
I had to modify patch #6 only because the line numbers differ between 3.0 and git master.
I haven’t done any benchmarks to validate any improvement.
Here's the XML I needed:
<qemu:arg value='-global'/>
<qemu:arg value='pcie-root-port.speed=8'/>
<qemu:arg value='-global'/>
<qemu:arg value='pcie-root-port.width=16'/>
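For anyone copying this: the `<qemu:arg>` entries have to live inside a `<qemu:commandline>` element, and libvirt will silently drop them unless the QEMU XML namespace is declared on the domain element. A minimal sketch of where they go (everything else in the domain XML elided):

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- ...rest of your existing domain definition... -->
  <qemu:commandline>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.speed=8'/>
    <qemu:arg value='-global'/>
    <qemu:arg value='pcie-root-port.width=16'/>
  </qemu:commandline>
</domain>
```

The `-global` pairs set properties on every `pcie-root-port` device, which is why no per-device address is needed here.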
How can I tell which PCI root controller my GPU is connected to?
HWiNFO64 will let you see which PCI root controller you are connected to, or you can use Device Manager: double-click the device, and the first thing you see is the location of the device.
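On the Linux host side, the same question can be answered without guest tools: `lspci -t` prints the PCI tree, and the GPU's parent node in that tree is its root port. A sketch (the `0000:01:00.0` address is a placeholder; substitute your GPU's address from `lspci`):

```shell
# Print the PCI device tree; a passthrough GPU's parent node is its root port.
command -v lspci >/dev/null && lspci -tv

# Alternatively, resolve the parent bridge of a specific device via sysfs:
# the directory above the device's real path is its upstream port.
gpu=/sys/bus/pci/devices/0000:01:00.0
[ -e "$gpu" ] && dirname "$(readlink -f "$gpu")" || echo "no device at $gpu on this host"
```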
Since moving over to Q35 and adding the patch, I've got this additional device that doesn't have a driver: https://media.discordapp.net/attachments/244187921228234762/517534667918934017/unknown.png?width=400&height=254
Should I worry about this?
@urmamasllama, silly question, but have you got the VirtIO drivers from:
I just extract that and then let Windows scan the directory for a driver update.
Other than that, I don't know if anyone could help troubleshoot without your VM config posted (either the QEMU command-line arguments or the libvirt XML config file), unless someone just happens to recognize it from experience. I sadly do not.
Yeah, I already tried throwing the VirtIO drivers at it. I don't know how recent my driver package is, though; it may be a few months old. I'll post my XML when I get home, I guess.
Go to the “Details” tab and select “Hardware IDs”. There you will see a string containing VEN_XXXX and DEV_XXXX; these are the PCI vendor and device IDs in hexadecimal.
Go to https://www.pcilookup.com/ and type in just the hex numbers, without the VEN_/DEV_ prefixes, to look up what the device is.
This method applies to ANY unknown PCI device in Windows.
Isn't it the serial port? I don't think you need it.
Hi, is there any chance these patches make it into the QEMU 3.1 release or the other projects involved?
I've read a bunch of posts in this topic, and indeed Nvidia's system information reports a single PCIe lane on my passed-through GTX 1070. I noticed some freezing in Metro 2033, for instance, even though the game runs fine overall.
Testing a Samsung 970 Pro 1TB drive (also passed through) shows that NVMe is not affected, as its top speeds are only possible with four full PCIe Gen 3 lanes available.
Anyway, huge thanks to gnif for his work on this issue.
I believe they are trying to get them queued up for 3.1, and the defaults will change to full speed in QEMU 4.0. These patches only apply to platforms that actually have PCIe, such as Q35; i440FX is out of the question.
This patch isn't likely to help here; I would suspect a CPU pinning issue or non-local memory access before looking at PCIe bandwidth.
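For anyone chasing the pinning angle: a minimal libvirt sketch of pinning vCPUs to host cores and keeping guest memory on one NUMA node. The core and node numbers below are made up; match them to your own topology (check `lscpu` or `virsh capabilities` first):

```xml
<domain type='kvm'>
  <!-- ...rest of the domain definition elided... -->
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <!-- Pin each vCPU to a dedicated host core (example cores 2-5). -->
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='5'/>
  </cputune>
  <numatune>
    <!-- Keep guest memory on the NUMA node local to those cores. -->
    <memory mode='strict' nodeset='0'/>
  </numatune>
</domain>
```

Pinning to cores on the same NUMA node as the GPU's root complex avoids the cross-node memory traffic the post above is hinting at.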