Linux Host, Windows Guest, GPU passthrough reinitialization fix

I completely forgot how I got the Vega to behave until I had to reinstall Windows.
Here is the link with the information that I followed:
[Linux Host, Windows Guest, GPU passthrough reinitialization fix](https://forum.level1techs.com/t/linux-host-windows-guest-gpu-passthrough-reinitialization-fix/121097)

MB: ASRock Fatal1ty X399 Professional Gaming
CPU: TR 1920X
RAM: 2x8GB + 2x16GB G.Skill 3200 C14 Trident Z, memory interleaving set to channel to expose NUMA
GPU: R9 290 (primary), RX 550, Vega 64
kernel: 4.15.6
config: win.txt (6.6 KB)

What if I told you that this is the exact same thread you are posting in?

^That, right there, is the best post on this thread! hahaha! And I read this thread every time I get a notification it’s been updated! Priceless!! Hahaha!

Oops… got lost somewhere, or probably merged the two threads in my head.
I have to find the correct thread now, something about Vega on Threadripper.

thank you very much for the HW specs and config file.

Sorry to necro (ish), but this seems to be the best place to ask rather than starting a new thread, as everyone here has AMD hardware with the reset bug.

Can anyone confirm that the 4.16 branch with updated QEMU ACTUALLY fixes the reinit issues you were having on Vega, or any other series?

in general, yes, but I’ve not seen any that replicate this same functionality.

If you know of any that do, could you point me to them?

There is no fix for the Vega FLR bug… AMD’s engineers have confirmed that they know of this bug, and it is a firmware and/or hardware bug.

My suspicion is that it’s firmware, as we’ve seen bios revisions shipped with non-reference cards where the bug is mitigated or not present.

The reason I ask is because I’ve been seeing anecdotal reports of it being fixed in recent kernels. I doubted it was true, but I needed to independently verify that to some extent.

fwiw, my Vega 64 Liquid Cooled model didn’t have any reset issues on kernel 4.15. The LC model I think has a different BIOS than the other reference cards, however. This was on a Threadripper 1950X system with Zenith Extreme.

I ended up moving over to a 1080 Ti though.

that’s what I’ve seen, and I can’t find good info on the reference bios differences other than power limit and mem straps

I will follow up with my contact at AMD and see if I can get an answer on this.

much appreciated.

you can download bioses from techpowerup.

As long as you NEVER init the card with its on-board BIOS, you can run lots of experiments to find the one(s) that are less problematic by using an external UEFI ROM file with the card.
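For reference, QEMU can be handed an external ROM image for a VFIO device through the `romfile` property of `vfio-pci`. Below is a minimal sketch in Python that only builds the arguments. The PCI address `0a:00.0` and the ROM path are hypothetical placeholders, and `vfio_device_args` is an illustrative helper name, not something from this thread:

```python
import shlex


def vfio_device_args(host_addr, romfile):
    """Build QEMU arguments for a passed-through GPU that uses an external ROM.

    host_addr: PCI address of the GPU on the host (placeholder, e.g. "0a:00.0").
    romfile:   path to the downloaded ROM image (placeholder path).
    """
    # vfio-pci's "romfile" property makes QEMU expose this file as the
    # device's option ROM instead of reading the ROM from the card itself.
    return ["-device", f"vfio-pci,host={host_addr},romfile={romfile}"]


if __name__ == "__main__":
    # Hypothetical address and path, adjust to your own system.
    args = vfio_device_args("0a:00.0", "/var/lib/vbios/vega64.rom")
    print(shlex.join(args))
```

If you use libvirt instead of a raw QEMU command line, the same idea is expressed with a `<rom file='...'/>` element inside the `<hostdev>` entry.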

e.g. I think the sapphire cards have a fix?

according to our survey data, they have the fewest cards with the bug, but there are bios revisions of the pulse that have it.

Our problem at the moment is that our staff only own WX cards, so testing others is problematic.

It’s something we’re working on.

Hmm.

I have 2x reference vega 64s here.
1x Sapphire (purchased January 2018 - BECAUSE CRYPTO MADNESS - i got it for $100 above RRP. :smiley: )
1x XFX (purchased on release day for vega 64)

They shipped with different BIOS versions.

I flashed the XFX with the Sapphire BIOS when I got the Sapphire (it was newer, figured I’d put the same BIOS on both of them).

I don’t have a setup at the moment to test with, but I believe I ran into the reset bug last time I tried making it all work.

For what it’s worth, YMMV, etc.

Haven’t had a lot of time to do nerd stuff as of late unfortunately.

When I get time I should pull one of them out, replace it with my old GTX 760 and try to get it working with 2 different cards rather than trying to start out the hard way with 2 identical AMD cards…

I heard this was broken by the last Windows 10 update. Do you know if that is true?

Couldn’t you get around this by flashing the bios?

I didn’t read the whole thread, I’m sorry if the answer to my questions is already in here.

As far as I understand, the cards that have the reset bug can only be reset with a full PCIe adapter reset. What exactly happens during such a reset? Would it be sufficient to toggle the PERST# pin? If yes, one could build a simple adapter PCB to do this.

Cheers!

No, the bus must be ready for it, everything stalls if you just assert the pin.
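On the host side you can at least ask the kernel which resets it thinks it can perform on a given device. A rough sketch, assuming the standard sysfs layout: the `reset` attribute is present only when the kernel has some reset method for the device, and newer kernels also list the methods in `reset_method`. The address below is a placeholder and `reset_info` is a made-up helper name. This only reports what is advertised; it says nothing about whether the card actually comes back after the reset.

```python
from pathlib import Path


def reset_info(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Report which reset facilities sysfs exposes for one PCI device.

    pci_addr:   full address such as "0000:0a:00.0" (placeholder).
    sysfs_root: overridable so the function can be tried against a fake tree.
    """
    dev = Path(sysfs_root) / pci_addr
    info = {
        # The "reset" file exists only if the kernel knows a way to reset
        # this function (FLR, bus reset, and so on).
        "has_reset": (dev / "reset").exists(),
        "methods": [],
    }
    # "reset_method" is absent on older kernels; treat it as optional.
    method_file = dev / "reset_method"
    if method_file.exists():
        info["methods"] = method_file.read_text().split()
    return info


if __name__ == "__main__":
    # Hypothetical address, replace with your GPU's address from lspci -D.
    print(reset_info("0000:0a:00.0"))
```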

I get the same issue with my PowerColor Vega 64, even after applying the scripts.