
Linux Host, Windows Guest, GPU passthrough reinitialization fix


#61

much appreciated.


#62

You can download BIOSes from TechPowerUp.

As long as you NEVER init the card with its on-board BIOS, you can run lots of experiments to find the one(s) that are less problematic by using an external UEFI file with the card.

E.g. I think the Sapphire cards have a fix?
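
For anyone wanting to try the external-file approach: QEMU's `vfio-pci` device accepts a `romfile=` property, which is one way to hand the guest a downloaded ROM without flashing the card. A rough sketch that only builds the argument string; the PCI address and the ROM path are placeholders, not values from this thread, and this only changes what the guest sees, not what the host firmware does at boot.

```python
def vfio_device_arg(host_addr, romfile=None):
    """Build the value for a QEMU -device option for a passed-through card.

    host_addr: host PCI address such as "0a:00.0" (placeholder).
    romfile:   optional path to a ROM dump offered to the guest in place
               of the ROM stored on the card.
    """
    parts = ["vfio-pci", f"host={host_addr}"]
    if romfile:
        parts.append(f"romfile={romfile}")
    return ",".join(parts)


if __name__ == "__main__":
    # Hypothetical address and path, for illustration only.
    print("-device", vfio_device_arg("0a:00.0", "/path/to/vbios.rom"))
```

Swapping the path between runs makes it easy to compare several BIOS revisions against the same card.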


#63

According to our survey data, Sapphire has the fewest cards with the bug, but there are BIOS revisions of the Pulse that have it.

Our problem at the moment is that our staff only own WX cards, so testing others is problematic.

It’s something we’re working on.


#64

Hmm.

I have 2x reference Vega 64s here.
1x Sapphire (purchased January 2018 - BECAUSE CRYPTO MADNESS - I got it for $100 above RRP. :smiley: )
1x XFX (purchased on release day for Vega 64)

They shipped with different BIOS versions.

I flashed the XFX with the Sapphire BIOS when I got the Sapphire (it was newer, so I figured I’d put the same BIOS on both of them).

I don’t have a setup to test with at the moment, but I believe I ran into the reset bug last time I tried making it all work.

For what it’s worth, YMMV, etc.

Haven’t had a lot of time to do nerd stuff as of late unfortunately.

When I get time I should pull one of them out, replace it with my old GTX 760, and try to get it working with 2 different cards rather than starting out the hard way with 2 identical AMD cards…


#65

I heard this was broken by the last Windows 10 update. Do you know if that is true?


#66

Couldn’t you get around this by flashing the BIOS?


#67

I didn’t read the whole thread, I’m sorry if the answer to my questions is already in here.

As far as I understood, the cards that have the reset bug can only be reset with a full PCIe adapter reset. What exactly happens during such a reset? Would it be sufficient to toggle the PERST# pin? If yes, one could build a simple adapter PCB to do this.

Cheers!


#68

No, the bus must be ready for it; everything stalls if you just assert the pin.
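
On the software side, you can at least check which reset mechanisms the kernel believes a device supports. A minimal sketch, assuming the usual sysfs layout: a `reset` file appears under the device directory when the kernel has some reset method for it, and newer kernels (5.15 and later, as far as I know) also list the methods in `reset_method`. The address below is a placeholder. A listed method only means the kernel can trigger it, not that the card comes back usable afterwards.

```python
from pathlib import Path


def reset_support(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Report what the kernel exposes for resetting one PCI function.

    Returns (has_reset, methods): has_reset is True when the `reset` file
    exists; methods holds the entries from `reset_method`, or is empty
    when that attribute is missing (older kernels) or unreadable.
    """
    dev = Path(sysfs_root) / pci_addr
    has_reset = (dev / "reset").exists()
    try:
        methods = (dev / "reset_method").read_text().split()
    except OSError:
        methods = []
    return has_reset, methods


if __name__ == "__main__":
    # "0000:0a:00.0" is a placeholder address; substitute your GPU's.
    print(reset_support("0000:0a:00.0"))
```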


#70

I get the same issue with my PowerColor Vega 64, even after applying the scripts.


#71

Which Vega 64 did you get to work? My PowerColor Red Devil Vega 64 doesn’t. Maybe it’s AIB BIOS specific.


#72

I hear kernel 4.20 is going to have a fix for this? At least that’s what I remember from the last time I went googling.


#73

No, there is no “fix” for this. It’s simply impossible at this point to reset the AMD SoC; even AMD admits that this functionality is incomplete.


#74

I am hoping the next gen GPUs restore that functionality.


#75

The amdgpu driver has the PSP mode1 reset.
Would that be enough for passing the GPU through?
I could try to port it to QEMU as a quirk.


#76

@Pixo, no, I already implemented this as a quirk (https://gist.github.com/gnif/a4ac1d4fb6d7ba04347dcc91a579ee36) and worked with AMD to try to get it to function. The reset works, but the card cannot be re-initialized as it ends up in a completely broken state.

If you check the amdgpu driver source you will see that the mode1 reset code is even commented as incomplete and you will also note the great lengths that they have gone to to “recover” a GPU after it has been reset, which doesn’t work.


#77

Yup, same here; I desperately hope the Navi GPUs don’t have this issue, especially if it is true they will support PCIe 4.0. My R9 390 that I upgraded from works perfectly fine.
I wonder if it has to do with HBM2 and the 2 additional parts of the card that show up when listing the PCI devices (lspci -nnv). I read somewhere, probably here at Level1, that the RX 5## cards and below aren’t affected, just Vega and Fury.
But I could be wrong.
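
If anyone wants to compare what their card exposes, here is a small sketch that groups `lspci -nn` output by slot, so every function sharing a bus and device number is listed together. The regex assumes the usual `lspci -nn` line shape (address, class name, class code in brackets, then a vendor:device ID in brackets), and the sample lines in the demo are made up for illustration rather than copied from a real system.

```python
import re
from collections import defaultdict

# Expected line shape (an assumption about typical `lspci -nn` output):
#   0a:00.0 VGA compatible controller [0300]: Some GPU [1002:687f] (rev c1)
LINE_RE = re.compile(
    r"^(?P<slot>[0-9a-f:]+)\.(?P<func>[0-7]) "
    r"(?P<cls>.+?) \[[0-9a-f]{4}\]: .*"
    r"\[(?P<ids>[0-9a-f]{4}:[0-9a-f]{4})\]"
)


def group_by_slot(lspci_nn_output):
    """Map each slot (bus:device) to its (function, class, vendor:device) tuples."""
    groups = defaultdict(list)
    for line in lspci_nn_output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # lines that do not fit the expected shape are skipped
            groups[m["slot"]].append((m["func"], m["cls"], m["ids"]))
    return dict(groups)


if __name__ == "__main__":
    # Illustrative sample only; paste your own `lspci -nn` output instead.
    sample = (
        "0a:00.0 VGA compatible controller [0300]: Example GPU [1002:687f] (rev c1)\n"
        "0a:00.1 Audio device [0403]: Example HDMI Audio [1002:aaf8]\n"
    )
    for slot, funcs in group_by_slot(sample).items():
        print(slot, funcs)
```

Posting the grouped output for an affected and an unaffected card side by side would make it easier to see whether the extra devices line up with the bug.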