Still having reset issues after upgrading to a 6800XT when the GPU hangs

Hello all, I finally managed to get my hands on a 6800XT! I was stoked, since the reset bug was supposed to be fixed. It does indeed reset properly when switching to a different OS VM or just rebooting it, but it seems it still can't reset after a GPU crash. That means I have to reboot the entire host, so I may as well still be on the 5700XT. Sometimes when playing games like Dyson Sphere Program I will just get a random GPU crash, at which point the host dmesg gives me the following:

[74544.471599] vfio-pci 0000:23:00.1: can't change power state from D3hot to D0 (config space inaccessible)
[74544.471900] vfio-pci 0000:23:00.1: vfio_bar_restore: reset recovery - restoring BARs
[74544.552306] vfio-pci 0000:23:00.0: vfio_bar_restore: reset recovery - restoring BARs
[74544.556718] vfio-pci 0000:23:00.1: can't change power state from D3hot to D0 (config space inaccessible)
[74546.821513] pcieport 0000:22:00.0: not ready 1023ms after bus reset; waiting
[74547.941476] pcieport 0000:22:00.0: not ready 2047ms after bus reset; waiting
[74550.031439] pcieport 0000:22:00.0: not ready 4095ms after bus reset; waiting
[74554.181357] pcieport 0000:22:00.0: not ready 8191ms after bus reset; waiting
[74562.501197] pcieport 0000:22:00.0: not ready 16383ms after bus reset; waiting
[74579.140870] pcieport 0000:22:00.0: not ready 32767ms after bus reset; waiting
[74612.430202] pcieport 0000:22:00.0: not ready 65535ms after bus reset; giving up
[74612.563987] vfio-pci 0000:23:00.1: can't change power state from D3hot to D0 (config space inaccessible)
[74614.810155] pcieport 0000:22:00.0: not ready 1023ms after bus reset; waiting
[74615.870132] pcieport 0000:22:00.0: not ready 2047ms after bus reset; waiting
[74617.950095] pcieport 0000:22:00.0: not ready 4095ms after bus reset; waiting
[74622.110018] pcieport 0000:22:00.0: not ready 8191ms after bus reset; waiting
[74630.339855] pcieport 0000:22:00.0: not ready 16383ms after bus reset; waiting
[74646.979526] pcieport 0000:22:00.0: not ready 32767ms after bus reset; waiting
[74680.258872] pcieport 0000:22:00.0: not ready 65535ms after bus reset; giving up
[74680.261208] vfio-pci 0000:23:00.0: can't change power state from D0 to D3hot (config space inaccessible)
[74681.445191] vfio-pci 0000:23:00.1: can't change power state from D3hot to D0 (config space inaccessible)
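For anyone who wants to spot this pattern automatically, here is a rough Python sketch that scans dmesg-style lines for the two signatures in the log above: a port that gave up waiting after a bus reset, and functions whose config space is inaccessible. The function name `find_hung_devices` and the regular expression are made up for illustration and are only matched against the message wording shown in this post; other kernel versions may word these lines differently.

```python
import re

# Matches kernel log lines shaped like the ones in the post, e.g.:
# [74612.430202] pcieport 0000:22:00.0: not ready 65535ms after bus reset; giving up
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)$"
)


def find_hung_devices(lines):
    """Return (gave_up_ports, inaccessible_devices) as sorted lists of PCI addresses.

    Hypothetical helper: it only looks for the two message fragments
    that appear in the log quoted in this thread.
    """
    gave_up, inaccessible = set(), set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a "[timestamp] driver address: message" line
        msg = m.group("msg")
        if "after bus reset; giving up" in msg:
            gave_up.add(m.group("bdf"))
        elif "config space inaccessible" in msg:
            inaccessible.add(m.group("bdf"))
    return sorted(gave_up), sorted(inaccessible)
```

Feeding it the lines of `dmesg` output (for example, read from a saved log file) should flag the upstream port and both GPU functions in a log like the one above.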

It seems the card just drops off the bus entirely and can't be reset. Is there anything I can do to attempt a recovery, or am I just doomed to restarting my server whenever the driver wants to crash? I wish GPUs just had a physical reset on them, lol. PCIe supports hot swap (well, if anyone cared to actually implement it), so a hard reset would be fine with VFIO.
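One thing that can be tried from the host, with no guarantee it helps once the card has really fallen off the bus, is removing the GPU functions through sysfs and then rescanning the PCI bus. The sketch below uses the standard `/sys/bus/pci/devices/<address>/remove` and `/sys/bus/pci/rescan` files; the helper names `plan_remove_and_rescan` and `apply_steps` are made up for illustration, and the addresses are just the ones from the log above. It needs root, and the VM should be shut down first.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci")


def plan_remove_and_rescan(functions, sysfs=SYSFS_PCI):
    """Build the ordered list of (path, value) sysfs writes.

    Hypothetical helper: each PCI function is removed first,
    then a single bus rescan is requested.
    """
    steps = [(sysfs / "devices" / bdf / "remove", "1") for bdf in functions]
    steps.append((sysfs / "rescan", "1"))
    return steps


def apply_steps(steps):
    """Perform the writes (requires root). Skips paths that do not exist."""
    for path, value in steps:
        if path.exists():
            path.write_text(value)
        else:
            print(f"skipping {path}: not present")


# Addresses taken from the dmesg output in this post (audio function, then GPU).
steps = plan_remove_and_rescan(["0000:23:00.1", "0000:23:00.0"])
for path, value in steps:
    print(f"echo {value} > {path}")
# Call apply_steps(steps) as root to actually attempt the recovery.
```

If the device does not show up again after the rescan, the card is most likely still hung and only a host reboot or power cycle will bring it back.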

I’m seeing the same issue, albeit with a Linux (Ubuntu 20.10 + Xanmod 5.10.15) guest, though it’s only happened when attempting to restart/shutdown the guest (and not all the time).

I haven’t had a running crash yet and unfortunately I’ve found no way to reset the GPU from the host - so when it happens it’s Game Over :slightly_frowning_face:

Host: Arch 5.10.15-zen

I was seeing something similar, but since swapping power supplies it has been fine. That's for preventing the crash in the first place, I mean.

Another user reported that upping VCC IO helped them.

@wendell thanks for the input :slightly_smiling_face:

Looks like the issue my end has been resolved by removing the Plymouth splash (I’ve also moved to the non-rt Xanmod kernel)!


Sorry for the late reply here, but life has been crazy lately. My issue seems to have been with Dyson Sphere Program in general. After working with people on the Mesa/RADV IRC, we figured out the game was doing something really stupid that was causing the card to hard crash. The game patched it while we were debugging and it’s been flawless since! Thanks for the replies.
