I can safely say yes, it is still a problem. The impression I have gotten from all the posts I have read is that it is hit and miss, and related to the card manufacturer and maybe the card BIOS.
I have a PowerColor Red Devil Vega 64 and it suffers from the bug.
I did find something on Reddit where someone with a Vega 64 (ASUS Strix) was able to reset their Vega by removing it and then re-scanning, much like removing a device in Device Manager in Windows and then scanning for hardware changes to reinstall it.
These are the steps the person took:
Power off the Vega GPU
echo "1" | sudo tee -a /sys/bus/pci/devices/0000\:0a\:00.0/remove  # <- GPU
echo "1" | sudo tee -a /sys/bus/pci/devices/0000\:0a\:00.1/remove  # <- HDMI/DP audio device
where "0a" is the bus number in your card's PCI address, which will differ from system to system,
e.g.
echo "1" | sudo tee -a /sys/bus/pci/devices/0000\:04\:00.0/remove
echo "1" | sudo tee -a /sys/bus/pci/devices/0000\:04\:00.1/remove
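If you are not sure which address your card is on, something like the following may help. It is only a sketch: the function name amd_pci_addresses is made up for illustration, and it assumes the card reports the AMD/ATI vendor ID 1002, so it will also list any other AMD graphics device in the machine (an iGPU, for example).

```shell
#!/bin/bash
# Hypothetical helper, not part of the original steps.
# Reads `lspci -D -nn` output on stdin and prints the address column
# (domain:bus:device.function) of every line that carries the
# AMD/ATI vendor ID 1002.
amd_pci_addresses() {
    grep -F '[1002:' | cut -d ' ' -f 1
}
```

Run it as: lspci -D -nn | amd_pci_addresses
The .0 entry should be the GPU and the .1 entry its HDMI/DP audio device; those are the two addresses to put into the remove commands.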
Suspend to RAM
sudo systemctl suspend
Other systemctl commands may work, but I haven't tried them yet.
Log back in to your desktop environment and rescan the PCIe devices by entering the following:
echo "1" | sudo tee -a /sys/bus/pci/rescan
or the following 2 commands
sudo chmod 777 /sys/bus/pci/rescan
sudo echo 1 > /sys/bus/pci/rescan
Check that it has reset
lspci -vv | grep vfio -B 12
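As an alternative to grepping the lspci output, the driver binding can also be read straight from sysfs. This is a rough sketch rather than anything from the original steps: pci_driver and the SYSFS_PCI override are names I made up, and all it tells you is which kernel driver the device is bound to after the rescan, which is the same thing the vfio grep is looking for.

```shell
#!/bin/bash
# Hypothetical helper. SYSFS_PCI is an illustrative override so the function
# can be pointed at a test directory; the real location is /sys/bus/pci.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Print the name of the kernel driver bound to a PCI device,
# or "none" if the device has no driver (or does not exist).
pci_driver() {
    local link="$SYSFS_PCI/devices/$1/driver"
    if [ -L "$link" ]; then
        # The driver entry is a symlink whose last component is the driver name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

For example, pci_driver 0000:0a:00.0 should print vfio-pci if the card came back and was picked up by vfio again.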
Restart libvirt so that virt-manager can see the GPU again
sudo systemctl stop libvirt-bin
sudo systemctl stop libvirt-bin.socket
sudo systemctl start libvirt-bin
The person said this has to be done as soon as the VM is powered off, otherwise it will not reset.
EDIT: I have checked again on my V64 and this process works. You shouldn’t have to restart libvirt. I had to the first time because I deleted the V64 from my VM when it didn’t output video.
You have to run the process as soon as you shut down the VM.
I would still suggest not getting the PowerColor Red Devil, as it has a tendency to shut itself off. I may sell my card and get a different brand.
Here is a script I have just put together. I still need to test it to make sure I can run everything from the command line in one step.
#!/bin/bash
# copy this file to /usr/bin/reset_vega.sh
# This script must be run immediately after you shut down the VM. It doesn't work if the GPU has been left for too long
# following a shut down. It doesn't work if the VM is rebooted.
# To run, open a terminal and run: bash /usr/bin/reset_vega.sh
# Remove/Power off the Vega GPU like uninstalling devices in Windows device manager
echo "1" | sudo tee -a /sys/bus/pci/devices/0000:0d:00.0/remove
echo "1" | sudo tee -a /sys/bus/pci/devices/0000:0d:00.1/remove
# Suspend to RAM
systemctl suspend
# Wait for the user to press Enter once the machine has resumed
read input
# Change permissions of the PCI rescan file so it can be written without root
sudo chmod 777 /sys/bus/pci/rescan
# Rescan PCI devices to reinitialise the vega GPU
sudo echo 1 > /sys/bus/pci/rescan
# The line below was replaced by the two commands above because it throws invalid argument errors
# echo "1" | sudo tee -a /sys/bus/pci/rescan
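
For anyone whose card is not on 0d, here is a variation on the same remove / suspend / rescan sequence with the address passed in rather than hard-coded. Treat it as an untested sketch: the function names and the SYSFS_PCI override are hypothetical, and it assumes the whole script is run as root (e.g. with sudo), in which case the plain redirect can write to sysfs and the chmod is not needed.

```shell
#!/bin/bash
# Sketch only. SYSFS_PCI is an illustrative override for testing;
# the real location is /sys/bus/pci.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Write "1" to a sysfs file. The redirect is performed by this shell,
# so the script itself has to be running as root.
poke() {
    echo 1 > "$1"
}

# Remove the GPU (.0) and its HDMI/DP audio function (.1) from the bus.
# $1 is the slot without the function number, e.g. 0000:0d:00
remove_gpu() {
    local slot="$1"
    local fn
    for fn in 0 1; do
        poke "$SYSFS_PCI/devices/$slot.$fn/remove" || return 1
    done
}

# Ask the kernel to rescan the PCI bus so the GPU is found again.
rescan_bus() {
    poke "$SYSFS_PCI/rescan"
}

# Full sequence: remove, suspend, wait for Enter after resume, rescan.
reset_vega() {
    local reply
    remove_gpu "$1" || return 1
    systemctl suspend
    printf 'Press Enter after resume to rescan the PCI bus: '
    read -r reply
    rescan_bus
}
```

To use it, add a call such as reset_vega 0000:0d:00 at the end of the file and run the file with sudo right after the VM shuts down, same as the script above.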