I’ve been running into some odd issues with the amdgpu driver lately. Sometimes when running a Vulkan or DX12 game through Proton, or recording video with OBS, the game will turn into a rainbow-colored puddle, like so:
Usually switching to a console and then restarting the display manager will get me back to the desktop. So far the only ways I’ve found to consistently avoid this issue are to 1) use DX11 (if available) or 2) avoid putting the system under load and skip recording with OBS.
Does anyone know what might be causing this? It’s happened on two separate systems, one with an R9 Fury and the other with a Vega 56.
Interesting. Can you log your temps, your memory clock and voltages, and your core clock? Let’s see what might be going on. I’ve had this happen on both AMD and NVIDIA when the thermal paste starts drying out. Also, does setting a custom fan curve remove the issue?
Setting a fan curve didn’t seem to resolve the problem. I can probably work out a script to dump lmsensors output, but I suspect that temps aren’t the issue.
Idle temps on the Vega are 35C - 40C (as reported by lmsensors). That’s really not too bad. So far I haven’t been able to get mangohud working to monitor temps in-game.
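In case it helps with the logging request above, here is a rough sketch of a script that samples the GPU sensors straight from sysfs instead of parsing lmsensors output. It assumes the amdgpu driver exposes the usual hwmon files (`temp1_input`, `freq1_input`, `freq2_input`, `in0_input`); not every card or kernel provides all of them, so missing readings are left blank. The function names and column labels are my own choices for illustration.

```python
import csv
import glob
import sys
import time
from pathlib import Path

# hwmon file name -> (column label, divisor to convert the raw value).
# Assumed amdgpu layout; some cards or kernels do not expose every file.
SENSORS = {
    "temp1_input": ("temp_c", 1000),         # raw value in millidegrees C
    "freq1_input": ("core_mhz", 1_000_000),  # raw value in Hz
    "freq2_input": ("mem_mhz", 1_000_000),   # raw value in Hz
    "in0_input": ("vddgfx_mv", 1),           # raw value in millivolts
}


def find_amdgpu_hwmon():
    """Return the first hwmon directory whose name file says 'amdgpu', or None."""
    for name_file in sorted(glob.glob("/sys/class/hwmon/hwmon*/name")):
        if Path(name_file).read_text().strip() == "amdgpu":
            return str(Path(name_file).parent)
    return None


def read_sensors(hwmon_dir):
    """Read one sample; sensors that are missing or unreadable become None."""
    row = {}
    for fname, (label, divisor) in SENSORS.items():
        try:
            raw = int((Path(hwmon_dir) / fname).read_text().strip())
            row[label] = raw / divisor
        except (OSError, ValueError):
            row[label] = None
    return row


def log_samples(hwmon_dir, samples, interval, out=sys.stdout):
    """Write a CSV header plus `samples` rows, one every `interval` seconds."""
    labels = [label for label, _ in SENSORS.values()]
    writer = csv.writer(out)
    writer.writerow(["time"] + labels)
    for _ in range(samples):
        row = read_sensors(hwmon_dir)
        writer.writerow([time.strftime("%H:%M:%S")] + [row[l] for l in labels])
        out.flush()  # keep the log current in case the desktop goes down
        time.sleep(interval)


# Example (run in a terminal before starting the game, redirecting to a file):
#   log_samples(find_amdgpu_hwmon(), samples=600, interval=1.0)
```

Flushing after every row means the last few samples before the corruption should still end up in the file even if the display manager has to be restarted.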
Another data point: the graphics corruption in one game (Baldur’s Gate 3) is a known issue when using the Vulkan API, and DX11 is recommended instead (per protondb.com). That was on the Vega machine. I’ll try a few more games and see what happens with it.
as i said before, you may need HPET (high precision event timer) enabled in BIOS/UEFI.
if it's disabled then MangoHud will throw errors.
when HPET is off and a game crashes, it will do one of a few things:
freeze with a banded colour screen and looping audio.
freeze with a screen buffer overflow.
what's in your image, where the wrong part of VRAM is being displayed to the screen, resulting in garbage output.
this is often just a soft crash where you can close the game and restart the drivers.
and finally:
random BSODs with apparently random causes, none of which appear connected, but which happen because things that rely on HPET just fail randomly.
while i am talking from Windows experience, a lot of Linux distros also use HPET, and for some reason it still ships in a default off state on some boards.
so worth checking at least
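On Linux, a quick way to check this without rebooting into the firmware setup is to look at the kernel's clocksource files in sysfs. The sketch below reads them and reports whether `hpet` is listed. Treat it as a hint only: it is an assumption that a missing `hpet` entry means the timer is disabled in BIOS/UEFI, since the kernel can leave it out for other reasons. The function name and the returned dictionary keys are made up for illustration.

```python
from pathlib import Path

# Standard sysfs location of the kernel's clocksource information.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_status(base=CLOCKSOURCE_DIR):
    """Report the active clocksource and whether hpet is among the available ones."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return {
        "current": current,
        "available": available,
        "hpet_available": "hpet" in available,
    }


# Example:
#   print(clocksource_status())
```

Note that the active clocksource is often `tsc` even when HPET is enabled, so `hpet_available` is the more relevant field here.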