Vega 56 Sporadic Crashes (Manjaro/Wine/Proton Gaming)

My first impression of Wayland: it works like a charm. Frame pacing in games even seems a little more consistent than it was under X11, but that may just be bias, because I am paying too much attention to little details now.

There are a few quirks with Wayland, but only one is annoying: every now and then (in specific situations) the mouse cursor jumps or resets its position. I can get used to that, but it's awkward.

Have you had any more crashes since switching the GPU BIOS? I'm watching this thread with interest, as I've had similar issues with my Vega 56 and 1920X. I've always assumed it was amdgpu not playing nicely with something in Proton, but if it is hardware/firmware, I'll start testing there.

Switching the GPU BIOS did NOT help; there was one crash after the switch. Since then, I have moved from X11 to Wayland. No crash yet, but not enough time has passed to claim anything of statistical significance.

By the way, I am running RADV (the independent open-source Vulkan driver in Mesa, with some official help from AMD), not AMDVLK (AMD's own Vulkan code base, open-sourced later). Both run on top of the amdgpu kernel driver.

Okay, time to try and jinx it again.

With Wayland, there have been no more graphics-related crashes. Maybe X11 really is at fault, or the interaction of the rest of the driver stack with X11?

An update for the sake of completeness: I don't think this is really related to the original crashes.

One game froze on me recently, but it was a much less disruptive error: I could still Alt-Tab to a terminal window, kill the hanging process, and move on. That is nowhere near the unrecoverable failure described earlier, which requires a hard reset (and a subsequent restart from checkpoint of whatever long-running job the machine had been doing in the background…).

After two weeks of use, Wayland feels like it's almost ready for prime time. I'd prefer its quirks over the hard crashes in any case.

So here's hoping that switching from X11 to Wayland will indeed fix this in the long term.

So. I had finally accumulated enough optimistic hope that the computer decided to bring me back down to reality with a nice catastrophic crash. So Wayland wasn't it either.

But I will stay on Wayland to get a better estimate of the long-term frequency of crashes.

(Edit: I have a suspicion that psensor running in the background might increase the risk of these crashes.)


Change of plans (or desperation move). I installed Ubuntu 19.10 on another disk and will be using that for a while. All the drivers and the kernel are a few versions older here, but maybe they play nicer together in this mainstream Linux distribution.

Nope. Ubuntu isn't immune, either. Hmm.

You definitely want to use separate power rails if your PSU has multiple rails. Your card can easily pull two cards' worth of power. If your PSU has multiple rails, cut the maximum PCIe power rating in half: that's what you get from each rail if you have two rails; with more rails, divide by the number of rails. You can log your power draw under load. I have seen mine go above 400 watts, and it generally sat at 390 watts under maximum load. It is a very power-hungry card.

Also, set a custom fan curve: the card is way overpowered and way undercooled, even at idle. Get the fan up to a 40-60% minimum, and remember to reapply the curve after a reboot or set it to load at boot. You should also replace the thermal paste with IC Diamond, Arctic Silver, or better. Your maximum temperatures won't really be affected (maybe with some Kryonaut and a RAIJINTEK cooler or something), but temperatures will drop much faster, almost immediately, when the load drops. With the stock thermal paste it can take minutes for temperatures to drop, even at maximum fan speed. I hope that helps.
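A minimal sketch of forcing a higher fan speed through the amdgpu hwmon interface. It assumes the card is `card0`, that the driver exposes the standard `pwm1` (0-255) and `pwm1_enable` files, and that it runs as root. It sets a fixed duty cycle rather than a true temperature-based curve, and, like the curve mentioned above, the setting does not survive a reboot.

```python
# Sketch only: assumes amdgpu hwmon files pwm1 / pwm1_enable under card0
# and root privileges. Sets a fixed duty cycle, not a temperature curve.
import glob


def percent_to_pwm(percent: float) -> int:
    """Convert a 0-100 % duty cycle into the 0-255 range used by pwm1."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(percent * 255 / 100)


def find_hwmon(card: str = "card0") -> str:
    """Return the hwmon directory that belongs to the given DRM card."""
    matches = glob.glob(f"/sys/class/drm/{card}/device/hwmon/hwmon*")
    if not matches:
        raise FileNotFoundError(f"no hwmon directory found for {card}")
    return matches[0]


def set_fixed_fan(percent: float, card: str = "card0") -> None:
    """Switch the fan to manual control and apply the given duty cycle."""
    hwmon = find_hwmon(card)
    # 1 = manual control through pwm1; 2 hands control back to the driver.
    with open(f"{hwmon}/pwm1_enable", "w") as f:
        f.write("1")
    with open(f"{hwmon}/pwm1", "w") as f:
        f.write(str(percent_to_pwm(percent)))
```

Calling `set_fixed_fan(50)` as root would pin the fan at roughly half speed; writing `2` to `pwm1_enable` should return control to the driver's automatic mode.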

The symptoms are not those typical of an overloaded PSU: no sudden shutdown due to overcurrent protection, no particular heat or noise from the PSU, and no funny smell from it either.

The crashes don't seem to be related to GPU temperature; they don't correlate with GPU fan speed (the reference blower fan is audible enough that I would notice).

You might be able to look at some logs and see what causes the crash/freeze. I have heard Navi GPUs had issues like this early on, but if you run bleeding-edge Mesa (is it 13?), it shouldn't be occurring.
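For the "look at some logs" suggestion, here is a sketch that pulls the previous boot's kernel messages and keeps only lines that look GPU- or PCIe-related. It assumes systemd-journald with persistent storage (otherwise `journalctl -b -1` has no previous boot to show), and the keyword list is a hypothetical starting point, not a known signature of this crash.

```python
# Sketch only: assumes a persistent systemd journal. KEYWORDS is a
# hypothetical starting list, not a definitive crash signature.
import subprocess

KEYWORDS = ("amdgpu", "[drm", "gpu reset", "pcie bus error", "aer:")


def interesting_lines(log_text: str, keywords=KEYWORDS) -> list:
    """Return the log lines that mention any keyword (case-insensitive)."""
    wanted = tuple(k.lower() for k in keywords)
    return [line for line in log_text.splitlines()
            if any(k in line.lower() for k in wanted)]


def previous_boot_kernel_log() -> str:
    """Kernel messages (-k) from the boot before the current one (-b -1)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

After a crash and reboot, `print("\n".join(interesting_lines(previous_boot_kernel_log())))` shows what, if anything, the kernel managed to record.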

I assume your GPU is rock solid under windows?

You could always search the internet for similar issues and try what was suggested; you might strike it lucky (again, assuming your GPU is rock solid under Windows!!!).

Here's a conversation log about someone having some issues with amdgpu (not Navi):
https://bugs.freedesktop.org/show_bug.cgi?id=105733

One with Navi: https://bugs.freedesktop.org/show_bug.cgi?id=111481

There is no Windows on this box, unfortunately. I did check various logs, but the crash seems to happen before any messages can be recorded. I am thinking it must be some type of deadlock, possibly PCIe-related, that prevents any further I/O from happening.
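Since the crash seems to hit before anything reaches the logs, one option is a tiny sensor logger that forces every line to disk, so the last readings before a lockup have a better chance of surviving. This is a sketch assuming the amdgpu hwmon files `temp1_input` (millidegrees Celsius) and `fan1_input` (RPM) under `card0`; the log path is hypothetical. If the lockup stops I/O instantly, even this may lose the final samples, and syncing every second adds a little disk wear.

```python
# Sketch only: assumes amdgpu hwmon temp1_input / fan1_input under card0.
# The default log path is a hypothetical choice.
import glob
import os
import time


def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def format_sample(timestamp: float, temp_millic: int, fan_rpm: int) -> str:
    """One CSV line: unix time, temperature in deg C, fan speed in RPM."""
    return f"{timestamp:.0f},{temp_millic / 1000:.1f},{fan_rpm}\n"


def log_forever(logfile: str = "/var/tmp/gpu-sensors.csv",
                card: str = "card0", interval: float = 1.0) -> None:
    hwmon = glob.glob(f"/sys/class/drm/{card}/device/hwmon/hwmon*")[0]
    with open(logfile, "a") as out:
        while True:
            out.write(format_sample(time.time(),
                                    read_int(f"{hwmon}/temp1_input"),
                                    read_int(f"{hwmon}/fan1_input")))
            out.flush()             # push Python's buffer to the kernel
            os.fsync(out.fileno())  # ask the kernel to write it to the device
            time.sleep(interval)
```

The last lines of the CSV after a reboot then show temperature and fan speed in the seconds leading up to the crash.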

What I'm suggesting may be annoying, but you might want to install Windows on a temporary hard drive or USB stick and do some solid testing under that. It could be a hardware fault, such as the VRM not being cooled correctly (can't you see VRM or RAM temperatures under Linux?).

Does the crash ONLY happen with Proton/Wine games? Do native games work fine?

Also, I sometimes experience hard crashes with my 1080 Ti Pop!_OS setup.
I suspect Plasma has a few bugs, because I sometimes see critical components like KWin or Plasma processes fall over. Plasma 5.17.3 has been the MOST stable for me so far; I'm also running kernel 5.4.

PS. You could try GNOME for a little while under Wayland (is it the default?) and see how things go. Yes, I know it's VERY vanilla and alien compared to Windows, but it might be worth it as a trial-and-error sort of thing. (I prefer Plasma too.)

Off-topic: Once AMD releases a 4K card, perhaps a 5800 XT or better, I will likely buy one, as I want to get in on all the open amdgpu driver horrors. NVIDIA drivers are boring and don't really see much progress, their nvidia-settings tool has NEVER changed, and they still have MAJOR issues with Wayland (rolls eyes).

Make sure that your PCIe slot is not overclocked. Vegas really, really do not like that! And if it is a PCIe 4.0 board, make sure the slot is set to PCIe 3.0 mode; Vegas do not like to run in PCIe 2.0 or 4.0 mode. Try resetting your BIOS; Vegas can be sensitive to certain settings… If the card is used, compare its VBIOS against the original VBIOS to see if it was flashed, or simply flip the switch to the stock VBIOS. If that works, try flashing the performance VBIOS back to stock. Vegas have been known to have a black-screen issue. Log the card's frequency and see if it randomly clocks down to 167 MHz; if it does, it may be time for an RMA or a return. I do not think anyone has found a solution to this…
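A minimal sketch of the "log the card's frequency" step, assuming the amdgpu sysfs file `pp_dpm_sclk` under `card0`, which lists the core-clock DPM levels and marks the active one with `*`. It reports DPM levels, so the marked value may not match the instantaneous clock exactly. The 200 MHz threshold is an arbitrary choice meant to catch a drop like the 167 MHz one described above; lines that do not look like `N: <number>Mhz` are simply ignored.

```python
# Sketch only: assumes amdgpu's pp_dpm_sclk under card0. threshold_mhz
# is an arbitrary, hypothetical cut-off.
import re
import time
from typing import Optional

# Matches lines such as "1: 991Mhz *"; the trailing "*" marks the active level.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def current_sclk_mhz(text: str) -> Optional[int]:
    """Return the clock of the level marked with '*', or None if none is."""
    for line in text.splitlines():
        m = LEVEL_RE.match(line)
        if m and m.group(3):
            return int(m.group(2))
    return None


def watch_sclk(card: str = "card0", threshold_mhz: int = 200,
               interval: float = 1.0) -> None:
    """Print a timestamped line whenever the active level is below threshold."""
    path = f"/sys/class/drm/{card}/device/pp_dpm_sclk"
    while True:
        with open(path) as f:
            mhz = current_sclk_mhz(f.read())
        if mhz is not None and mhz < threshold_mhz:
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} sclk at {mhz} MHz",
                  flush=True)
        time.sleep(interval)
```

Running `watch_sclk()` in a terminal (or redirecting its output to a file) while gaming gives a timestamped record of any such drops.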

For the time being, I have given up on the Vega 56. That card will soon be tested in another machine; I'll see how it fares there. Now I am … enjoying different problems with a different GPU. :)

Good luck… I have recently learned that Vegas have been known to have issues with ASUS motherboards: crashing and BSODs at random, no matter what the settings in the BIOS or OS are, and in no way pertaining to the card's state or use… It sounded like there might have been a fix from AMD, or maybe the fix was to RMA it. I am unsure, but I think they fixed it…


This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.