Help with very odd graphics issue. Not fixed by new card

A fairly major issue I’ve been having over the past few weeks:

I purchased a prebuilt with the following specs; it arrived in January:

Gigabyte B450M DS3H-CF

AMD Ryzen 5 2600

Adata XPG Spectrix D60G 16GB (2x 8GB) 3200MHz RAM

Kingston A2000 500GB M.2-2280 NVMe PCIe SSD

XFX Radeon RX 580 GTS XXX 8GB Graphics Card

Be Quiet! System Power 9 600W 80+ Bronze PSU (Not modular)

I also fitted:

1TB Seagate Barracuda HDD

240GB WD Green SSD

My main OS is Windows, but I have a Linux Mint installation on the WD SSD.

The issues started appearing just over a month ago: the PC would crash when opening any kind of graphically intensive program (ETS2, synthetic benchmarks) and occasionally crash in normal operation as well. A few times the screen froze, the last sound continued to play from the speakers (usually with buzzing as well), then the computer restarted after the screen went black. Most of the time though, the screen just goes black, and shortly after the CPU cooler makes a short sound indicating that the machine is restarting.
In the event log, a critical “Kernel-Power” error (41) is seen, shortly before there is a Kernel-Power info event (172) with bugcheck 0x116 (VIDEO_TDR_FAILURE). I have opened a MEMORY.DMP with the Windows debugger; it shows atikmpag.sys to be the cause of the crash.

Importantly, once these crashes start happening, they also happen in Linux Mint and even in memtest86 booted from a flash drive.

The temporary solution is uninstalling the drivers in Windows using DDU and then reinstalling them with factory reset enabled. Reinstalling without factory reset does not fix it. After this, I usually have a day or so before the issues start to appear again, on all systems. Over time, the interval between reinstalling and the crashes returning gets shorter. So far, I think I’ve worked out that the issues only start to appear after a shutdown or sleep.
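One way to make that shrinking interval concrete is to note the time of each driver reinstall and of the first crash afterwards, then compare the gaps. The snippet below is only a rough sketch: the function names and the timestamp format are my own choices, and the dates in the usage example are made up purely for illustration, not taken from this machine.

```python
from datetime import datetime

# Timestamp format assumed for hand-written notes, e.g. "2020-05-01 10:00".
FMT = "%Y-%m-%d %H:%M"

def hours_between(reinstall: str, first_crash: str) -> float:
    """Hours from a driver reinstall to the first crash after it."""
    delta = datetime.strptime(first_crash, FMT) - datetime.strptime(reinstall, FMT)
    return delta.total_seconds() / 3600

def is_shrinking(intervals: list[float]) -> bool:
    """True if every interval is strictly shorter than the one before it."""
    return all(later < earlier for earlier, later in zip(intervals, intervals[1:]))

if __name__ == "__main__":
    # Made-up example notes: (reinstall time, first crash afterwards).
    notes = [
        ("2020-05-01 10:00", "2020-05-02 16:00"),
        ("2020-05-03 09:00", "2020-05-04 09:00"),
        ("2020-05-05 20:00", "2020-05-06 07:30"),
    ]
    intervals = [hours_between(a, b) for a, b in notes]
    print(intervals, "shrinking:", is_shrinking(intervals))
```

A log like this is also a handy thing to hand over to a warranty company alongside the crash dumps.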

I’ve run memtest86 for ~45 minutes while the computer was not crashing, with no issue. I’ve also run burnintest, furmark, occt and several games when the computer is not crashing, with no apparent loss in performance. I have also tried individual RAM sticks, but the crashes continue to happen. Reseating the graphics card and enabling or disabling XMP profile 1 in the BIOS seem to have no effect.

As I have warranty support for this PC, I contacted the warranty company about this and described the issues. They sent out a new GPU, which worked correctly for about a week before the crashes returned with increasing frequency.

I have read in several forum posts that XFX factory-overclocks its cards but does not raise the max power limit to match, with the suggested fix being to increase the max power limit and change the clock speeds. I have tried decreasing the memory clock speed and core clock speed and increasing the power limit in AMD's Adrenalin software, but the crashes keep happening. Additionally, the settings do not seem to save after a restart, though a quick browse on the web shows that this is not uncommon.

Something else to note, which may be unconnected: input is lost briefly when I plug and unplug certain devices from the mains. These tend to be speakers. I’ve read that cheap HDMI cables that aren’t properly shielded are quite likely to suffer from interference such as this.

My thoughts:

The driver reinstall seems to be poking the hardware somehow. I initially thought that it reflashed the vBIOS, but that is not correct. That would mean that the hardware is somehow resetting or corrupting whatever was poked by the driver reinstall.

A malfunctioning component of the system is damaging the GPU. In my opinion this would be the power supply.

Any thoughts or further diagnostics would be welcome. I would like to get all bases covered and work out what the hell is going on before the whole unit is sent to the warranty company.

Many thanks,
GoldSloth

The VIDEO_TDR_FAILURE bug check has a value of 0x00000116. This indicates that an attempt to reset the display driver and recover from a timeout failed.
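If you end up comparing several crashes, a tiny lookup can save some searching. This is just a sketch: the table holds a handful of well-known bug check codes and is nowhere near complete, and the function name is made up for illustration. The `base` parameter is there because the Kernel-Power 41 event details usually show the bug check code in decimal (278 is 0x116).

```python
# A few well-known Windows bug check codes that come up with display and
# power problems. Deliberately small; anything else is reported as unknown.
BUGCHECK_NAMES = {
    0x9F: "DRIVER_POWER_STATE_FAILURE",
    0x116: "VIDEO_TDR_FAILURE",
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x119: "VIDEO_SCHEDULER_INTERNAL_ERROR",
    0x124: "WHEA_UNCORRECTABLE_ERROR",
}

def describe_bugcheck(code_text: str, base: int = 16) -> str:
    """Turn a bug check code written as text into '0x........ NAME'."""
    code = int(code_text, base)  # base 16 also accepts a leading "0x"
    name = BUGCHECK_NAMES.get(code, "unknown (not in this small table)")
    return f"0x{code:08X} {name}"

if __name__ == "__main__":
    print(describe_bugcheck("0x116"))         # hex, as shown by the debugger
    print(describe_bugcheck("278", base=10))  # decimal, as in event details
```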

atikmpag.sys - Welcome to the world of AMD reset bugs.

45 minutes of memtest is not long enough. Leave it running overnight and be sure to enable multicore testing.


The PCIe slot might be dodgy. Have you tried dropping the card down one slot?

With random shut downs, a failing power supply is another consideration.

Thank you for your replies

I’ll try running memtest overnight, hopefully it won’t crash during that time!

Unfortunately I can’t run the GPU in the other slot: the case basement gets in the way, as do the SATA connectors etc., and I don’t have any PCIe risers.

It could be the PSU frying them, that’s a possibility. If so, is there any way I can check this out?


Knock on wood I haven’t had these kinds of issues for years :smiley:

I’d try: another PSU, another motherboard, another processor (since it’s got all the controllers built in, it’s rare but processors do come faulty :confused: ) and maybe the RAM, in that order…

Unless someone has had the same exact experience and figured out the problem, it’s about the only way to get to the component at fault… Unless you return everything and get a completely new machine

I had similar weird GPU failures with a Ryzen 3900, Ubuntu Linux and a Vega 56 when I was running RAM at 3,600 MHz. Dropping it to 2,400 made all the problems go away.

On that system Memtest never found any problems but it would still consistently fail with memory errors when doing 24 thread C++ or Rust compile jobs.

I didn’t get around to trying it but others told me that going to manual settings on RAM worked better than using XMP/DOCP.

I’m just using ECC RAM at 2,666 MHz at the moment.

Thank you zlynx and jotm for your replies,
I’ll try changing the RAM settings a bit. If that doesn’t work then I’ll send it off to the warranty repair company.

Got yet another replacement a few months ago; so far so good.
It seems that these issues are very common, from what the warranty guys said.
