Constant Driver Timeouts on 7800 XT in World of Warcraft


Following up on this comment I left on the relevant GPU video on Level1Techs.

My specs are as follows:

Ryzen 7 7800X3D
Sapphire Nitro+ 7800 XT
Pure Power 12 M 850W
Gigabyte B650 Aorus Elite AX rev1.2
G.Skill Flare X5 2x16GB 6000MT/s CL36 (F5-6000J3636F16GX2-FX5)
Samsung 990 Pro 1TB
Kingston KC3000 2TB
bequiet! Pure Loop 2 360mm
bequiet! Pure Wings 3 120/140mm fans
Fractal Design North
ASUS TUF VG27AQA1A VA 170Hz

The issue is most prominent when using DirectX 12 as the graphics API. After a TDR has occurred, the graphics card seems to limit itself to about half its power draw: normally ~270W, it drops down to 100-125W. Sometimes after a TDR, the cursor will not go invisible when adjusting the camera by holding down right click, as though there’s some kind of overlay on the screen. The issue does persist in DX11 mode too, but much, much more rarely, instead of happening anywhere and everywhere, every 5 steps you take. It’s also made worse by turning up the graphics settings and MSAA, although turning post-processing off and using VRS does not help either.

After many, many Google searches and a lot of browsing through AMD’s & Blizzard’s forums and Reddit posts, I’ve tried a lot of things:

GPU tuning: disabling ULPS & enabling “Erase autosaved startup settings” using MSI Afterburner, undervolting the GPU, undoing overclocking on the GPU and CPU, changing the GPU BIOS mode via the TRIXX app Sapphire provides.

Drivers and OS: using DDU or AMD Cleanup Utility to reinstall drivers, same with AMD Chipset Drivers, using older versions of drivers, multiple fresh installs of Windows 11, sometimes avoiding installing Gigabyte Control Center.

BIOS: turning off fast boot (both in W11 and BIOS), turning off EXPO and PBO Enhancements, using various versions of BIOS, stable or beta ones.

Windows settings: disabling MPO, disabling hardware acceleration on every app, disabling HAGS in W11, setting Power Plan to Full Performance, fiddling with some disk caching Windows settings regarding my SSDs, setting a TdrValue of 60 in registry, turning Core Isolation off, turning Kernel-mode Hardware-enforced Stack Protection on (this is the first time I got a TDR, I think) and then off, setting WoW in High Performance mode and disabling optimizations for windowed games (Settings > System > Display > Graphics), changing all or any of the following settings in System > Display > Graphics > Default Graphics Settings: HAGS, Variable Refresh Rate, Optimizations for Windowed Games.

Hardware and display: disabling FreeSync, swapping the monitor’s DP cable for an HDMI 8K cable from UGREEN, reseating the GPU and the RAM, lower refresh rates on my monitor.

WoW: resetting Shader Cache and performing Scan & Repair, letting WoW on DX12 do its 5-minute thing before loading for the first time, disabling all addons, lower FPS cap in-game, lower graphics settings (which does alleviate the issue just a bit).

Overlays and Adrenalin: disabling Discord’s overlay, disabling Adrenalin’s overlay and hotkeys, enabling GPU scaling, and toggling each of these On or Off: Tessellation Mode, Enhanced Sync, Anti-Lag, Chill, Morphological AA, Anisotropic Filtering, Surface Format Optimization, OpenGL Triple Buffering…

If Discord has hardware acceleration enabled, it will crash along with WoW. If not, my microphone will stop working for some time, although I can still hear everyone else and Discord itself will not die. This doesn’t happen with any other apps, although video in YouTube does get killed.

Tried playing a bit of Witcher 3, about an hour. Tried with Ultra High settings, Ultra RT settings, Low settings, didn’t crash. I haven’t gotten around to testing any other DX12 titles.

Worth mentioning that initially I did install the 2 side fans on top of the GPU in the North case and they likely did press on the power cables of the GPU a bit. Is it possible the GPU cables have been damaged? Although I did change the cables and plugged in the two middle ones from two separate cables instead of the two outer ones.

I have not yet tried installing Windows 10, mainly because people with similar issues have mentioned them on Windows 10 too. I have also not tried using the physical switch on the GPU itself, which might lock the card into one BIOS or another and keep it from getting adjusted in software.

There might have been some user error when installing the cooler on the CPU, since I couldn’t find the hole for the second screw and I kept swinging left and right until I figured I should loosen the other screw.

So far, I have not observed any extreme temperatures, and both GPU & CPU behave normally and within spec.

I’m really at my wit’s end here. I’ve been banging my head against this issue for so long. Some people do mention they’ve been playing without any issues at all, which makes me think it’s user error, but throughout all my trials and errors I’ve been convinced it’s either a bad GPU or a software issue (AMD, Blizzard or Microsoft? Don’t know, nobody really responds to any threads with a decisive answer). I could RMA the GPU but I’m not very convinced I’m going to get refunded and even if I do, I’d rather not go for an entire month without a GPU. It’s just really depressing, having spent so much on this PC, being so excited about finally getting to build it and having such awful issues.

Any ideas?

Missed this! One sec

Can you trigger a timeout or get one then zip and post the windows system event log to look at?
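
If the System log is exported from Event Viewer as CSV, a short script can pull out just the driver timeout entries. This is only a rough sketch: it assumes the export has `Source`, `Event ID` and `Date and Time` columns, and that TDR recoveries show up as Event ID 4101 from the `Display` source. Check both against your own export before trusting the result.

```python
import csv
import io

# Assumption: display driver timeout recoveries are logged with this ID.
TDR_EVENT_ID = "4101"


def find_tdr_events(csv_text, source="Display", event_id=TDR_EVENT_ID):
    """Return the timestamps of rows matching the given source and event ID.

    Column names ("Source", "Event ID", "Date and Time") are assumptions
    about the Event Viewer CSV export and may differ on your system.
    """
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row_source = (row.get("Source") or "").strip()
        row_id = (row.get("Event ID") or "").strip()
        if row_source == source and row_id == event_id:
            hits.append((row.get("Date and Time") or "").strip())
    return hits
```

The timestamps it returns are what you would line up against the HWiNFO64 log.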

Also download hwinfo64 and run the recorder thing on it and post the csv file. Stop the recording right after the driver times out pls.

Hwinfo64 is awesome.

WoWTDR.zip (166.2 KB)

I’m not sure if I’ve exported the right files from the Event Viewer, so if they’re wrong, please elaborate where I should navigate.

I went to these as they seemed most relevant.

Anyway, it’s very easy to reproduce a TDR and it happens pretty much everywhere and anywhere, so I can provide you with anything you might need.

Never mind, induced a crash again elsewhere. Seems like these files are more relevant.

AMDCrash2.zip (60.4 KB)

Let me thank you so much for taking the time to bother with this issue <3

Yeah this is the classic tdr issue, which is almost always hardware.

Can you ddu and install 24.1.1 but opt not to install the Adrenalin gui?

Also run hwinfo64 and leave it logging then stop it after a tdr and post the logs. I’m looking for the telltale power drop right before the tdr recovery message in windows event viewer.
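
For anyone following along, here is a minimal sketch of how that power drop could be spotted in a HWiNFO64 CSV log. The column name in the usage comment, the `latin-1` encoding, the 5-sample window and the 50% threshold are all assumptions for illustration, so check them against the header of your own log.

```python
import csv
from statistics import mean


def find_power_drops(samples, window=5, ratio=0.5):
    """Return indices where a sample falls below `ratio` times the mean
    of the previous `window` samples."""
    drops = []
    for i in range(window, len(samples)):
        baseline = mean(samples[i - window:i])
        if baseline > 0 and samples[i] < ratio * baseline:
            drops.append(i)
    return drops


def load_column(path, column, encoding="latin-1"):
    """Read one numeric column from a CSV log, skipping rows that do not
    parse as numbers (for example trailing header or footer rows)."""
    values = []
    with open(path, newline="", encoding=encoding) as f:
        for row in csv.DictReader(f):
            try:
                values.append(float(row[column]))
            except (KeyError, TypeError, ValueError):
                continue
    return values


# Hypothetical usage; "GPU ASIC Power [W]" is an assumed column name.
# power = load_column("log.csv", "GPU ASIC Power [W]")
# print(find_power_drops(power))
```

Loading screens will show up as drops too, so the indices still need to be matched by time against the recovery message in Event Viewer.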

I was going for pcie errors.

In bios do you have resizable bar turned on? If not turn it on.

Also try setting your bus clock to 99 instead of 100. What motherboard is it again?

Before doing all this today, I used the AMD Cleanup Utility to remove everything and fresh installed AMD Chipset Drivers & GPU drivers, full install. I’ll attempt to do this again with DDU and opt to install Driver Only now.

My MB is the Gigabyte B650 Aorus Elite AX rev1.2, running on the FA2 BIOS

I do have Resizeable BAR enabled in BIOS.

I’ll try looking for the Bus Clock setting, I assume it’s in BIOS, but I’m not very familiar with that stuff.

Hwinfo64 I kept running until the crash happens and stopped it as soon as I could but it’s impossible to stop it before the recovery message shows up because my PC is frozen and only becomes responsive when the recovery message shows up.

Anything else we should try before I come back with data?

Can’t quite find anything named “Bus Speed” but I assume it’s this setting? Which won’t let me set it any lower than 100.

setting that to 100.00 is probably a good start, this is what I had in mind. sometimes it lets you set down to 99.something

spread spectrum makes it “wobble” between 99 and 101MHz which reduces RF interference. But 100.00 (with spread spectrum off/disabled) can help us if it’s the “wobble” making it unstable.

hwinfo64: that’s fine, just stop it as quick as you can. I missed the csv in your zip, see it now, thanks

so I see some interesting stuff. about how far back from the end of the hwinfo64 log is the error?

the +12v rail actually looks pretty okayish, really, it doesn’t fluctuate outside the norm really

if you see the gpu maximum power drop there, and if that corresponds to the shut down, it looks like the gpu asic power spikes way over on a ramp
could be that the gpu died before then, when it starts to go down utilization % wise

I use hwinfo64 and this hwinfo64 grapher tool to do this kind of thing, to get a visualization of what you can see in the sensors when it craps out.

cpu utilization goes way down too then back up again.

gpu max power wattage (this is igpu+add-in) also tanks hard there toward the end.

Do you use the igpu? you might try disabling it. The asic power spike above 50, given it stays really close to 50 all before that, is the only thing that immediately jumps out, but this would take more data and digging.

so try disabling the igpu
and Adrenalin 24.1.1 but not installing the Adrenalin gui (after ddu ofc)
and 100.0mhz
and spread spectrum off

and see if that still triggers a crash.

is vsync on or off in WoW?

The second zip file is extremely short I think? Doesn’t take me long to induce a crash. Especially when turning graphics all the way up, enabling CMAA 2 and MSAA x8

Vsync through the in-game settings is pretty inconsistent. Sometimes it causes odd frame pacing, sometimes it works fine. Turning it on when it works properly vs turning it off I can’t quite tell the difference though.

Spread Spectrum disappears as an option as soon as I set a fixed clock.

I have tried disabling the iGPU in BIOS but it turns itself back on. I have also tried disabling it through the Device Manager in Windows, but I still got crashes. Will do it again anyway.

Worth mentioning that I’m now seeing some odd behavior when logging into Windows 11, at the part where you have to type in a password, it’s not loading any background images - it keeps a black background and after entering my password, it seems to get stuck for a bit before it gets into the desktop.

Side question… Is this normal?! I had an i5 8400 & GTX 1060 6GB for nearly 7 years before making this rig and I never had to fiddle with such settings in BIOS.

Also on the side, maybe it’s programs like MSI Afterburner and Gigabyte Control Center messing with something? Although when I first made the rig, I did install GCC and I did let it install all the stuff it wanted to. For about one week or so I was crash free - albeit my system did shut down twice, completely out of nowhere, once while playing and once while… opening Chrome. Weather was pretty wonky at the time so maybe there was a surge and my PSU was damaged?

This part I don’t fully understand. :frowning:

Adding here that it probably is the drop at 00:02:50. The drops at 00:00:20 and 00:01:00 are probably loading screens since I was moving between areas. Will try to keep it clean from loading screens next.

Try uninstalling afterburner and then ddu and redo Adrenalin 24.1.1

No this is way not normal but the steps might help us narrow it down.

It seems like you might have hardware issues but it’s not clear to me if it’s mobo, GPU or power supply. Could be anything.

Right now I’m leaning toward GPU and then possibly mobo

I’d be tempted to swap GPUs and see if that changes anything if you know anyone else with an AMD GPU. Sapphire support is usually really really good, too

Just got back. I’ll be using Revo to uninstall both Afterburner and GCC.

Then I’m doing DDU and fresh installs of drivers.

I might be able to grab an NVIDIA GPU pretty soon as a friend is upgrading.

Reiterating that the crashes seem to get exacerbated when HAGS is enabled in Windows & Hardware Acceleration is enabled in other apps (Chrome, Edge, Spotify, Discord, Battle.net)

By the way, I’ve set the GPU BIOS to the right-most setting with the physical switch on the GPU. (It’s the Nitro+ implementation!)

I think it might still be likely that the Motherboard is the problem too, seeing as my WiFi receiver craps out whenever I open qbittorrent. It gets back up, but for a minute or so it won’t load any sites or anything. The WiFi connection on my phone works fine all along though. And that’s an issue I didn’t have with my previous PC, which I stuck a Gigabyte GC-WBAX210 card into late in its lifecycle. My receiver also doesn’t get recognized when I do a fresh install of Windows, which I find odd and which wasn’t the case with my old rig.

Sapphire support I did contact in the past, although I don’t think I provided a lot of info. Their response wasn’t very helpful and they did not respond to my e-mail afterwards.

Might have to try this at some point too.

Anyway, getting back to you with results pretty soon!

Test3.zip (155.4 KB)

Did all the stuff, couldn’t quite get it to crash. Here’s a clean HWinfo64 log.

Will now try going into a dungeon that was 100% guaranteed to crash the game, see if I can survive there too.

Test4.zip (281.5 KB)

Full dungeon run completed, no crash… to my surprise. Pretty much every other thing I thought could be the culprit (HAGS, Hardware acceleration in apps, livestreaming on Discord, Core Isolation) ALL enabled.

Should I try going back to allowing Bus Clock to Auto, see if it was any kind of driver issue?

Oddly, it seems my in-game latency has dropped 4-5ms too. Idk, maybe it was a mobo issue after all?

Nope. I got a crash soon after getting into another dungeon but I forgot to start recording. It’s overall much, much more stable though. Only a single crash in DX12 in the span of 1 hour? I couldn’t even take a step before.

TdrTest6PlusMissed.zip (48.7 KB)

Sending now the files from one crash I got in the main hub, with logs from both event viewer + hwinfo64, plus the event viewer logs from the crash I forgot to record.

Test8.zip (70.8 KB)

Another one! I think I’m going to stop now and let you process it all.

Dear Venedictos!

I’m facing the same issue.
I heard something about the graphics slider: if you turn it up to 10, it enables some ray-tracing option even if you have it turned off.
I’m testing it now, will keep you updated.

Test10.zip (207.5 KB)

I think that’s enough for now. :grin: