Help Request: nvlddmkm errors

Good afternoon everyone,

I’ve been having an issue with my GPU and drivers for almost two months now, and despite my best efforts, I can’t figure out the solution. I know this community is quite software-savvy, so after exhausting a large number of Google searches, I figured I’d try here in case someone is able to help. I’m desperately trying to avoid RMAing my board or CPU, as I don’t have a backup rig for gaming and such.

I have an EVGA 2070 Super FTW3, paired with a 3900X and 32GB of Corsair Vengeance rated for 3600MHz on an ASUS X570 Prime-PRO. I built it about a year ago, when Covid was starting to pop off. I had zero issues until about two months ago. I currently suffer from what Windows Event Viewer logs as an nvlddmkm error. It reads as follows:

"The description for Event ID 14 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3
0d20(31cc) 00000000 00000000

The message resource is present but the message was not found in the message table"

When this error is logged to Event Viewer, it coincides with the drivers appearing to crash and recover (sort of). This occurs while playing games, watching YouTube, or even at idle sometimes. When it happens, both of my monitors go black, blink for a moment, and recover. If I’m playing a game, the game has crashed by the time the monitors recover. It doesn’t matter which platform I play on, nor which game I play. I don’t believe it matters whether I’m running a DX12, DX11 or Vulkan title.
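For anyone who wants to pull the same entries without clicking through Event Viewer, here is a rough Python sketch that shells out to Windows' built-in `wevtutil`. It assumes the events land in the System log under the provider name `nvlddmkm` with Event ID 14, as in the message above; the function names are made up for illustration, and it only runs on Windows.

```python
import subprocess


def build_nvlddmkm_query(event_id=14, max_events=20):
    """Build a wevtutil command that lists recent nvlddmkm events.

    Assumption: the events are in the System log under provider 'nvlddmkm'.
    """
    xpath = (
        "*[System[Provider[@Name='nvlddmkm'] "
        f"and (EventID={event_id})]]"
    )
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # XPath filter on provider and event ID
        "/f:text",                   # plain-text output instead of XML
        f"/c:{max_events}",          # cap the number of events returned
        "/rd:true",                  # newest events first
    ]


def recent_driver_events(event_id=14, max_events=20):
    """Run the query (Windows only) and return wevtutil's raw text output."""
    result = subprocess.run(
        build_nvlddmkm_query(event_id, max_events),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage on Windows:
#     print(recent_driver_events())
```

The text output includes a date for each event, which makes it easier to line crashes up against what was running at the time.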

Before I list the steps I’ve taken to try to figure this out, I’d like to note that this has happened both on my original 2070 Super that I bought new, and the refurbished one that I recently got from EVGA after RMA’ing my card. While the 2070 was out of my rig, I put in my 970 FTW that I still have and the issue did not repeat itself once, that I noticed. The event viewer errors are, as far as I can tell, exactly the same between both the original and the refurb’d cards.

I have done the following, in no specific order (I’ve done all of this over the span of the last two months and I can’t remember in what sequence I’ve done these things):

-Wipe and reinstall of Windows 10
-Run a second dedicated PSU PCIE cord for the second 8pin, as I was originally running only one that had two 8pin ends on it (currently running two dedicated cords)
-BIOS updates, including the most recent stable release for the board as of yesterday, and the most recent BETA release as of yesterday after the stable one didn’t fix it (Beta release was the one with the USB fixes, version 3063; still applied)
-Resetting the MB to defaults
-Running the RAM under its DOCP profile, and leaving all values as they were once applied
-Running the RAM under its DOCP profile, adjusting the RAM voltage down a little bit from its 1.35V profile to 1.3
-Running the RAM at default speeds, no DOCP profile applied
-Applying the RAM’s DOCP profile, then adjusting the speed down to 3200MHz
-DDU-ing and reinstalling the drivers with a variety of versions, including a hotfix driver (I don’t recall the version number); I’ve DDU’d probably a dozen times at this point.
-Removed all overclocks and run at stock speeds
-Underclocked GPU with offsets of -50 MHz on core, -100 MHz on memory
-Checked GPU-Z log for voltage above 1.1V (never went above 1.06 or so; saw this on a reddit thread from a user who was having an issue with his 3xxx card, figured I’d try that)
-Ran the Windows memory diagnostics; I had to cancel it before it completed, but it had not detected any errors as of pass two, 60% completion
-Ran Precision X1 and Afterburner (not at the same time; however, when Precision was being used, I noticed that Windows would send the “device disconnected” sound frequently upon initial login, even though I’d made no changes or hadn’t unplugged any devices)
-Ensured GPU was properly seated, and supported by the anti-sag bracket I have installed
-Ensured both 8pin connectors were completely seated
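The GPU-Z voltage check in the list above can be automated rather than eyeballed. Below is a minimal Python sketch, assuming the GPU-Z sensor log is comma-separated text with a header row and that the voltage column's name contains the word "Voltage". The exact column name varies by card and GPU-Z version, so it is a parameter here; `readings_above` is a hypothetical helper, not part of any tool.

```python
import csv
import io


def readings_above(log_text, column_hint="Voltage", limit=1.1):
    """Return (line_number, value) pairs where the chosen column exceeds limit.

    Assumption: the first line is a comma-separated header and the column of
    interest is the first one whose name contains column_hint.
    """
    reader = csv.reader(io.StringIO(log_text))
    header = [name.strip() for name in next(reader)]

    index = None
    for i, name in enumerate(header):
        if column_hint.lower() in name.lower():
            index = i
            break
    if index is None:
        raise ValueError(f"no column containing {column_hint!r} in {header}")

    hits = []
    # Data starts on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        if index >= len(row):
            continue  # short or blank line
        try:
            value = float(row[index].strip())
        except ValueError:
            continue  # non-numeric reading, e.g. blank or "-"
        if value > limit:
            hits.append((line_number, value))
    return hits
```

To use it, read the log file into a string and pass it in, for example `readings_above(open(path).read())` with `path` pointing at wherever GPU-Z saved the log. The same function works for other columns (board power, temperature) by changing `column_hint` and `limit`.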

I’m sure there’s more that I’ve tried, but that’s all I can remember at the moment.

I did see a thread on the EVGA forum that detailed a similar, if not identical, issue that people were having, and a few users said they have stopped updating their drivers past 45x.x, since 46x.x appears to have been giving them problems. I haven’t tried using a 45x.x driver yet, but that’ll be my next step after I post this.

At this point, I’m not sure what else to do aside from using an older driver. If anybody has any knowledge about this issue and can lend a hand, I would be forever grateful.

Thank you for your time,
-Sumi

–Edit: I can provide my GPU-Z logs if requested.
–Edit 2: When this rig was built, I used all new parts except for some drives. Those recycled drives are used only for game storage. My OS NVME and mass storage HDD were brand new. CPU, MB, PSU, GPU, RAM, all new.
–Edit 3: I have also tried running with and without GSync enabled. I never run DSR.
–Edit 4: Confirming the issue happens “at idle”, at 6:34 PM on 4/1. Driver just crashed again while not playing games or watching videos. Same error shows up in the Event Viewer.
–Edit 5: I’ve installed driver 445.87 after DDU-ing again at 6:45 PM, 4/1. Will report back with findings.
–Edit 6: 445.87 doesn’t appear to fix the problem. It still crashes, but differently: the game now hangs and I have to hard shut down my rig. The monitors don’t go black and the game doesn’t outright close like it previously did.

Update, 4/4: I’ve begun playing my games without Afterburner or Precision open, and it appears to (sometimes) lengthen the time between crashes, but they still happen. GPU-Z doesn’t report that the GPU is pulling excessive wattage from the 8pins, nor the PCIE slot.

Edit, reword: GPU-Z reports 8pins and PCIE slot are pulling within acceptable wattage limits.
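To check whether the time between crashes is really getting longer, rather than going by feel, the crash times from Event Viewer can be turned into gaps. This is a small Python sketch; the timestamp format and the function name `crash_gaps` are assumptions chosen for illustration, so adjust `fmt` to match however the times were copied down.

```python
from datetime import datetime


def crash_gaps(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Return the gaps, in minutes, between consecutive crash times.

    timestamps: strings in the given format (an assumed format, not
    something Event Viewer guarantees). Order does not matter.
    """
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [
        (later - earlier).total_seconds() / 60
        for earlier, later in zip(times, times[1:])
    ]
```

Comparing the gaps from sessions with monitoring software open against sessions with it closed would show whether the difference is real or just noise.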

Update, 2 PM, 4/4: a friend recommended installing Linux for debugging purposes. I’m going to do so tonight and see what happens. Will report back with findings.

Have you tried enabling/disabling CSM (Compatibility Support Module) in UEFI?
It improves VGA compatibility (apparently).
Also make sure HPET (High Precision Event Timer) is enabled if running Win 7 or 10. (If disabled, it can cause your graphics driver to crash, as well as other random BSODs.)

I have not tried either of those. I’m on my way home now and will give both a go. Thanks for the recommendations!

Here’s where I’m at now.

CSM was disabled, so I enabled it (since having it disabled wasn’t the solution, I tried the opposite). I also confirmed that HPET was enabled. I didn’t see anything for it in my BIOS, so I went to the Windows level and made sure it was enabled via command and in Device Manager. I also checked for a driver update for it via Device Manager, and it was up to date.

Using an old GPU driver didn’t fix it, so I updated to the current one via GeForce Experience.

Played a game for an hour or so, and it didn’t crash. Closed the game and Afterburner. Installed Precision, because I want control over my third GPU fan. As soon as it launched, the screens went black and I heard the “device disconnected” sound.

A few minutes later, while watching YouTube and eating dinner, the driver crashed and recovered (screens went black again).

I’m about to wipe the OS drive, install Linux by itself and run through some stuff. I’m hoping that it will be able to pick something up.

If you or anybody has any further suggestions, I’m all ears.

Edit: I don’t believe Precision is the issue, as I’ve had problems with both Precision and Afterburner, and with both closed.

Update: Linux appeared to run fine for the hour or two I played SOTR. Reinstalled Windows, which ran fine for 1.5 days until now.

However.

I believe I saw my monitors blink when I inserted my thumb drive into my cheapo Amazon USB-C hub. Disconnecting it now and trying again. If that doesn’t fix it, I’m going to move my rig from UPS power to the wall. My UPS no longer keeps things alive when the power goes out (I need to warranty it). I continue to use it to “clean” the power. Not sure if it’s actually doing that, but it sounded good to me.

Could it be a conflict between Afterburner and Precision running at the same time?

Negative. I apologize if I worded my previous posts to imply I ran them both at the same time, but I didn’t do so.

I’m at the point now where I’ve installed Ubuntu and am trying to get everything set up, including gaming, as best I can. I’ve basically given up on making Windows work, pending someone having a miracle idea.

Sounds like a limited power draw issue to me. Check the PSU with a multimeter or swap in a different one to see if it keeps crashing under load. It’s also worth checking that all the cables are tight and the connectors fully seated, especially if you have a modular PSU.

I’ll definitely give this a try on my next day off (Wednesday). I had thought about it maybe being a PSU problem but I was hoping it wouldn’t be.

I had the same symptoms on my PC a couple years ago and it turned out to be a loose power supply AC plug on the outside of the computer. Shoved it in tighter and my overclocks were suddenly stable again. Maybe look there first :eyes::eyes::eyes: