Slows/shudders then freezes on Threadripper 2970wx with no BSOD

This has been going on since I bought the 2970wx at the start of this year. At random, even under light use like surfing the web, Windows 10 slows/shudders and then freezes (videos become like a slide show), and within 30 seconds I can move the mouse but can’t interact with anything. Sometimes I can press Start>Restart; other times I have to do a hard reset. I once left it in this slow state for an hour to see what would happen, but Windows didn’t BSOD. This problem can happen 5 times a day or once in 2 months. I use a program called CAESAR-Lisflood. A simulation takes 5 days to complete, but when this freeze occurs I have to restart the PC and start from scratch.

Lately I have been running the same simulation on both my Threadripper system and my old 6900k, so that if the Threadripper freezes, the slower 6900k can still complete the run.

My specs:
Threadripper 2970wx
MSI x399 MEG
2x G.Skill F4-3000C14D-32GTZR
GTX 1080Ti
Samsung 970 PRO 1 TB NVMe
Windows 10 Pro

What I’ve tried so far:

  1. Temps are OK when stressed - CPU: 64°C / MOS: 46°C / NVMe: 39°C
  2. Upgrading Windows to the latest version
  3. Reinstalling Windows
  4. Resetting the BIOS
  5. Updating to the latest BIOS
  6. Switched RAM kits – to the kit I used with my 6900k, which never had any problems
  7. Switched PSU – used with my 6900k
  8. Switched graphics cards
  9. Switched the NVMe drive – used with my 6900k
  10. Disabling Overclocking>IOMMU
  11. Changed Power Options>PCI Express>Link State Power Management>Setting: “Off”

I’m out of ideas on what it could be. I would be grateful for any suggestions or recommendations.

I heard that running two monitoring programs at once can cause this, and that it’s a microcode problem with the Ryzen architecture. I’m really tempted to sell the CPU and motherboard and buy the upcoming i9-10980XE.

*Sorry for any grammar mistakes

Hmm, I may be seeing this on my 2950x. It started around July, but I’ve been away for a few months, so I’m noticing it more now.

Does your system also not want to shut down or reboot after running for a few days?

Like clicking the Start>Shutdown button and nothing happens, or getting stuck on the shutdown screen with the circle of circles?

Yes, this exactly.

If I do a cold boot into Windows and then tell it to shut down or restart, it does so fine. However, if I let the system run for a few days and try the same, it fails.

Sorry, I have no solutions, I’m experiencing the exact same problems.

My system is a 2950x on a Gigabyte x399 Designare EX, with 64GB of Corsair Vengeance Pro RAM (4x16GB kit, DDR4-3200) running at 3200 with XMP profile 1. The GPU is an ASUS 1080 and the power supply is a Corsair RM850. An Intel x520-da2 10gig network card is also installed.

I’d do a Memtest86 run on your RAM, if for no other reason than to eliminate the RAM as a possibility. Running BOINC on GPU and CPU 24/7, I had random reboots and issues like yours for the first month or so. One of my 4 DIMMs was failing one test after 4 or 5 passes.
I have a 1950X and a Designare.

You could try to re-seat the CPU.
Threadripper in general has some issues with proper seating of that large CPU in the socket.
Depending on which particular socket your boards have, Lotes or Foxconn,
this could be the culprit that’s causing your particular weird behaviour.

Properly re-seating the CPU might fix it.

Another thing I can think of is a DPC latency issue.
There are some tools for monitoring this,
but I forgot the names (LatencyMon, I believe, or something like it).
This could be caused by a certain driver or piece of hardware.
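
A rough way to timestamp these stalls without any monitoring tool is a heartbeat log: a script writes the time to disk once a second, and gaps between entries show when the system stopped scheduling the script or stopped completing disk writes. The Python below is only a sketch under assumptions: the function names (`run_heartbeat`, `load_heartbeats`, `find_stalls`), the log path, and the one-second interval are made up for illustration, and it measures user-space stalls rather than DPC latency itself, so it complements a tool like LatencyMon rather than replacing it.

```python
import os
import time


def run_heartbeat(path, interval=1.0, beats=None):
    """Append a wall-clock timestamp to `path` every `interval` seconds.

    Each line is flushed and fsynced so it survives a hard reset.
    `beats=None` means run until interrupted (hypothetical default).
    """
    count = 0
    with open(path, "a") as log:
        while beats is None or count < beats:
            log.write(f"{time.time():.3f}\n")
            log.flush()
            os.fsync(log.fileno())  # force the line out to disk
            time.sleep(interval)
            count += 1


def load_heartbeats(path):
    """Read the timestamps back as a list of floats, skipping blank lines."""
    with open(path) as log:
        return [float(line) for line in log if line.strip()]


def find_stalls(timestamps, interval, tolerance):
    """Return (start_time, gap_seconds) for every gap longer than interval + tolerance."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval + tolerance:
            stalls.append((prev, gap))
    return stalls
```

To try it, call `run_heartbeat("heartbeat.log")` and leave it running. After the next slowdown, `find_stalls(load_heartbeats("heartbeat.log"), 1.0, 0.5)` lists every gap longer than 1.5 seconds. If the script is restarted after a reboot and appends to the same file, the gap spanning the reboot also includes the downtime, so read that entry with care.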

Thanks for the suggestions.

I ran LatencyMon and got an error (Screenshot: https://imgur.com/xJe32pr )

I’ll reseat the CPU, then run Memtest86 through the night.

I’ve not had any random reboots or sudden hard lockups. The system slows, but I can still open and work within programs. Roughly 30 seconds later, when I right-click there is a delay and then a faded context menu shows up. After that I can move the mouse but can’t interact with anything. It just stays like that (I don’t get the washed-out screen that says a program is not responding). Even if I leave it for an hour, the mouse pointer still moves and the system stays in the same state.

This really sounds like a DPC latency issue to me.
You might want to check whether there are any new drivers available for your system
and then update those. This includes motherboard-specific drivers as well,
like chipset, USB drivers, etc.

Whatever the cause of this problem, it seems to be getting worse on my system.

I checked, and I’m already running the latest AMD chipset package and BIOS version.

I’m strongly considering abandoning Windows on this machine and relegating it to a Linux server role, which is currently filled by an R7 2700, to see if the R7 does any better as my Windows machine.

Ran LatencyMon myself:

Just a test: don’t open any hardware monitoring software, Control Panel or Task Manager. Open only GeForce Experience, press ‘Check for drivers’, and open the Nvidia Control Panel. Then leave your computer alone for a few hours to see what it does.

Ok I’ll do that before leaving for dinner.

I did not notice anything different at first. I left the system running overnight, and it was mostly unresponsive the next day. I couldn’t open the Start menu, and other apps were taking forever to open.

I couldn’t do a clean shutdown either; it hung on the shutdown screen with the circles in an infinite loop and the HDD light flickering now and then.

I’m not keen to reseat the CPU. I have a crap socket; it was a PITA to get the screws to “catch”.

This motherboard and CPU are going into my workshop as a file server, and I will use an R7 as my daily driver.

Hopefully the issue is some sort of Windows <> Driver conflict, and not purely a hardware fault.

That’s the same as what happened to me.

Uninstall GeForce Experience. Go to Device Manager>Display adapters, right-click the card, update the driver from there, and restart the computer. Then leave your computer alone for a few hours. It hasn’t crashed so far for me.

I don’t have Experience installed, only the control panel app. I don’t suspect the Nvidia software was causing problems, but I guess it could have been lurking in the background.

The machine is half torn apart already, preparing to transfer it to a server chassis.

One thing you could try is running Linux from a USB drive.
A live session would likely be fine for testing purposes.
If the system shows similar behaviour on Linux as well,
then you can be fairly certain that it is a hardware issue.

But if the issue isn’t happening on Linux,
then it could be a Windows-related driver or software issue,
or a settings issue with the Windows power management options,
or an issue with your SSD or HDD.

It could also be a memory leak.
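
One way to check the memory-leak theory is to note total memory in use at intervals (from Task Manager or Performance Monitor, say) and see whether it climbs steadily between reboots. The sketch below fits a straight line through such readings. The function names, the sample format of (seconds, bytes) pairs, and the threshold are assumptions for illustration, and steady growth only hints at a leak rather than proving one.

```python
def growth_rate(samples):
    """Least-squares slope, in bytes per second, of (seconds, bytes) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def looks_like_leak(samples, threshold_bytes_per_hour):
    """True if memory use grows faster than the (arbitrary) hourly threshold."""
    return growth_rate(samples) * 3600 > threshold_bytes_per_hour
```

For example, readings taken every hour over a few days of uptime could be passed as `looks_like_leak(readings, 50 * 1024**2)` to flag growth above roughly 50 MB per hour; that figure is a placeholder, not a recommended limit.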