Ryzen crashing while idle

Yes, it seems to be fairly random. Sometimes multiple days will go by without any issues, other times it’ll crash multiple times within a few hours.

@wendell Sadly no. I tried to get a more recent PSU, but the 2019 and 2020 models were either sold out or being sold at a markup. What is the design flaw?

@MisteryAngel It’s set to auto. (1.040V)

I don’t think it’s a kernel problem; I think it’s a BIOS one.
Before AGESA 1.1.9.0 I tried disabling IOMMU, but that didn’t work for me.
I think (hope) that in a few months everything will be fine with Ryzen 5000.
Anyway, a buyer shouldn’t have to hunt for this kind of solution (memory timings, voltage, IOMMU, PSU idle). Sometimes I feel like a beta tester. It should have worked fine with default settings, but it didn’t.
P.S. I now use A-XMP 2. I couldn’t find any information for MSI, but ASUS uses DOCP1 with its own timings and DOCP2 with the memory’s timings.

Ah, that might be a tad low.
Try manually setting it to either 1.1V or 1.2V, and see if that fixes your stability / crashing issues.

Yeah, I agree, it’s low. I have it at Auto and it reports 1.1V (±.05V)

Yup that’s about right where it should be. :slight_smile:

In that case I’ll manually set it to 1.1V. If it continues to crash I’ll up it to 1.2V.

EDIT: The CPU core voltage is set to auto at 1.48V, the SoC voltage is now set to override mode at 1.1V, and the DRAM voltage is set to auto at 1.216V. Let me know if any of these seem off.

1.2V is the default for the RAM; if you’re using XMP it should be bumped to 1.35V (the XMP profile should do that by itself). CPU voltage is pretty volatile even when observed in the BIOS screen, but 1.4xV seems fine based on my observation.
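
As a rough sanity check, the rules of thumb from this thread can be written down as a small Python sketch. The ranges are only the values posters mentioned here (SoC around 1.1V to 1.2V, DRAM 1.2V at default or 1.35V with XMP, CPU core around 1.4xV); they are not official AMD or board-vendor limits. The function name `check_voltages` and the 0.05V tolerance are assumptions made for illustration.

```python
# Rules of thumb quoted in this thread, NOT official AMD or vendor limits.
THREAD_GUIDANCE = {
    "cpu_core": (1.40, 1.49),  # "1.4xV seems fine"
    "soc": (1.10, 1.20),       # "either 1.1V or 1.2V"
}
DRAM_DEFAULT_V = 1.20  # stock DDR4 voltage mentioned in the thread
DRAM_XMP_V = 1.35      # voltage the XMP profile is expected to apply


def check_voltages(cpu_core, soc, dram, xmp_enabled, tolerance=0.05):
    """Return a list of warnings for readings outside the thread's guidance.

    `tolerance` is a hypothetical allowance for sensor jitter.
    """
    warnings = []
    for name, value in (("cpu_core", cpu_core), ("soc", soc)):
        low, high = THREAD_GUIDANCE[name]
        if not (low - tolerance <= value <= high + tolerance):
            warnings.append(
                f"{name}: {value:.3f} V is outside {low:.2f}-{high:.2f} V"
            )
    expected_dram = DRAM_XMP_V if xmp_enabled else DRAM_DEFAULT_V
    if abs(dram - expected_dram) > tolerance:
        warnings.append(
            f"dram: {dram:.3f} V is not close to {expected_dram:.2f} V"
        )
    return warnings


# Readings reported earlier in the thread, with XMP off.
print(check_voltages(1.48, 1.10, 1.216, xmp_enabled=False))  # → []
```

Passing the earlier auto SoC reading of 1.040V instead of 1.10V produces a single warning for `soc`, which matches the "a tad low" assessment above.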

I’ve updated the BIOS to the latest beta today on both machines and they still seem stable. PBO is actually working better now and bumps the frequency ~100 MHz higher than before.

For non-overclocked memory that’s okay.
But like @agurenko mentioned above, when using XMP it should be bumped to around 1.35V.

But I suppose we first have to wait and see how your system behaves now.

Hey there,

Dumping this here, might be helpful.

Running Ryzen 3900X / ASRock X570 Taichi / G.Skill 64GB F4-3600C16Q-64GVKC @ 3600 MHz (XMP/1.35V)
Except for RAM, the system is not overclocked.

After about a year with no problems at all, I got similar machine-check errors after updating the BIOS from 2.8 to 3.6 and 3.8.
For troubleshooting I ran memtest (no errors) and fiddled with voltages and boot options, without improvement.
A Mersenne Prime stress test resulted in a rounding error, which made me really nervous (bad CPU / memory?).

Since the hardware hadn’t changed and the system had run stably for about a year, I reverted to BIOS version 3.0, which is the last version before the AGESA updates for the Ryzen 3000XT and Ryzen 5000 series.

Since then, no trouble. The system is stable again…
I had Mersenne Prime running for a few hours without any problem. If the system stays stable for a week or so, I’ll try updating the BIOS version step by step…

Just crashed for the first time in over a week. SoC voltage is set to 1.1V. I’ll try setting it to 1.2V and see if that does the trick.

Just crashed at 1.2V. Any ideas on where to go from here?

What kind of setup are you running right now? Stability at 1.1V sounds like it still might be the PSU. What’s your MB? What kernel are you currently running?

I’m pushing 16 days with no MCE errors (except for one crash when waking from sleep, but I have every reason to believe that was a regression in the 5.10.5 kernel).
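
For anyone who wants to check their own logs for MCE entries, here is a minimal Python sketch. It assumes that kernel machine-check messages contain either `mce:` or `[Hardware Error]`, which is how they usually show up in the kernel log; the function name `find_mce_lines` and the file name `kernel.log` are hypothetical.

```python
import re

# Assumption: machine-check messages in the kernel log contain "mce:" or
# "[Hardware Error]". Adjust the pattern if your kernel prints them differently.
MCE_PATTERN = re.compile(r"mce:|\[Hardware Error\]", re.IGNORECASE)


def find_mce_lines(log_lines):
    """Return the log lines that look like machine-check reports."""
    return [line.rstrip("\n") for line in log_lines if MCE_PATTERN.search(line)]


# Usage sketch (kernel.log is a hypothetical file holding saved kernel output):
#
#     with open("kernel.log") as log_file:
#         for hit in find_mce_lines(log_file):
#             print(hit)
```

The input could be the kernel messages of the previous boot saved with `journalctl -k -b -1`, since a crash usually means the interesting lines belong to the boot before the current one.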

I’ve updated my BIOS from 7C84v151 Beta to 7C84v153 Beta (for the MSI X570 Tomahawk), which is the latest available AGESA for that board. I’m currently on kernel 5.10.7 and still using Normal for PSU Idle Power (judging by the voltages, I believe this needs a full power-off/power-on cycle to take effect, so a reboot alone doesn’t apply it).

I’m running the 5.10.2-2 kernel. My motherboard is an MSI X470 Gaming Pro Carbon (non-AC). My PSU is a Corsair RMx Series 850W (CP-9020180-NA). I tried the beta BIOS, but it’s too unstable for me at the moment. I tried to get my hands on a newer PSU, but this was the only one that wasn’t sold out or severely overpriced.

I do not think this is a problem that can be resolved with voltage or any other BIOS setting. It should work flawlessly with default BIOS settings.
I do not think it is a PSU problem.
My guess is that AMD has BIOS problems with Ryzen 5000. Everything will be fine, but we have to wait.
X570/B550 have a different AGESA than X470/B450.
I think they are working on the X570/B550 Ryzen 5000 compatibility/BIOS issues.
After that they will fix the X470/B450 Ryzen 5000 compatibility/BIOS issues.
So I think you have to wait for a fix from AMD (AGESA / BIOS).

While generally you’re right, he is using a 3000 series CPU with an X470 chipset, but I think the point still stands. Everyone is now busy bringing the 5xx series chipsets up to 5000 compatibility before fixing everything else. Unless the PSU is faulty, I don’t think it’s the problem either; 850W with gold certification should be plenty and then some.

Btw, one thing I also did and forgot to mention: just to be sure, I’ve split my GPU power cables between two lines instead of powering both 8-pins from one line. I don’t think it matters much, but it’s worth mentioning.

Just crashed during gameplay on Windows. Not a very demanding game, but before now it’s always only crashed while idle, so this is new.

Any clues in the Windows event log?

What BIOS do you use?
If you have a Ryzen 3000, try flashing BIOS 7B78v2D (AGESA 1.0.0.4) and use default settings.

@wendell There may be, but I’m not familiar enough with Windows to know what any of it means. Since I don’t have permission to post links, I’ll post some that look important.

The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID
{2593F8B9-4EAF-457C-B68A-50F6B8EA6B54}
and APPID
{15C20B67-12E7-4BB6-92BB-7AFF07997402}
to the user WINDY-3700X\bzpnu SID (S-1-5-21-666156641-1866131615-942766165-1001) from address LocalHost (Using LRPC) running in the application container Unavailable SID (Unavailable). This security permission can be modified using the Component Services administrative tool.

The server {FD06603A-2BDF-4BB1-B7DF-5DC68F353601} did not register with DCOM within the required timeout.

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 13

The details view of this entry contains further information.
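
If several of these entries pile up, it may help to see whether they keep naming the same logical processor. Below is a minimal Python sketch that parses the `Key: Value` lines of the entry above and tallies events by APIC ID. The function names are hypothetical, and it assumes the event text has been copied out of Event Viewer as plain text in the same layout as shown here.

```python
from collections import Counter


def parse_event_fields(text):
    """Collect the "Key: Value" lines of one event description into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def tally_by_apic_id(event_texts):
    """Count how many events were reported for each Processor APIC ID."""
    counts = Counter()
    for text in event_texts:
        apic_id = parse_event_fields(text).get("Processor APIC ID")
        if apic_id is not None:
            counts[int(apic_id)] += 1
    return counts


# The entry quoted above, pasted as plain text.
EVENT_TEXT = """\
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 13
"""

fields = parse_event_fields(EVENT_TEXT)
print(fields["Error Type"])            # → Cache Hierarchy Error
print(tally_by_apic_id([EVENT_TEXT]))  # → Counter({13: 1})
```

If the same APIC ID shows up in most crashes, that would point more towards one particular core than towards the PSU or the BIOS, though a single entry like the one above is not enough to tell.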

@thorn I’m not sure what BIOS I’m on as the MSI download page is down right now. I’ll get back to you on that.

One thing I have noticed is that some programs make the PC crash more than others, namely qBittorrent and my VPN client, AirVPN. It may simply be a coincidence, but I thought it was worth mentioning.

Ohh, the plot thickens. And you already RMA’d your CPU??
