Ryzen 7 1700 machine check error - instant reboot

Hmm, I bought the CPU as part of a pre-built. Would AMD issue an RMA for an OEM part? If not, I’m kind of screwed, as this problem only seems to happen in Linux. And Dell’s position is that if the problem only occurs under Linux on a system designed for Windows, you’re SOL.

Give the C6 workaround a try.
But also at the same time contact AMD and Dell about an RMA. Maybe even CC them :joy:

The OS really shouldn’t matter if the hardware is potentially defective. It can’t hurt to try and be persistent about it. Make mention of the Linux Ryzen SEGV bug.

You can also try running the Ryzen kill script.

You shouldn’t have to do these workarounds to get usable hardware with a good Ryzen CPU.

Well, I’d try to contact AMD, but I get this when I try to go to the contact webform:

emailcustomercare.amd.com uses an invalid security certificate.

The certificate is not trusted because the issuer certificate is unknown. The server might not be sending the appropriate intermediate certificates. An additional root certificate may need to be imported.

Error code: SEC_ERROR_UNKNOWN_ISSUER

LOL, that’s one way to keep your RMA costs down…

Just mail direct to [email protected]

OR

https://support.amd.com/en-us/contact/email-form

You will get a ticket number back, and a rep will respond to it at some point.

Also taking this into consideration you may need to go via Dell.
https://support.amd.com/en-us/warranty/oem

Works fine here

The site loaded in Chromium OK, but then the form timed out. But I got a confirmation email from AMD anyway. We’ll see if they have any suggestions, or offer an RMA.

I agree that this should be no problem in the first place.
But maybe give processor.max_cstate=1 a shot?
Works for my laptop. :wink:
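
For anyone trying this: on Ubuntu the option is typically added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot. Below is a minimal Python sketch, assuming a Linux system that exposes /proc/cmdline and the cpuidle sysfs directory, to confirm after the reboot that the option actually took effect. The function names are made up for illustration, and the idle states listed will depend on the kernel and cpuidle driver in use.

```python
import glob
import os


def max_cstate_from_cmdline(cmdline: str):
    """Return the processor.max_cstate value from a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "processor.max_cstate" and sep and value.isdigit():
            return int(value)
    return None


def idle_state_names(cpu: int = 0):
    """List the idle state names cpuidle exposes for one CPU (empty if none)."""
    pattern = f"/sys/devices/system/cpu/cpu{cpu}/cpuidle/state*/name"
    names = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            names.append(fh.read().strip())
    return names


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as fh:
            print("processor.max_cstate =", max_cstate_from_cmdline(fh.read()))
    print("idle states on cpu0:", idle_state_names(0))
```

If the first line prints None, the option did not make it onto the kernel command line, so any change in behaviour would be coincidence.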

Thanks, I’ll give it a shot. But doesn’t that kill your battery life? Have you RMA’d it?

Well, it probably does shorten the battery life a little, but it is still clocking down when just sitting there doing nothing. I won’t RMA; it works perfectly otherwise, and on a laptop … if that’s all that is wrong with it, I’ll take it. :wink:

In @noenken’s case it’s a different, kernel-related issue and not a hardware problem.

I also had a defective R7 1700X with the same MCE issue you have. It doesn’t just crash under Linux. Under the right circumstances it would also occur in Windows.

noenken’s issue is related to how the Linux kernel handles C-states and power management of the Raven Ridge Vega GPU and PCIe bus when C-state changes occur.

Essentially, the Linux kernel developers haven’t got all the code in place yet to make it work right.

Oh wow, I didn’t realize that. I thought it was also something that would be done in a BIOS update at some point. Yeah, if it’s hardware then of course be loud about it, @imrazor.

MCEs almost always (99.99%) mean there’s a hardware issue.
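
To check whether the kernel actually recorded a machine check around a crash, here is a rough Python sketch that filters a saved kernel log for MCE-related lines. It assumes you have dumped the log to a text file first, for example with `journalctl -k -b -1 > kernel.log` for the previous boot (which needs a persistent journal). The function name and the sample lines are made up for illustration, and an instant reboot may leave nothing in the log at all, so an empty result does not rule out an MCE.

```python
import re
import sys

# Substrings the kernel commonly uses when it reports machine check events.
MCE_PATTERN = re.compile(r"(mce:|Machine Check|\[Hardware Error\])", re.IGNORECASE)

# Made-up illustrative log lines, not taken from any real machine.
SAMPLE_LOG = "\n".join([
    "[    0.512345] mce: [Hardware Error]: Machine check events logged",
    "[    1.000000] usb 1-1: new high-speed USB device number 2 using xhci_hcd",
    "[    0.512400] mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000000000108",
])


def find_mce_lines(log_text: str):
    """Return the log lines that look like machine check reports."""
    return [line for line in log_text.splitlines() if MCE_PATTERN.search(line)]


if __name__ == "__main__":
    # Read the file given on the command line, or fall back to the sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            text = fh.read()
    else:
        text = SAMPLE_LOG
    for line in find_mce_lines(text):
        print(line)
```

Any bank number and status code that turn up are worth pasting into the ticket when asking for an RMA.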

Great, I hate trying to coerce support out of companies like Dell and AMD. Since it’s a consumer product and not an enterprise PC, it’ll be even harder. I should’ve gotten a workstation, but I don’t think there are any Ryzen workstations out there yet.

I’ll wait for a response from AMD, then I’ll ping Dell and see if I get any useful response. In the meantime, I’ll try disabling C6 with the zenstates.py script, then all C-states with @noenken’s kernel option if that fails.

If I’m forced to, would getting a brand-new retail first-gen Ryzen fix the issue?

EDIT: Oh, one other thing. Could this be a RAM issue? I’ve seen that mentioned in other threads.

I’m almost 99.99% sure it’s not a RAM issue. If you tell me this also happens when your RAM is running at JEDEC (stock) spec, then that covers the last 0.01% and makes it 100%. :wink:

Yep, my RAM is running at JEDEC speed (DDR4-2400). I don’t even have the option of overclocking RAM or XMP in the BIOS.

Got a prompt response from AMD. They want my serial number. Is there any way to get it without tearing apart the PC and removing the Dell cooler?

Did you get Dell paperwork? Maybe it’s in there?

I kept the box and paperwork for about 30 days, then tossed it. I don’t think I’ll be able to tear down the PC until this weekend.

Some (not many) BIOSes display the serial number of the CPU; it might be worth checking.

You’ve got to take the cooler off.
That’s usually the only place it’s located, unless Dell printed or stored it somewhere else as well.

https://imgur.com/bwpKVrr

It should start with 9R6 and then xxxxxxx numbers.

Seeking a little clarity here. I tried the kill-ryzen.sh script (on Ubuntu 17.04) you posted a couple of days ago to test for the Ryzen segfault bug. It ran for about an hour and a half without crashing, whereas it normally trips the bug within 5 minutes or so. @catsay, that makes me doubt that it really is your early adopter bug. I bought this PC in Jan 2018.

Another difference between my problem and the SEGV bug is that the SEGV bug occurs under heavy load, whereas my crash always occurs when the system is idle, or being prodded out of idle by a mouse click. I thought it might be @noenken’s bug, but mine is not a PCIe error; it’s an MCE crash.

So I’m not really sure I’ve found the root of the problem. There have been a few theories about problems with PSUs not supporting C6, but that seems far-fetched. The system also came with a single 16GB stick. I bought a “system compatible” stick from Kingston (same as the OEM stick) with the same timings, but I’m starting to suspect it too.

Perhaps I should try Windows for a few days and see if I experience any crashes. If it truly is hardware, I should also see problems with Windows, which I just haven’t used much.