Hmm, I bought the CPU as part of a pre-built. Would AMD issue an RMA for an OEM part? If not, I'm kind of screwed, as this problem only seems to happen in Linux. And Dell's position is that if the problem only occurs under Linux on a system designed for Windows, you're SOL.
Give the C6 workaround a try.
But at the same time, also contact AMD and Dell about an RMA. Maybe even CC them.
The OS really shouldn't matter if the hardware is potentially defective. It can't hurt to try and be persistent about it. Make mention of the Linux Ryzen SEGV bug.
You can also try running the Ryzen kill script.
You shouldn't have to do these workarounds to get usable hardware with a good Ryzen CPU.
Well, I'd try to contact AMD, but I get this when I try to go to the contact webform:
emailcustomercare.amd.com uses an invalid security certificate.
The certificate is not trusted because the issuer certificate is unknown. The server might not be sending the appropriate intermediate certificates. An additional root certificate may need to be imported.
Error code: SEC_ERROR_UNKNOWN_ISSUER
LOL, that's one way to keep your RMA costs down...
Just mail direct to [email protected]
OR
https://support.amd.com/en-us/contact/email-form
And you will get a ticket number back, which a rep will respond to at some point.
Also taking this into consideration you may need to go via Dell.
https://support.amd.com/en-us/warranty/oem
Works fine here
The site loaded in Chromium OK, but then the form timed out. But I got a confirmation email from AMD anyway. We'll see if they have any suggestions, or offer an RMA.
I agree that this should be no problem in the first place.
But maybe give processor.max_cstate=1 a shot?
Works for my laptop.
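For anyone trying this: processor.max_cstate=1 goes on the kernel command line (typically through the bootloader configuration), and after a reboot it is worth confirming that the running kernel actually received it. A minimal Python sketch that checks /proc/cmdline is below. The function names are made up for illustration, and treating the last occurrence as the effective one is an assumption of this sketch, not something confirmed in this thread.

```python
from pathlib import Path
from typing import Optional


def max_cstate_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the processor.max_cstate value found in a kernel command line.

    Returns None when the option is absent or its value is not a number.
    (Hypothetical helper, named for this example only.)
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == "processor.max_cstate" and sep and raw.isdigit():
            # Assumption of this sketch: if the option is repeated,
            # report the last occurrence.
            value = int(raw)
    return value


def running_max_cstate(path: str = "/proc/cmdline") -> Optional[int]:
    """Read the command line of the running kernel, if the file exists."""
    cmdline_file = Path(path)
    if not cmdline_file.exists():
        return None
    return max_cstate_from_cmdline(cmdline_file.read_text())


if __name__ == "__main__":
    print("processor.max_cstate on this boot:", running_max_cstate())
```

If this prints None after you edited the bootloader config, the option most likely never reached the kernel (for example, the config was not regenerated before rebooting).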
Thanks, I'll give it a shot. But doesn't that kill your battery life? Have you RMA'd it?
Well, it probably does shorten the battery life a little, but it is still clocking down when just sitting there doing nothing. I won't RMA, it works perfectly otherwise and on a laptop... if that's all that is wrong with it, I'll take it.
In @noenken's case it's a different, kernel-related issue and not a hardware problem.
I also had a defective R7 1700X with the same MCE issue you have. It doesn't just crash under Linux. Under the right circumstances it would also occur in Windows.
noenken's issue is related to how the Linux kernel handles C-states and power management of the Raven Ridge Vega GPU and PCI-e bus when C-state changes occur.
Essentially, the Linux kernel developers haven't got all the code in place to make it work right.
Oh wow, I didn't realize that. I thought it was also something that would be done in a BIOS update at some point. Yeah, if it's hardware then of course be loud about it, @imrazor.
MCEs almost always (99.99%) mean there's a hardware issue.
Great, I hate trying to coerce support out of companies like Dell and AMD. Since it's a consumer product and not an enterprise PC, it'll be even harder. I should've gotten a workstation, but I don't think there are any Ryzen workstations out there yet.
I'll wait for a response from AMD, then I'll ping Dell and see if I get any useful response. In the meantime, I'll try disabling C6 with the zenstates.py script, then all C-states with @noenken's kernel option if that fails.
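One way to check what the kernel option changes is to list the idle states the kernel exposes through the cpuidle sysfs directory before and after rebooting with it. The sketch below is only an illustration: the function name is hypothetical, and it assumes the usual /sys/devices/system/cpu/cpu0/cpuidle layout with stateN/name and stateN/disable files. The names the idle driver reports may not line up with hardware names such as C6, and I would not expect a change made by zenstates.py to be visible here, since that script works at a lower level.

```python
from pathlib import Path
from typing import List, Tuple


def list_idle_states(cpuidle_dir: str) -> List[Tuple[str, bool]]:
    """Return (name, disabled) for every stateN directory under cpuidle_dir.

    Hypothetical helper for this example. Returns an empty list when the
    directory does not exist (for instance, when cpuidle is not available).
    """
    root = Path(cpuidle_dir)
    if not root.is_dir():
        return []

    states = []
    # Sort by the numeric index so that state10 comes after state9.
    for state in sorted(root.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((name, disabled))
    return states


if __name__ == "__main__":
    # Assumed path: cpu0 only; other cores normally expose the same states.
    for name, disabled in list_idle_states("/sys/devices/system/cpu/cpu0/cpuidle"):
        print(f"{name}: {'disabled' if disabled else 'enabled'}")
```

If the option took effect, the list after rebooting should be shorter than before, with nothing deeper than C1 showing up.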
If I'm forced to, would getting a brand new retail first-gen Ryzen fix the issue?
EDIT: Oh, one other thing. Could this be a RAM issue? I've seen that mentioned in other threads.
I'm almost 99.99% sure it's not a RAM issue. If you tell me this also happens when your RAM is running at JEDEC (stock) spec, that covers the last 0.01% and makes it 100%.
Yep, my RAM is running at JEDEC speed (DDR4-2400). I don't even have the option of overclocking RAM or XMP in the BIOS.
Got a prompt response from AMD. They want my serial number. Is there any way to get it without tearing apart the PC and removing the Dell cooler?
Did you get Dell paperwork? Maybe in there?
I kept the box and paperwork for about 30 days, then tossed it. I don't think I'll be able to tear down the PC until this weekend.
Some (not many) BIOSes display the serial number of the CPU, might be worth checking.
Got to take the cooler off.
That's usually the only place it's located, unless Dell printed or stored it somewhere else as well.
It should start with 9R6, followed by xxxxxxx numbers.
Seeking a little clarity here. I tried the kill-ryzen.sh script (on Ubuntu 17.04) you posted a couple of days ago to test for the Ryzen segfault bug. It ran for about an hour and a half without crashing, whereas it normally trips within 5 minutes or so. @catsay that makes me doubt that it really is your early adopter bug. I bought this PC in Jan 2018.
Another difference between my problem and the SEGV bug is that the bug occurs under heavy load, whereas my crash always occurs when the system is idle, or being prodded out of idle by a mouse click. I thought it might be @noenken's bug, but mine is not a PCIe error, but an MCE crash.
So I'm not really sure I've found the root of the problem. There have been a few theories about problems with PSUs not supporting C6, but that seems far-fetched. The system also came with a single 16GB stick. I bought a "system compatible" stick from Kingston (same as the OEM stick) with the same timings, but I'm starting to suspect it too.
Perhaps I should try Windows for a few days and see if I experience any crashes. If it truly is hardware, I should also see problems with Windows, which I just haven't used much.