New hardware = new problems: Another Ryzen problems thread

I just helped my brother build a new Ryzen PC.

Inside:
CPU: R5-1600 (Not overclocked)
MOBO: MSI B350M Mortar Arctic (Latest BIOS 7A37vA4)
GPU: Sapphire R9-380 4GB (Not overclocked)
RAM: 8GB Corsair 2133MHz ValueRAM
PSU: 750W (Can't remember brand, but it's a good one)
SSD: 500GB Samsung 850EVO

Unfortunately it crashes fairly regularly. It used to be about 4-5 times a day; now it's only a couple of times a week, usually while gaming but sometimes while web browsing or doing other tasks. I disabled SMT at first, since that apparently makes it crash less, but I've since re-enabled it; SMT doesn't seem to be causing the issues. It still crashes on Linux, first on Fedora 26 with the 4.11 kernel and now even on Solus with the current LTS kernel. The computer locks up completely: no black screen or errors, it's just frozen and has to be force rebooted.

Windows 10 seems to work fine (for the few minutes I've used it), and it ran the Time Spy benchmark without problems.

Possible issues:
Bad UEFI options? - I've heard things like VTd can cause problems. - VTd enabled, no issues
Overheating? - Used stock fan with pre-applied thermal paste
Bad UEFI? - Heard something about the FMA3 instruction being a bit problematic. - Latest UEFI version, no reported issues with this mobo using Linux
Bad hardware? - Should try extended periods of gaming on Windows to see if it's a Linux thing or not.

Thanks for any help.

Guessing you forgot the mobo standoffs? 100% a shot in the dark; a quick test is to smack the case and see if it reboots.

EDIT:
Never mind, it freezes rather than just rebooting.

Edit 2:
Overheating is simple to check: watch your hardware temps and see if they spike. I'd probably start there, as it's the easiest thing to test. Are there any error LEDs on the mobo?
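On Linux, one way to watch temps without extra tools is to read the kernel's hwmon sysfs files directly. Below is a minimal Python sketch, assuming the standard /sys/class/hwmon layout (each hwmonN directory has a name file and temp*_input files in millidegrees Celsius). The function names and the 80 °C flag threshold are hypothetical choices for illustration, not limits from AMD. Depending on the kernel version, the Ryzen CPU sensor (k10temp) may not be exposed at all, while amdgpu normally reports the GPU temperature.

```python
import time
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"chip/label": degrees_C} for every temp*_input file under root.

    Assumes the standard Linux hwmon sysfs layout: each hwmonN directory has a
    'name' file, and temp*_input files hold millidegrees Celsius.
    """
    temps = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for input_file in sorted(chip_dir.glob("temp*_input")):
            sensor = input_file.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip_dir / f"{sensor}_label"
            label = label_file.read_text().strip() if label_file.exists() else sensor
            try:
                millideg = int(input_file.read_text().strip())
            except (OSError, ValueError):
                continue  # skip sensors that error out or report garbage
            temps[f"{chip}/{label}"] = millideg / 1000.0
    return temps


def watch(root="/sys/class/hwmon", interval_s=2.0, threshold_c=80.0, samples=None):
    """Print every reading, its running peak, and a flag at/above threshold_c.

    threshold_c is an arbitrary example value, not a spec limit.
    Runs forever when samples is None; returns the peak readings seen.
    """
    peak = {}
    taken = 0
    while samples is None or taken < samples:
        for key, value in sorted(read_hwmon_temps(root).items()):
            peak[key] = max(value, peak.get(key, value))
            flag = "  <-- HOT" if value >= threshold_c else ""
            print(f"{key:30s} {value:5.1f} C (peak {peak[key]:5.1f}){flag}")
        print("-" * 50)
        taken += 1
        time.sleep(interval_s)
    return peak
```

Calling watch() in a terminal while gaming gives a rolling view; a sensor whose peak keeps climbing right before a freeze is the one worth investigating.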

Then your problem is 100% software based.

Yeah, you should.

Which Linux GPU driver are you using?

AMDGPU, not AMDGPU-PRO.

I've asked him to install and run his games on Windows. Let's see how that goes.

Try Manjaro; so far I've had one freeze and one reboot in over two months of constant use.
You could also try to raise voltage or turn off power management stuff in UEFI.

My Ryzen 1800X did that for a while, running Slackware64-current. I found it would hard-lock within a few minutes of just opening Firefox, with no pages loaded.

I've upgraded to kernel 4.11.3, updated the UEFI, updated various other Linux packages, and perhaps crucially I copied /usr/share/X11/xorg.conf.d/10-amdgpu.conf to /etc/X11/xorg.conf.d and removed the "10-radeon.conf" file from /etc/X11/xorg.conf.d.
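For anyone wanting to repeat that config change, here is a minimal Python sketch of the same file shuffle, assuming the paths given in the post (other distributions may place these files elsewhere). The function name prefer_amdgpu_xorg_config is hypothetical. One deliberate difference: instead of deleting 10-radeon.conf outright it renames it with a .bak suffix, which Xorg ignores because it only reads files ending in .conf.

```python
import shutil
from pathlib import Path

# Paths as given in the post; other distributions may use different locations.
SHIPPED_DIR = Path("/usr/share/X11/xorg.conf.d")
LOCAL_DIR = Path("/etc/X11/xorg.conf.d")


def prefer_amdgpu_xorg_config(shipped_dir=SHIPPED_DIR, local_dir=LOCAL_DIR):
    """Copy 10-amdgpu.conf into local_dir and disable any 10-radeon.conf there.

    Needs root when run against the real /etc path. Returns the copied file's path.
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    copied = Path(shutil.copy2(Path(shipped_dir) / "10-amdgpu.conf", local_dir))
    radeon = local_dir / "10-radeon.conf"
    if radeon.exists():
        # Rename instead of deleting: Xorg only reads *.conf files, so the .bak
        # copy is ignored but the change stays easy to undo.
        radeon.rename(radeon.with_name(radeon.name + ".bak"))
    return copied
```

Call prefer_amdgpu_xorg_config() as root, then restart X (or reboot) for the change to take effect.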

I've not had a hard lockup since. Fingers crossed!

I'll give that configuration stuff a try. Probably mess with the UEFI when my brother's out. Single user mode to the rescue!

Have you checked the logs?

Which logs?

If you're on Fedora 26, open Utilities > Logs.
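From a terminal, the same information can be pulled with journalctl. A minimal Python sketch, assuming the journal is kept across reboots: it asks for error-priority messages from the previous boot (the one that froze) and keeps lines matching a few keywords. The keyword list and function names are hypothetical starting points, and a true hard lock often leaves nothing in the log at all.

```python
import subprocess

# Hypothetical starting keywords; adjust to whatever actually shows up in your logs.
SUSPECT_WORDS = ("amdgpu", "mce:", "hardware error", "watchdog", "soft lockup", "hard lockup")


def journal_cmd(boot_offset=-1, priority="err"):
    """Build a journalctl command for one boot (-1 = the boot before the current one)."""
    return ["journalctl", "--no-pager", f"--boot={boot_offset}", "-p", priority]


def suspicious_lines(log_text, words=SUSPECT_WORDS):
    """Return the log lines that contain any of the given words (case-insensitive)."""
    lowered = [w.lower() for w in words]
    return [line for line in log_text.splitlines()
            if any(w in line.lower() for w in lowered)]


def previous_boot_report(run=subprocess.run):
    """Run journalctl for the previous boot and keep only the suspicious lines.

    `run` is injectable so the filtering can be tried without a real journal.
    """
    result = run(journal_cmd(), capture_output=True, text=True, check=False)
    return suspicious_lines(result.stdout)
```

Printing previous_boot_report() right after a forced reboot shows whether the kernel logged anything GPU- or machine-check-related before the freeze.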

Easy peasy. I'll get onto that.

While you're at it, run dnf update just in case.

Have you updated to the latest UEFI available from your motherboard's manufacturer? Those have brought a great deal of stability and performance improvements.

Firefox changed a bunch of stuff recently. Not surprising.

Did that pretty soon after getting it up and running.

OK, I think I may have found the issue. It may be overheating. I'm using the stock cooler with the stock thermal paste with an R9-380 in an old OEM case with only one small fan and tiny vents at the front.

Picture:

The most recent crash rebooted it into the UEFI, which showed the CPU at 61 degrees Celsius and barely falling (it went down to 59 after 5 minutes in the UEFI). He attempted to reboot into the OS, but it then crashed within minutes of use.

Crashes usually happened as follows:
1. Screen freeze
2. Screen turns black with last second of sound played repeatedly for a few seconds
3. Sound stops and screen turns blue and prompts user for input (Lost signal)
4. PC runs for a few more seconds and reboots to UEFI
5. Subsequent reboots go straight into the UEFI; booting into the OS has to be selected manually.

I told my brother to use the computer with the side panel off (as pictured) for the time being, to see whether the case restricting airflow is the problem. Also, we're in Australia and it's winter, so ambient temps aren't too bad (~17 Celsius currently).

If anyone has further insight, please do tell.

It shouldn't shut off due to overheating until it reaches ~90 °C, though. If there is a temp issue, it's more likely some other component overheating (the VRMs, say, though I don't see how that could be, given that they're only driving a 1500X).

What's your vSoC at (sometimes listed as CPU NB/SoC voltage in the BIOS)? If it's at or below 0.945 V, try raising it to 1.05 or 1.1 V (safe for continuous use). (Note that if you can't set it via the BIOS, Ryzen Master lets you do so from its interface, followed by a reboot.)

So the issue here is that you never did the stability or "burn-in" testing after building the PC. This is part of every PC build, and you should really do it before actually attempting to use the system.

Do it now. Download/Install ALL of the following:

Temps:

And also HWiNFO.

I also recommend installing a game that has a loopable benchmark, like Metro: Last Light (MLL). MLL is particularly good, as is Total War: Warhammer, although that one does not have a loopable benchmark.

Set the SoC voltage to 1.1 V, as foppe pointed out, then install all the Windows drivers for your mobo from the manufacturer's website and set the power profile to High Performance.

If you can't be bothered to set the SoC voltage to 1.1, don't even bother with the rest of these steps.

  1. Try to test max CPU temps first using Prime95 for <1 hr. Monitor using Ryzen Master/Core Temp.
  2. Then test max GPU temps using FurMark <1 hr. Monitor using Afterburner.
  3. Then test load wattage using Prime95 + FurMark <1 hr.
  4. Then test memory stability using MemTest86 for at least 12 hours, booted from optical media or a flash drive. Leave it on overnight.
  5. Then test peak CPU/GPU clocks using games on permanent loop >1 hr, <4 hr.
  6. Then test long-term VRM stability. Start Prime95 + FurMark. Make sure the case is closed and in a corner with low ventilation, turn the air conditioner off (if there's one in the room), and leave it overnight.

If it survives that, then the system is either ready to use or ready to begin overclocking testing.