A possible problem with my system?

Hi, I’m a lurker on this forum and I decided to register because I’m not sure whether my system has a hardware problem or not.
I apologize in advance if my English is not clear enough, since it’s not my native language.
My system:
CPU: AMD Ryzen 5 1600 AF, not overclocked.
Motherboard: Gigabyte Aorus B450 Elite, it came with the latest BIOS available at this moment (F51).
Memory: 32 GB (2x16 GB) Corsair Vengeance LPX Black DDR4 3000 MHz CL16 dual-channel kit (CMK32GX4M2D3000C16).
Graphics: XFX Radeon RX 470 8 GB 256-bit.
OS: Debian Linux 10.4 stable AMD64 with the open-source amdgpu driver for the GPU and Windows 10 on a separate HDD, installed for gaming purposes.

The problem: after building the system and installing the OS, everything seemed to work fine until today, when the system suddenly froze and then restarted on its own. Instead of booting, it stayed inactive: nothing was displayed on the monitor, and on the motherboard the red CPU and RAM LEDs were blinking back and forth (i.e. the CPU LED blinked, then the RAM LED, then the CPU LED again, and so on).

I forced a shutdown by pressing the power button then restarted the system and it booted fine to the OS.

It’s the first time this has happened to me, and frankly I’m a bit worried that my system could develop more problems, so I wanted to ask more knowledgeable people about it.

Should I be worried or am I over-reacting?

Hello! and welcome to lvl1techs :slight_smile:

It seems your computer tilted - It can happen to all of us :smiley:

I would check CPU and GPU temperatures under stress to make sure it’s not an overheating problem (HWMonitor on Windows). On Linux, lm-sensors should be OK for CPU temperatures. For GPU temperatures, nvidia-smi works for NVIDIA cards; for AMD cards, radeontop shows GPU load, and with the amdgpu driver lm-sensors should report the GPU temperature as well.
Remember you can run these in watch to update every n seconds, for example “watch -n 2 sensors”.
If temperatures seem good but the crashing continues, you need to check that the PSU voltages are OK under load. (I’d guess even a ±5% change might be too much.)
Also check the memory (physically and with a memory test).
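
A rough sketch of the temperature check above, in case it helps. It assumes lm-sensors is installed and that `sensors` prints readings in the usual `+45.0°C` form; the helper name `max_temp` is made up for this example.

```shell
# Hypothetical helper: read `sensors` output on stdin and print the highest
# current reading in degrees Celsius (the first +NN.N°C value on each line,
# which is the live reading rather than the high/crit limit).
max_temp() {
    awk 'match($0, /\+[0-9]+(\.[0-9]+)?°C/) {
             t = substr($0, RSTART + 1, RLENGTH - 1) + 0   # skip the leading "+"
             if (t > max) max = t
         }
         END { print max + 0 }'
}

# Live view, refreshed every 2 seconds (the -n option goes before the command):
#   watch -n 2 sensors
# One-off peak reading, e.g. while a stress test runs in another terminal:
#   sensors | max_temp
```

Watching the numbers while the machine is under load is the point; an idle reading says little about overheating.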

Should I be worried or am I over-reacting?

Yes and no :smiley: If it works, it works; if something is broken, troubleshoot and RMA. (I’ve only had a couple of problems with my hardware over ~5 years, and I’d wager that adds up to under a week of downtime.)

Doesn’t seem worth a panic.

The LED lights, from what I could dig up in the manual, aren’t super helpful. I’d test the PSU first and the RAM second if this persists, but the advice above by @Pollomoolokki is solid too.

Best of luck!

I would run a Prime95 stress test for 24 hours.
If this test passes, then it is most probably a software bug, nothing to worry about, just keep updating your system.
If it doesn’t pass, then it’s a hardware problem that needs more investigation; it could be anything.

Thanks to all.

It happened again: after booting into Linux, when I tried to start Firefox, the entire GUI (Xfce 4.12) just froze. This time there was a difference: the cursor could still be moved but clicking had no effect, and the system did not restart automatically, so I rebooted it manually.

What is interesting is that this issue apparently occurs only in Linux. I could not reproduce it in Windows 10, but the caveat is that I rarely use Windows (it is just for gaming), so I’m not sure if that is relevant.

I have read that Linux has some problems on Ryzen, is that true?

I will do some CPU/memory tests. Regarding the PSU, it is a Corsair HX620W; it’s an older model but still capable, and I have not had any issues with it previously.
But it’s possible that it’s showing its age. Unfortunately I don’t have another PSU to test with.

I have read that Linux has some problems on Ryzen, is that true?

Yes, it has been said, but from what you described this seems more like a GPU problem. Have you made sure to fully upgrade your system and tried another driver for the GPU? (On NVIDIA we have the binary blob plus the open-source version, nouveau.) I’m not familiar with AMD, unfortunately.

Also, Debian stable is said to have somewhat older packages than other popular distros (Ubuntu etc.), e.g. Xfce is at 4.12.5 in Debian and 4.14.1 in Ubuntu.
For updating and upgrading I’d recommend aliasing “sudo apt update && sudo apt -y full-upgrade” to something like “update” or “upgrade”.
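
For what it’s worth, that alias could look like the sketch below. It assumes bash with sudo set up, and “update” is just the name suggested above; pick whatever you like.

```shell
# Goes in ~/.bashrc (assumed shell: bash). After editing, open a new terminal
# or run `source ~/.bashrc` so the alias is picked up.
alias update='sudo apt update && sudo apt -y full-upgrade'
```

Because of the -y, the upgrade is applied without asking for confirmation, so drop that flag if you prefer to review the package list first.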

Linux sometimes feels like a bit of a journey… but that’s the way I like it! :slight_smile:

Hmm, I think the advice above is good, but I would also want to try to replicate the issues while running another distro (maybe Ubuntu 20?) live or on a spare HD. Debian stable is, of course, generally rock solid, but you are running an older kernel with a fairly new processor - I mean, I realize it’s a rehash of 1st gen on a 2nd gen platform, but maybe there are other alterations or quirks.

Well, I did run a quick test on Debian by using the stress-ng utility and the system froze immediately upon running it. It does not seem to be a GPU problem but a system problem.

Again I was forced to reboot manually and this time I decided to boot into Windows, install Prime95 - I chose Blend test option - and ran it for 2 hours without any issues (0 errors, 0 warnings), CPU temp was under 70 Celsius.

So the issue definitely seems to be a Linux problem. What is weird is that Debian ran without any issues on this machine when it had only 8 GB of RAM; the 32 GB of RAM is an upgrade I added recently, basically yesterday.
In short: Linux ran fine with the old 8 GB of RAM, but after I replaced it with the new 32 GB, the issues started, and it seems that only Linux is affected.

How can a memory upgrade cause stability problems in Linux but not in Windows?

@astra: I do have a spare HDD, I might install Xubuntu 20.04 on it and see if this problem replicates in Ubuntu too. If not then it’s indeed a Debian problem.

Did you run mem tests? Could be one of the new sticks is problematic, and linux is more sensitive for whatever reason. Though I would still try the new kernel first, being as that’s much faster.

OK, if you changed memory and started having issues, start there. My recommendation is to set 2666 or even the default JEDEC speed for that kit and see if the problem persists. You went from a single sided dual rank config to a double sided dual rank. This puts more stress on the memory controller. The board appears to get stuck in a memory training loop, which means it’s on the verge of instability. Once you confirm it’s stable at JEDEC or 2666, we can try to tweak back to 3000 (it might need a slight voltage bump to the memory or IMC). This is where the Ryzen memory calculator comes in real handy. -my two cents
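
If it helps, the speed and rank the DIMMs are actually running at can also be checked from Linux rather than from the BIOS. This is only a sketch: it assumes dmidecode is installed, the helper name `mem_speed_summary` is made up, and the field labels differ a little between dmidecode versions (which is why both “Configured Memory Speed” and “Configured Clock Speed” are matched).

```shell
# Hypothetical helper: keep only the per-DIMM lines of interest from
# `dmidecode --type memory` output supplied on stdin.
mem_speed_summary() {
    grep -E '^[[:space:]]*(Locator|Size|Rank|Speed|Configured (Memory|Clock) Speed):'
}

# Usage (dmidecode needs root to read the SMBIOS tables):
#   sudo dmidecode --type memory | mem_speed_summary
```

Comparing the configured speed before and after a BIOS change is a quick way to confirm the setting really took effect.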

This makes sense to me. But what would explain the stability in windows? Sure you could encounter problems long after 2 hrs of prime, but debian is unstable just loading firefox? Not trying to argue one way or the other - genuinely curious - I have a lot to learn about these things.

Edit: that was supposed to be @CrazyLegsFE

It is very common for machines to be “stable” enough for Windows / gaming and then be completely unstable for production workloads like Premiere and Blender, etc. Or in your case, the Linux distros.
Because there are so many variables, it’s hard to speculate about the exact root cause, but different sequences of memory paging and OS interaction can mask some memory issues. That’s why lengthy testing is required when truly testing a machine for a critical deployment. Unfortunately 2 hrs is very short for Prime95 or memtest; I would recommend a minimum of 12 hrs.

The last random blue screen issue I experienced was tied to a memory issue that wasn’t detected for around 30 hrs using memtest86. Unfortunately there is no fast, reliable way to pin these down in most cases. All that being said, we still have some known variables and can probably get the kit to run at full speed. Let me know if Linux becomes stable at lower speeds.

OK, I did a memtest, booted an Ubuntu live DVD and ran memtest86+ 5.01 (I know that I should get the newer Memtest86 but I don’t have a spare usb stick for it) overnight.
It was slow, took over 12 hours for 3 passes but there were no errors at all. I know that it’s inconclusive since more passes are needed to assess RAM stability but I really cannot afford to wait for more than 12 hours.

Regarding RAM speed, the BIOS shows it at 2133 MHz and XMP is disabled, so I guess it’s already running below its rated speed. I found a setting called “Power Down Enable” for the DRAM controller; it is set to “Auto”. Should I set it to “Disabled”?

After all of this, I booted into Debian again and this time it seems to run fine, no problems yet. I don’t know what to do next, any advice?

Interesting, do you still have the 8 GB kit? If so, let’s see if it still works without error. If it does, let’s then test each new stick one at a time using stress-ng. If each stick tests fine on its own, let’s set the memory voltage to 1.35 V and try both sticks.

I must apologize for not following your advice to the letter; that’s because I no longer have the patience for dealing with this issue and I’m tired.
What I did: I left both of the new sticks in place, booted Debian in text mode and ran stress-ng memory tests for 2 hrs.
The result was good, no errors. In the BIOS I changed DRAM Power Down Enable to Disabled; the default was Auto. I did not change the RAM voltages because I’m hesitant to fiddle with them.
For now, the system seems stable enough. If I encounter problems again, I will definitely post in this topic and apply a more aggressive strategy.
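
For anyone wanting to repeat that kind of test, here is a sketch of a stress-ng memory run. The post does not say which options were used, so the flags below are an assumption, and how --vm-bytes is divided between workers can differ between stress-ng versions (check man stress-ng). The function name and the STRESS_NG override are made up for this example.

```shell
# Hypothetical wrapper around a stress-ng memory test.
#   $1 = number of vm workers (default 2)
#   $2 = run time, e.g. 30m or 2h (default 2h)
# Setting STRESS_NG=echo prints the arguments instead of running the test.
run_mem_stress() {
    ${STRESS_NG:-stress-ng} --vm "${1:-2}" --vm-bytes 75% --vm-method all \
        --verify --timeout "${2:-2h}" --metrics-brief
}

# Preview what would be run:
#   STRESS_NG=echo run_mem_stress 4 2h
# Actual run, ideally from a text console as described above:
#   run_mem_stress 4 2h
```

The --verify option makes the vm workers check what they wrote to memory, so corrupted data should be reported as a failure instead of passing silently.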

@CrazyLegsFE - I appreciate the explanation.

@OP - So, you didn’t even re seat the RAM (unless I’m missing that somewhere), nor did you update debian, but the symptoms have disappeared(?); curious. Seems possible RAM could become unstable during a power down/idle, but not sure that matches what happened (or really, when the instability occurred).

I agree with the above advice to set a static voltage and step up if problems persist. I was nervous about that at first too, and about stepping up CPU voltage, but it’s not a big deal if you stay at/below safe levels. I have an original r5 1600 running all clock OC @ 3.975 and RAM rated at 3000 running at 3200 - took a good few days of adjusting frequency and voltage up or down and then testing to get something rock solid with a sort of compromise between the fastest stable speeds and the lowest voltage and heat output, weighted towards performance obviously. It sounds like you might not have the time or desire for that, or might not enjoy the process as much as I did, so all that is to say: it’s not really that scary once you get into it.

Anywho, glad things are working for you currently, but if that changes please update - I’ll be interested to know if the culprit can be determined. Good luck!

@astra: I have not reseated the RAM sticks because I’m pretty sure that I have seated them correctly - I’m quite careful when manipulating and installing PC components.

I do update Debian regularly, but being on the stable branch I don’t get updates very often and they’re mostly security updates. At every boot I run apt update, and if there are updates I run apt upgrade.

Regarding the issues, they appeared basically right after I upgraded the RAM.

I believe (but obviously I’m not certain) that the solution that (at least apparently) solved these issues was to set DRAM Power Down Enable to Disabled in BIOS because I have read that it causes stability and compatibility issues if it’s enabled.

About RAM voltages, I’m worried that if I change them I might again face the issues I was dealing with until now. Perhaps in the future I might play with them, but for now I don’t want to take the risk. That’s all.

Yes, if things change and problems re-appear I will definitely update this topic.

And finally I really appreciate the good advice I received, I’m serious when I say that.

Oh, I just reread my comment and realized that it could have come off as accusatory or something - didn’t mean it that way at all - just surprise that it seems to have resolved with very few steps - wanted to confirm it wasn’t either of those things, for curiosity’s sake.

As far as seating RAM goes, I think it’s possible to encounter an issue even when being ‘quite careful’, but that’s just my experience/opinion.

Also, didn’t intend to suggest it definitely wasn’t the power down setting - again just surprise that it would be, bc it sounded like the instability occurred when your sys should be ramping up, rather than down. But whatever the case, I’m glad it’s working for you now.

The Ryzen memory calculator I mentioned earlier would cover more of the “advanced” type settings, including gear down. My intent was to get it stable as close to stock as we can with as few changes as possible. To be clear, we are not overvolting: that kit is rated for 1.35 volts 24/7, and if you enable XMP/DOCP it will use 1.35 anyway. That’s why I suggested it, to see if it made any difference. I’m excited to know you have it stable for now, and perhaps this tip could help someone else in the future. Cheers! Please post back if you have more findings, as a follow-up “this worked for me” about a “quirk” you found can really help someone else.

An update: the system froze again, and this seems to happen quite randomly. I mean that the system usually works normally, but sometimes this problem just appears.
What is weird is that this seems to happen more often when using Firefox to log in to Gmail or to play YouTube clips.
Perhaps you guys are right and Debian stable is a bit too old for my system, I will try Debian Testing.