[SOLVED] Need HW debugging help

[UPDATE]:
A crappy electrical installation in my apartment in conjunction with a defective surge protector caused the freezes.

[ORIGINAL]:
I built a new system that unfortunately freezes randomly under low load; under heavy load, however, it is rock stable. You can find my exact hardware specs at the end.

I am pretty sure it is hardware related because the same freezes happen with Win10 and Linux. Right now, I am trying to reproduce the freezes by booting into the BIOS only.

The freezes seem to happen randomly. I had 2 freezes within 30 min while working in a Fedora server shell (no GUI), but I have also seen 3 days of uptime on the Win10 desktop (not touching the system at all during this time).
Under load I have never had a freeze so far. I ran the AIDA64 stress test (CPU + GPU) for 1 week without issue, and I also ran the Prime95 stress test (CPU only) for a week without a freeze. I have yet to test GPU-only load (Heaven Benchmark).

So far, I have not been able to complete any multi-day RAM stress test, probably because the system is under low load during this test, which triggers the freeze. However, I have tested each of my RAM sticks individually and had the freezes with every single one of them. Hence, unless both of my RAM sticks happen to be defective, it is unlikely to be a RAM issue (imho).

  • I have disconnected all my HDDs, SSDs and NVMe drives and booted from a USB stick. Still got freezes.
  • I have tried the 3 newest BIOS versions and got the freezes with all of them.
  • Using a surge protector or plugging the system directly into the power outlet doesn’t make a difference.
  • Unplugging/plugging all my cables did not solve the problem either.
  • I disabled all the power-saving options I could find in BIOS. No change, still got freezes.
  • My system is neither overclocked nor undervolted, all voltages and clocks are on default settings.
  • It is not a cooling issue either. After hours of full load none of my temperatures are above 60°C (full custom hardline water loop, 5 chassis fans)

The last thing I can try, from my point of view, is replacing the custom power cables with the ones that came with my power supply. After that I do not know what to do, since I do not have any suitable replacement parts for my power supply, mainboard, CPU or GPU.

Any ideas? Any help is highly appreciated!

Specs:

  • Ryzen 7 1800X
  • Asus Crosshair 6 Hero
  • G.Skill Trident Z 3200MHz 32GB DDR4 (2x 16GB sticks, Samsung B-die)
  • Sapphire RX Vega 64
  • 1x Samsung 960 PRO 1TB NVMe SSD
  • 2x Samsung EVO 850 500GB SATA SSD
  • 1x Kingston 240GB SSD
  • 4x 8TB WD Red
  • Seasonic PRIME Titanium 750W

Have you tried setting the voltage manually and turning off AMD Cool’n’Quiet?

Thank you very much for your reply.

I cannot remember having an option named Cool & Quiet in my BIOS, so I assume it must be one of the other power-saving options; in that case, yes, I already disabled it, with no effect.
I have not touched the voltages so far. To be honest, I am not too comfortable with setting voltages manually, especially since I do not know what values are considered safe. I am not a big overclocker (if I do overclock, I mostly use the mainboard’s auto-overclock features); the main reason I water-cool is to have a quiet system. Also, this system is intended to run 24/7 for at least 5 years (my old rig has been running 24/7 for 8.5 years now), so I do not want to shorten the lifetime of my components with high voltages. Anyway, which voltages do you recommend increasing, and by how much? It is surely worth a try.

Check your kernel log; it may have some good info. Fedora uses journalctl. I’m not familiar with it, but a quick search brings up journalctl -k
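
Since a frozen box has to be rebooted, the interesting messages are usually in the previous boot, not the current one. Here is a rough sketch of how that lookup could be scripted. It assumes journald keeps a persistent journal (otherwise the previous boot is not available), and the function names are made up for this example.

```python
import subprocess


def kernel_log_command(boot_offset=-1, min_priority="warning"):
    """Build a journalctl command for the kernel messages of one boot.

    boot_offset: 0 is the current boot, -1 the boot before it, and so on.
    min_priority: syslog level name; this level and anything more severe is shown.
    """
    return [
        "journalctl",
        "--dmesg",                     # kernel messages only, same as -k
        f"--boot={boot_offset}",       # which boot to read
        f"--priority={min_priority}",  # filter out routine chatter
        "--no-pager",
    ]


def read_kernel_log(boot_offset=-1, min_priority="warning"):
    """Run journalctl and return its output as text (needs a systemd system)."""
    cmd = kernel_log_command(boot_offset, min_priority)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling read_kernel_log() after the reboot should show the last warnings and errors the kernel managed to write before the freeze, if it wrote any.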

Thanks for your suggestion.

I already checked the logs in Linux and Win10. Unfortunately, there are no log entries related to the freezes. I even had the system freeze in the middle of writing a syslog entry (one not related to the freeze). Win10 sometimes logs a Kernel Power error but can never log the bug check code (it’s always 0), which suggests the freezes happen so fast that the OSes cannot write logs. This is why I am quite convinced of a hardware issue.
The Win10 Kernel Power logs are the reason I disabled all the power-saving options and re-checked my cables.
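
When the OS cannot log the freeze itself, one workaround is a heartbeat file: write a timestamp every second, and after the reboot the last timestamp shows roughly when the system stopped. The following is only a sketch of that idea; the file path, function names and thresholds are assumptions for illustration, not something I actually ran on this system.

```python
import os
import time


def write_heartbeat(path, interval_s=1.0, beats=None):
    """Append a timestamp every interval_s seconds and ask the OS to flush it.

    beats=None loops until interrupted; a number stops after that many beats.
    """
    count = 0
    with open(path, "a", encoding="ascii") as f:
        while beats is None or count < beats:
            f.write(f"{time.time():.3f}\n")
            f.flush()
            os.fsync(f.fileno())  # request a write to disk, not just to the cache
            count += 1
            time.sleep(interval_s)


def load_timestamps(path):
    """Read the heartbeat file back, skipping a half-written or garbled line."""
    stamps = []
    with open(path, encoding="ascii", errors="replace") as f:
        for line in f:
            try:
                stamps.append(float(line))
            except ValueError:
                continue
    return stamps


def find_gaps(timestamps, interval_s=1.0, tolerance_s=2.0):
    """Return (last_beat, next_beat) pairs where the beats paused for too long."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > interval_s + tolerance_s:
            gaps.append((prev, cur))
    return gaps
```

Run write_heartbeat("heartbeat.log") in the background, start it again after each reboot, and later call find_gaps(load_timestamps("heartbeat.log")). The first value of each pair is the last beat before the system stopped, which can then be compared with when the hairdryer or any other appliance was switched.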

Well, the max is 1.35 V for safety according to AMD, so you can try that. Also try setting the RAM speed and voltage manually.

Yesterday I had a freeze in the BIOS, so it is definitely a hardware and not a software issue.

I guess maybe RMA the motherboard.

Remove the CMOS battery for 30 seconds and put it back in; make sure the PSU is powered off/unplugged while you do it. If the issues persist, RMA the motherboard.

I had a very similar, almost identical situation to the one you described, with a GA-P43-ES3G back in the C2D days: totally random freezes like that… I changed the board after I had excluded everything else, and indeed it was a faulty mobo…

“Random” errors scream RAM; I’d try an Ubuntu stick and run memtest86.
If anything, just to rest assured.
Just let it run for 24h.
If it passes the ~24h mark with no errors, you should be golden.
Do you have any overclocks going on?
CPU, GPU, RAM, etc.?
Otherwise the only thing I can come up with is power, and a 750W PSU
should be more than enough for your system, and since the errors
don’t happen at high load, I doubt that’s your root cause.
Your system probably isn’t even drawing 450-500W at max load, and even with
PSU drooping, your PSU would handle it like a champ.
If it turns out not to be RAM, it may be something as ridiculous as the power-saving functionality.
Simply try telling your OS not to use power-saving mode.

Have you tried enabling full boost + power at all times in the BIOS?
If so, does it continue to freeze under low load?
Aside from RAM (where you said you’ve tested all sticks individually), I can only think of two things: either the motherboard has issues with the RAM slots, or the motherboard has stability issues in lower power states, which is why I’m wondering if you’ve enabled full boost + voltage and tested.

Thank you all so much for your help and suggestions!

I think I’m a lot closer to figuring out the problem. At least I can reproduce the freezes now, and the cause is either my power supply or a low-quality electrical installation in my new apartment in combination with a defective surge protector. Maybe the surge protector died because of too many surges…

I can reproduce the freezes by power-cycling a hairdryer or water-boiler connected to the same power-socket as my system. After a couple of power-cycles my system freezes.
I should get a new surge-protector tomorrow or the day after. Hopefully the freezes will then stop.

My power supply might still have a problem though. If power surges are the root cause for my freezes I would expect the power supply to shut down completely and not only partially.

You may want to get a UPS if you can. Then, at least if the apartment has some dips in voltage, the UPS will keep everything at a stable voltage.

Yes, this is definitely something I am looking into right now. I think I might get myself an Eaton Ellipse Pro 1200.
But first I want to verify my current suspicion. I also want to know that my power supply is still OK and not damaged in any way. I do not want to throw money at a UPS if it’s not necessary or does not solve my stability issues.

So far I have not had another freeze since installing the new surge protector. I also cannot trigger the freeze anymore by power-cycling the water boiler. I have updated the title and initial post. I consider this problem SOLVED.

Thank you all very much for your support!!!