Hardlock with Linux, fine with Windows (Gigabyte AX370)

As stated in the topic, I'm having a bad time trying to switch to Linux (Manjaro).
My rig is overclocked and rock solid under Windows 10.
Under Linux it locks up: sometimes at logon, sometimes after 12 hours of work.
I've tried the kernel parameters idle=nomwait and processor.max_cstate=1.
I turned off DRAM sleep mode and played with the PSU settings in the BIOS (ErP and something else in the same menu).
Hardware: AX370 + Ryzen 1600 + RAM 2x8 GB @3400 + RX580 + SSD (Windows) + NVMe (Linux)
The hardlock is quite hard: I have to turn the PSU off for about 10-15 seconds, until I hear some buzzing in the speakers, and then it's safe to turn it on and boot.
I would assume a hardware defect if not for the fact that I have zero problems with Windows.
Any thoughts?
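
For anyone repeating this test: it is worth confirming that the kernel parameters listed above actually reached the running kernel, since a bootloader config that was edited but not regenerated silently drops them. Below is a minimal sketch that reads /proc/cmdline and reports which of the two parameters from this post are missing. The helper names are made up for illustration, and the simple whitespace split assumes no quoted values containing spaces.

```python
from pathlib import Path

# Parameters mentioned in this thread; adjust to whatever you are testing.
WANTED = {"idle": "nomwait", "processor.max_cstate": "1"}


def parse_cmdline(text: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in text.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def missing_params(text: str, wanted: dict = WANTED) -> dict:
    """Return the wanted parameters that are absent or set to another value."""
    active = parse_cmdline(text)
    return {k: v for k, v in wanted.items() if active.get(k) != v}


if __name__ == "__main__":
    path = Path("/proc/cmdline")  # only present on Linux
    if path.exists():
        missing = missing_params(path.read_text())
        print(missing or "all requested parameters are active")
```

An empty result means both parameters are active on the current boot.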

That doesn't mean much. Linux does many things differently. If you remember the Ryzen 1700X, some versions of it crashed hard when running GCC compiles under Linux, because Linux context switches faster, hits the TLB harder, and a few other things. The result was that under Linux the CPU voltage fell too low and it locked up.

That didn’t make Windows a better OS. It was just a slower OS.

My current Ryzen 3900X with memory set to 3600 MHz appeared reliable and solid to memtest and other tools, until I ran Android Studio NDK builds on all 24 threads with clang++. It was definitely memory corruption: the errors showed symbol names that were obviously bit-corrupted. That was on Ubuntu 18.04.

Strange random IOMMU errors with my Vega 56 also went away when I stopped overclocking the RAM.

Anyway, in summary: stable under Windows does not mean the system is actually stable.

Correct!
The noise from the speakers, however, sounds like the capacitors in the PSU bleeding off.
It's usually noticeable with PulseAudio, which Linux uses.
It does not mean hardware damage, though.
I don't bother overclocking, as I have no need to.

The problem with overclocking while lowering the voltage is that it's possible to drop below the efficiency threshold on voltage, and the CPU will run hotter because it is underpowered under load.
Therefore, to prevent overheating, you need to increase the cooling fan speed or seriously upgrade the cooling system.
If you keep fan speeds low because of the noise, you are defeating the purpose of the cooling fans.

Consider a three-phase motor on a variable-frequency drive!
You can control the speed at which it runs, but the trade-off is that when the motor is not running at its rated speed, its cooling fan loses efficiency and does not cool the motor properly.
Excessive heat breaks down the lubricants and the bearings fail early. (If the motor does not burn out a winding first!)

But overclocking is your choice, so it's your equipment and money that you risk!

I've been wrestling with a similar Windows-stable/Linux-unstable issue for months now, though I don't need to bleed off the caps (the hardware reset button works fine for me). After cranking system agent, VCCIO, and Vcore to 1.3 V across the board without results, I had a breakthrough tonight. With everything fixed at stock settings (3.6 GHz, no turbo, 1.2 V Vcore, etc.), I found that I couldn't boot with XMP enabled. I've only tested for a little while, but I have the feeling I finally cracked my issue.

I don’t think we have the same problem, but the point is that Linux stretches different muscles than Windows. Unless you get the same crashes/hardlocks at stock settings, it’s probably an instability stemming from your overclock. The tricky thing is that it might not be an obvious part of your overclock.

I would save your overclock settings, then clear CMOS and start over, making sure to test in Linux at each step. It’s a pain in the ass but kind of the only way.
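
For the "test in Linux at each step" part, a dedicated stress tester or a big parallel compile is the real test, but a quick repeatability check can be sketched in a few lines. The idea is to run the same deterministic computation on every thread and compare the results: on healthy hardware they must all match, and a mismatch hints at silent memory or CPU corruption. This is only an illustration under my own assumptions (function names, buffer size, and round count are arbitrary), and a hard lock will of course show up as a freeze, not as a mismatch.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_rounds(buf: bytes, rounds: int) -> str:
    """Chain SHA-256 over the same buffer; the result is fully deterministic."""
    digest = b""
    for _ in range(rounds):
        h = hashlib.sha256(digest)
        h.update(buf)  # hashlib releases the GIL for large buffers
        digest = h.digest()
    return digest.hex()


def count_mismatches(workers: int, rounds: int = 20, size: int = 1 << 20) -> int:
    """Run the same job on `workers` threads and count results that disagree."""
    buf = bytes(range(256)) * (size // 256)
    expected = hash_rounds(buf, rounds)  # reference run on a single thread
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: hash_rounds(buf, rounds), range(workers)))
    return sum(r != expected for r in results)


if __name__ == "__main__":
    bad = count_mismatches(workers=os.cpu_count() or 1, rounds=100)
    print("mismatching results:", bad)
```

Zero mismatches over a short run proves little; it only becomes meaningful if you loop it for hours at each overclock step, the same way you would with any other stability test.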

Thank you for the replies.
I'll check now how it behaves at stock, but if I recall correctly, it failed at stock too.
Here's some more info:

  • cooling: 3 intakes, 2 outlets, simple Noctua tower with a 120 mm fan on the CPU
  • newest BIOS: F50a, but I had the same problems with F31
    Regarding the BIOS, there are some warnings on the download page:
    “Update AMD Chipset Driver 18.50.16.01 or later version before update this BIOS.”
    “Before update BIOS to F40, you have to install EC FW Update Tool (B19.0517.1 or later version) to avoid 4DIMM DDR compatibility on 3rd Gen AMD Ryzen™ CPU.”

I wonder if that could be the reason (I did do it, from Windows of course).
Do any of you use the same mainboard with Linux?
If so, which distro / kernel?

Maybe a bit obvious, but have you already tried a different distribution and/or DE?
Just to determine whether this particular issue is distro or DE specific.
I mean, you mention that you don't encounter any issues in Windows.
So maybe it's just a distro-specific issue with Manjaro in this case.

Testing a different distro would be the first thing I would try,
because there are many of them.
It could be an underlying problem with Manjaro or even the DE,
or a power management setting or whatnot.
However, in that case it would be likely that more people would encounter similar issues.
Still, it might be worth trying, imo.
I mean, it wouldn't hurt to try a different distro for testing purposes.

The other difference with your Linux install is that it's on NVMe, so try cloning Windows onto that drive; if you get a crash there too, then cool, you've narrowed it down.

Recently I discovered a full system crash related to my using the Falkon browser (really, lol), and since Manjaro is cutting edge, try a more conservative choice like Ubuntu LTS for a while.

Also, interesting board: “4 Front USB 3.0 Ports with Adjustable Voltage”. How does that work?

@MisteryAngel - I've tried both versions of openSUSE with no success, but that was while overclocked. Now, with stock settings and XMP timings (3200 MHz), the system has been stable for 12 hours with a constant 66% load. Nice, thank you.

Those power settings you talk about, where do I find them in Linux?
Readings from the ‘sensors’ command were not very helpful, even after configuration. I also tried running HWiNFO64 under Wine, and it works! Somehow, hmm, it shows a constant 1.55 V on the CPU without fluctuation, so I think its readings are about as reliable as those from ‘sensors’.
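
For reference, ‘sensors’ gets its numbers from the kernel's hwmon tree under /sys/class/hwmon, so the raw values can be dumped directly to see what the loaded drivers really expose. The sketch below is a rough illustration only: the function name is made up, it looks only at temp*_input and in*_input files (reported in millidegrees Celsius and millivolts respectively), and which chips show up depends entirely on which kernel drivers are loaded for the board.

```python
from pathlib import Path


def read_hwmon(root: str = "/sys/class/hwmon") -> dict:
    """Return {"hwmonN:chipname": {file_name: scaled_value}} for temp/in inputs."""
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        name = name_file.read_text().strip() if name_file.exists() else "unknown"
        values = {}
        for f in sorted(chip.glob("*_input")):
            if not f.name.startswith(("temp", "in")):
                continue  # skip fans, power, current, etc.
            try:
                raw = int(f.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors error out when read
            values[f.name] = raw / 1000.0  # millidegrees C or millivolts
        readings[f"{chip.name}:{name}"] = values
    return readings


if __name__ == "__main__":
    if Path("/sys/class/hwmon").exists():
        for chip, values in read_hwmon().items():
            print(chip, values)
```

If a voltage never moves here either, the sensor is probably not wired to what its label suggests.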

@Buckshee - I'll try cloning if stock settings do not solve the problem, thanks for the hint.
As for USB overvolting, one can add up to 0.3 V to any of the three USB groups. It's doable from the BIOS or from Windows with Gigabyte's apps. I used it once and it worked, but I can't say whether it was faster than at normal voltage (I was in a hurry).

12:22:53 up  4:32,  1 user,  load average: 8,22, 8,14, 7,46
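
A note on reading that line: the first field is the current clock time, the value after "up" is the actual uptime (4 hours 32 minutes here), and the three load averages use a comma as the decimal separator because of the locale. The small parser below is a sketch that assumes the classic procps layout shown above; the regex and function name are illustrative, and other uptime versions or longer uptimes may format things differently.

```python
import re

# Matches the layout shown above; accepts "." or "," as decimal separator.
UPTIME_RE = re.compile(
    r"^\s*(?P<now>\d{1,2}:\d{2}:\d{2})\s+up\s+(?P<up>.+?),\s+\d+\s+users?,\s+"
    r"load average:\s+(?P<l1>\d+[.,]\d+),\s+(?P<l5>\d+[.,]\d+),\s+(?P<l15>\d+[.,]\d+)\s*$"
)


def parse_uptime(line: str) -> dict:
    """Split an uptime line into clock time, uptime text, and load averages."""
    m = UPTIME_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised uptime line: {line!r}")
    loads = tuple(float(m[k].replace(",", ".")) for k in ("l1", "l5", "l15"))
    return {"now": m["now"], "up": m["up"], "load": loads}


if __name__ == "__main__":
    info = parse_uptime("12:22:53 up  4:32,  1 user,  load average: 8,22, 8,14, 7,46")
    threads = 12  # Ryzen 5 1600: 6 cores / 12 threads
    print("clock time:", info["now"], "| uptime:", info["up"])
    print("1-minute load per thread:", round(info["load"][0] / threads, 2))
```

Load average counts runnable tasks, so dividing it by the thread count gives only a rough utilisation figure.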

So far, so good. I can live with 10% less computing power, but this experience makes the X series of Ryzen more compelling to me. Too bad, I liked the $40 saving with the non-X versions.

That's a cool feature. I've made some chargers that run at 5.25 V; maybe it's more useful for things like SBCs, but I do notice battery packs charge quicker too.

Going back to stock seems to solve the problem, thank you guys!

2 things learned:

  • OC hurts penguins
  • how not to read output of ‘uptime’ :wink:
