Troubleshooting 256GB RAM on TRX40 platform

I was dumb enough to pull the trigger on 256GB of RAM without checking the motherboard QVL.
I was also brave enough to run the unstable branch of a Debian-based distro (Devuan) on the platform I upgraded.

As a result, I can barely boot the system now. During boot I get lots of "rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:" messages.
Sometimes the boot hangs, sometimes it loops, and on very rare occasions I can get to a shell and even run startx, but the system and apps very quickly become unresponsive.

I don’t know how much of the problem is caused by the RAM and how much is just a typical sid problem.
I don’t know if there is any permanent damage to the system, so I want to ask here first.
I can swap the RAM back, but cleaning and repasting a Threadripper is a fucking chore.

So I reinstalled the old RAM, and the boot messages above are still present.

I still can’t tell whether this is a hardware or a software failure. Or both.

I will try to boot from a Live ISO next.

Worst-case scenario, it’s a bricked motherboard. Best case, a corrupted Linux installation.

maybe this helps

https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt


Thanks! From reading this, it looks unlikely that this is a hardware issue.

My plan right now is to get to either the console or a Live ISO and try to update the system.
If this succeeds, I’ll give the new RAM another try. If not, I will reinstall, which I REALLY don’t want to do.

Now that I think about it, I can’t remember if I cleared the CMOS after swapping the RAM the first time. Could this be the source of the problem?

Is the system getting into the BIOS with the other memory? Looks like a benchmark was running and the system shut down unexpectedly.

I don’t know if I understand you correctly, but I can get into the BIOS with both the old and the new RAM. I can change their speed, for example, without a problem.

The problems start after I open the encrypted LVM with the Linux install.
Up to this point there are no problems with either RAM kit: no instability, crashes, etc.

Most installs and live CDs include a memory test. Maybe try that one to make sure the RAM works.
Did you add the correct encryption key?

I can open the LVM, and I didn’t add any keys. I just set up encrypted LVM during installation (two years ago).

Will try memtest.

Did the memtest run?

Still checking the old RAM (128GB). I needed to redownload the Live ISO, so it took a while.
After this I will need to swap in the 256GB kit and check it too.

So far, with the old RAM, everything on the Live ISO seems stable.

Edit:
The first pass of Memtest86 on the old RAM finished with 0 errors. I will run it overnight to be absolutely sure it’s fine, then I will test the new RAM.

The downside of lots of RAM.


After 3 full passes on the old RAM, memtest shows no errors.

I swapped in the 256GB kit and enabled XMP at the advertised 3600 MT/s.
The first quick memtests didn’t show anything either. No errors.
I’m currently running the full test, and I want to do at least 3 full passes, as with the old RAM. It might take over 24h.

If this is a hardware issue, it’s either some obscure RAM problem that might only show up after a couple of full memtest runs, or maybe the motherboard?

I’m increasingly convinced that this is a software issue and the Linux install is broken.

Have you tried an older kernel?
What parameters, if any, are you booting with in GRUB?


I haven’t tried booting an older kernel. That’s good advice; I will try it.

Honestly, I can’t remember what parameters I’m running. I will check after the memtest.
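A quick way to see which parameters the running kernel was actually booted with (i.e. what GRUB passed) is to read /proc/cmdline. Below is a minimal Python sketch, assuming a Linux system; `parse_cmdline` is a hypothetical helper name. Keep in mind that on a Live ISO, /proc/cmdline shows the live system's parameters, not those of the installed Devuan, so it has to be run on the installed system (e.g. on one of the rare boots that reach a shell). The parser is simplified: it splits on whitespace, so quoted values containing spaces are not handled.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(text: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {param: value}; bare flags map to None.

    Simplification: splits on whitespace only, so quoted values with spaces
    are not handled, and a repeated parameter keeps its last value.
    """
    params: Dict[str, Optional[str]] = {}
    for token in text.split():
        # Split only on the first "=", so "root=UUID=..." keeps "UUID=..." as the value.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        for key, value in parse_cmdline(cmdline.read_text()).items():
            print(key if value is None else f"{key}={value}")
```

Parameters that show up here but that you don't remember adding are the first candidates to remove when testing from the GRUB menu.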


Edit:
The first pass on the 256GB RAM finished without errors.
I’ll still run it twice more, but this looks less and less like a hardware thing.


Edit 2:

FUUUUUUCK.

Memtest is now showing errors. Maybe this is because I’m running the 3600 profile.
I’ll lower the speed to 3200 and run memtest again.
If it’s OK, I’ll boot an older kernel and check the GRUB parameters.

Then I’ll run memtest again at 3600 to confirm it.
I can go down to 3200 MT/s but not lower. If memtest also returns errors at this speed, the RAM kit is going bye bye.

UPDATE

After 4 full passes and 45h, the 256GB of RAM at 3200 MT/s seems stable so far.

I found advice to run Memtest86 for at least 8 passes to be sure the memory is fine, and I want to do that.
It will take at least another 40-50 hours.
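The 40-50 hour estimate follows from the numbers above: 4 passes in 45h is about 11.25h per pass, so the 4 remaining passes to reach 8 should take roughly another 45h. A tiny Python sketch of that arithmetic, assuming every pass takes about as long as the average so far (`remaining_hours` is just an illustrative name):

```python
def remaining_hours(elapsed_hours: float, passes_done: int, passes_target: int) -> float:
    """Estimate the time left, assuming each pass takes the average time per pass so far."""
    per_pass = elapsed_hours / passes_done
    return per_pass * (passes_target - passes_done)


print(remaining_hours(45, 4, 8))  # 45.0
```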

UPDATE 2

The memtest at 3200 didn’t show any errors, so I assume the memory is fine at this speed.

Before I do anything with GRUB, I want to back up any relevant data, so this might also take a while.

Unfortunately, it looks like from the live distro I can’t access the OS partition by opening and mounting it through Thunar. I can, however, open and mount every other data drive.
I will try opening and mounting it manually.
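For the manual route, the usual sequence for LVM inside a LUKS container is: unlock the container with `cryptsetup open`, activate the volume groups inside it with `vgchange -ay`, list the logical volumes with `lvs`, then mount the root LV. Below is a minimal Python sketch that only prints these commands unless it is run with `--run` (which needs root). The device path, mapper name, VG/LV path and mount point are all hypothetical placeholders: check the real partition with `lsblk` and the real LV names with `lvs` after unlocking. Mounting read-only is a choice made here for the backup, not something from the thread.

```python
import subprocess
import sys
from typing import List

# Hypothetical placeholders: replace with the real values for your system.
LUKS_DEVICE = "/dev/nvme0n1p3"  # LUKS partition holding the LVM (check with lsblk)
MAPPER_NAME = "cryptroot"       # name for the unlocked device under /dev/mapper
ROOT_LV = "/dev/myvg/root"      # /dev/<vg>/<lv>; real names show up in `lvs`
MOUNT_POINT = "/mnt"


def unlock_and_mount_commands(device: str, mapper: str, root_lv: str, mount_point: str) -> List[List[str]]:
    """Return the commands that unlock a LUKS container with LVM inside and mount the root LV read-only."""
    return [
        ["cryptsetup", "open", device, mapper],  # prompts for the passphrase
        ["vgchange", "-ay"],                     # activate the volume groups now visible
        ["lvs"],                                 # list logical volumes to confirm names
        ["mount", "-o", "ro", root_lv, mount_point],
    ]


if __name__ == "__main__":
    run = "--run" in sys.argv  # default is a dry run that only prints the commands
    for cmd in unlock_and_mount_commands(LUKS_DEVICE, MAPPER_NAME, ROOT_LV, MOUNT_POINT):
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)
```

The dry-run default is deliberate: with the wrong device path, `cryptsetup open` simply fails, but it is still better to eyeball the commands before running anything against the drive you want to back up.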

This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.