AM5 Linux triggering suspected firmware bug with S3 sleep

Hello,

I’ve recently upgraded my computer, and I’m having some really weird issues.
Specs:

  • AMD 9800X3D
  • MSI x870e Tomahawk
  • 32 GB G.SKILL memory
  • Nvidia 2070 Super (yet to be upgraded)

To preface this, these specs are not the ones I started with. I’ll try to make it easier to understand, but this will be weird. Motherboard, RAM and CPU have all been replaced at least once (or 5 times), hopefully eliminating a hardware issue.

At first I got an X870 Tomahawk with some Kingston memory (QVL validated), but I had some weird issues. Once in a while when restarting, the motherboard took a lot longer to POST and then said something along the lines of “memory has changed, enter setup”.
Upon entering setup, I noticed one of my sticks read as “Unknown 2GB” instead of Kingston 16GB. Even after booting into Windows, everything reports the lower memory capacity. Reading the memory SPD shows scrambled values: no manufacturer, no EXPO timings, 2GB capacity and a wrong serial number.
The state persists across reboots, but a full system shutdown clears the issue. Seemingly no permanent damage happens.

Over time, I’ve figured out a scenario that reproduces it. The steps are:
1 - Sleep computer and wake it up
2 - Sleep it again, and wake it up (more sleeps seem to increase the chance of it happening, though 2 make it almost certain)
3 - Reboot the computer
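
The steps above could be scripted roughly like this. It is only a minimal sketch, assuming `rtcwake` (from util-linux) is available and that `deep` is the active mode in /sys/power/mem_sleep. The helper name `repro_cycles`, the 30 second sleep and the 10 second settle delay are arbitrary choices for illustration, not part of the original testing. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: N suspend/wake cycles followed by a warm reboot.
# DRY_RUN=1 (the default) only prints what would be executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# repro_cycles CYCLES SLEEP_SECS  (hypothetical helper name)
repro_cycles() {
    cycles="${1:-2}"
    sleep_secs="${2:-30}"
    i=1
    while [ "$i" -le "$cycles" ]; do
        echo "cycle $i/$cycles"
        # Suspend to RAM; the RTC alarm wakes the machine after sleep_secs.
        run rtcwake -m mem -s "$sleep_secs"
        # Arbitrary pause so the system can settle after resume.
        run sleep 10
        i=$((i + 1))
    done
    # Step 3: a warm reboot, which is where the bad state shows up.
    run systemctl reboot
}

repro_cycles 2 30
```

To actually suspend and reboot, it would have to be run as root with `DRY_RUN=0`.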

Trying to troubleshoot it, I’ve exchanged every single component for the exact same model. It still happened. I decided to try other motherboard manufacturers: ASRock does it too, Gigabyte doesn’t (but I didn’t like their PCIe lane allocation).
After weeks, MSI support came back to me and claimed to have reproduced it, blaming it on Kingston.

This led me to try Corsair and G.SKILL memory. Both of these have a completely different issue though. The “Unknown” state doesn’t happen, but when rebooting, the computer gets locked up on POST code 44 and fails to initialize the memory. As before though, fully powering the computer off fixes the bad state with seemingly no lasting effects.

As far as I can tell, there is something with Linux triggering some obscure firmware bug. I have failed to reproduce this when using Windows. As far as I can tell, Windows is also using S3 sleep and not S0.

To ensure the latest kernel version, I’m testing mainly on my Arch Linux install, but I have also tested with completely fresh installs of Fedora 41 and Ubuntu 25.04 and it still happens. This should eliminate any wrong configs on my side. It also doesn’t matter whether EXPO is enabled or not. The BIOS is also kept up to date, but different versions make no difference in the behavior.
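
When comparing results across distros and BIOS versions, it helps to record exactly which kernel and firmware each test ran on. A minimal sketch, assuming the standard DMI entries under /sys/class/dmi/id are exposed; `report_platform` is a hypothetical helper name, and the optional directory argument exists only so it can be pointed somewhere else.

```shell
#!/bin/sh
# Print kernel release plus board/BIOS identity as exposed via sysfs.
# report_platform [DMI_DIR]  (hypothetical helper name)
report_platform() {
    dmi_dir="${1:-/sys/class/dmi/id}"
    echo "kernel: $(uname -r)"
    for field in board_vendor board_name bios_version bios_date; do
        if [ -r "$dmi_dir/$field" ]; then
            echo "$field: $(cat "$dmi_dir/$field")"
        else
            # Entry missing or not readable on this system.
            echo "$field: (unreadable)"
        fi
    done
}

report_platform
```

Pasting this output next to each test result makes it easier to tell which BIOS version a given failure belongs to.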

I have opened tickets with the relevant parties (motherboard manufacturers and AMD), but I’m getting the classic support runaround, so I decided to post this here in case some Linux wizard sees this and has any idea of what might be causing it. My main question is what the difference between Windows and Linux could be, since ACPI and entering/leaving sleep states should be fairly uniform.

Thank you for any replies or ideas.

Welcome to Level1Techs!

:+1:

:+1:

:+1:

All in all reasonable hw troubleshooting steps. Good job.
All I can think of that you could try but haven’t mentioned is:

  • reseating CPU (after all the CPU pins connect to RAM)
  • reseating RAM (although this should be taken care of by using different sticks)

This leads me to believe that this is some weird driver issue. Windows drivers often do some weird wake-up things that Linux drivers don’t.

Following this train of thought I would try removing any non-essential piece of hw from the config:

  • USB connected items (e.g. use most basic mouse/keyboard)
  • PCIe connected items (e.g. remove GPU, iGPU should be fine for troubleshooting)
  • M.2/SATA (remove any unnecessary storage device. Even try using a different root drive)

Why this when the error message indicates RAM error? Well, any driver needs to access RAM when waking up from S3. Maybe that’s triggering the error…
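
One way to narrow down whether device drivers or the platform firmware is involved is the kernel's suspend test facility. A minimal sketch, assuming the kernel was built with CONFIG_PM_DEBUG so that /sys/power/pm_test exists. With a test level set, writing `mem` to /sys/power/state runs the suspend sequence only up to that level, waits a few seconds and resumes, without really entering the sleep state. `pm_test_commands` is a hypothetical helper that just prints the commands to run as root.

```shell
#!/bin/sh
# Print the root commands for one pm_test run at the given level.
# Levels, from shallowest to deepest: freezer devices platform processors core
pm_test_commands() {
    level="$1"
    case "$level" in
        freezer|devices|platform|processors|core) ;;
        *) echo "unknown pm_test level: $level" >&2; return 1 ;;
    esac
    echo "echo $level > /sys/power/pm_test"
    # With pm_test set, this stops at the chosen level and resumes.
    echo "echo mem > /sys/power/state"
    # Restore normal suspend behaviour afterwards.
    echo "echo none > /sys/power/pm_test"
}

pm_test_commands devices
```

If a couple of `devices` or `platform` test cycles followed by a reboot already trigger the bad POST, the firmware S3 path itself is probably not the trigger. If only real suspends do, that points more toward the firmware.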

I had several hw configs where I could not get S3 to work reliably on Linux, my current one does. I have no idea what the difference is.

Good luck!

Hello! Thank you for your reply.
I didn’t go into too much detail to keep the original post from getting too long, but I can quickly cover most of your points.

I’m on my second CPU, fifth or sixth motherboard (different chipsets and manufacturers) and perhaps fourth or fifth kit of RAM (also different manufacturers, always on the QVL of the motherboard I had at the moment).

This mostly covers all reseats, but since I have an open ticket with MSI, they have also requested independent reseats and cleanings of the CPU and RAM contact pads (done with 99% IPA).

Some of the tests have been done with all of my personal drives removed, without Ethernet, and using the simplest keyboard/mouse I have: a cheap RF wireless set with a single dongle for both.

One thing I haven’t tried is using the iGPU. As you mentioned, it should suffice for troubleshooting, and I will be trying that, but my hopes aren’t too high.

The odd thing about this issue is that S3 itself works great. Sleep/wake is fast and everything restores correctly. As far as I can tell, this error state does not affect the running system at all. However, something in the BIOS/AGESA has gotten itself into some odd state that prevents the memory from being initialized a second time after a reboot. (Code 44 is not listed in the manual, but a Google search suggests it’s a memory-initialization-related code.)

Powering the computer off and then back on resets this state and memory can be initialized again.

In terms of the Unknown issue, the thing that was getting messed up was the SPD. It was reporting nonsense data, despite the memory being write protected according to Kingston. So either the memory wasn’t fully write protected, or whatever in BIOS reads it, reads it incorrectly. So if something in the new memory is reading the SPD even more wrong, it could be causing the memory to completely fail to initialize. However I have no clue what. And if the SPD really wasn’t write protected, I feel like the corruption would be permanent, instead of lasting only until a power off.
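
One way to capture the symptom from Linux is to snapshot what the firmware reports per DIMM before and after the bad reboot. A minimal sketch, assuming `dmidecode` is installed and run as root. Note that it shows the firmware's SMBIOS tables rather than the raw SPD EEPROM, so it only tells you what the BIOS believed it read. `dimm_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only the per-DIMM identity fields from `dmidecode -t memory` output.
# Anchoring on leading whitespace avoids matching "Bank Locator:" etc.
dimm_summary() {
    grep -E '^[[:space:]]*(Locator|Size|Manufacturer|Serial Number|Part Number):'
}

# Example usage (needs root):
#   dmidecode -t memory | dimm_summary > dimms-good.txt
#   ...sleep twice, reboot into the bad state...
#   dmidecode -t memory | dimm_summary | diff dimms-good.txt -
```

A diff showing a changed size, manufacturer or serial number for one slot would document the “Unknown 2GB” state without needing Windows tools.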

Thanks for sharing all the things you’ve done.

Oof. That’s a lot of change. Sorry to hear about your ordeal.

There is a silver lining in this, though: with all the changes and the issue still being reproducible, it’s time to ask “what are the last things that stayed constant during all of this?”

  • PSU?
  • Socket (AM5)?
  • ???

I doubt your issue is specific to AM5 socket - I believe we’d have heard a lot about it since inception.

PSU is my current favorite.

Any other ideas?

Yeah… Been dealing with this for around a month and a half now.

The PSU is definitely a constant, it’s the Seasonic Focus GX-850 ATX 3. It should be a solid PSU and usually I wouldn’t even consider it. Other than that there’s pretty much nothing else besides the GPU. But I don’t have any other PSU to test with.

MSI has claimed to have reproduced the Unknown issue before blaming it on Kingston, but my experience with their support has been… an experience. I’m seemingly getting passed between different agents who fail to read the ticket more than 3 replies down. If they really reproduced it, that would prove the Unknown bug isn’t a “me” problem. The question is whether the Code 44 bug is an extension of the same bug.

That’s one of the things I find the most confusing about it. There seem to be a lot of memory-related issues in general on the AM5 platform with current CPUs and chipsets, but despite searching, I have failed to find anyone with even a similar problem. I did find one person with “Unknown” showing for the RAM manufacturer, but no one with it also reporting 2GB.

That was also part of the reason for this post: to find anyone with similar specs (9800X3D, X870E motherboard ideally from MSI/ASRock, and Kingston/Corsair/G.SKILL memory) and see whether they can reproduce it or not.

If I had any, I wouldn’t be posting here. :frowning:

I don’t remember the thread, but we documented on this forum that a recent platform (latest Threadripper?) increased demands on the capacity of the PSU’s 5V rail, which apparently many PSUs, even reputable ones, didn’t fulfill.

Waking up from sleep puts a strain on the PSU. After all the changes you went through, I’d consider this the last one.

This is not a dialog between two people. Hoping to get other readers to bring in new ideas. I understand your position.


Probably not relevant.

Are you sure about that? Classic legacy S3 is often no longer supported at the hardware level. That’s way too common on laptops, and it might have been axed even here.

The only way to check that I know of is powercfg /availablesleepstates in Windows.
Sometimes there is a switch in the BIOS, but a lot of OEMs do not bother giving the option anymore.

The second thing to check, at least on Windows, is the effect of fast startup. It might complicate troubleshooting by preserving kernel state across reboots.

Sometimes it triggers weird bugs.

EDIT:
S3 is supported on my Gigabyte B650 + 7950X3D system.

This is certainly one of the things I’m least knowledgeable about. As you say, there is modern standby that is replacing classic S3 in laptops, but I don’t know about it happening on desktop platforms.

My assumption regarding sleep state was exactly because of the command you mention.

$ powercfg /A
The following sleep states are available on this system:
    Standby (S3)
    Hibernate

The following sleep states are not available on this system:
    Standby (S1)
        The system firmware does not support this standby state.

    Standby (S2)
        The system firmware does not support this standby state.

    Standby (S0 Low Power Idle)
        The system firmware does not support this standby state.

    Hybrid Sleep
        The hypervisor does not support this standby state.

    Fast Startup
        This action is disabled in the current system policy.

Another relevant part from within Linux:

$ cat /sys/power/mem_sleep
s2idle [deep]

Sleep mode deep is selected as the active one, which should be mapped to the classic ACPI S3.
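
The same check can be scripted, which is handy when comparing distros. A minimal sketch: the kernel marks the active suspend-to-RAM variant with square brackets in /sys/power/mem_sleep, and `active_mem_sleep` is a hypothetical helper that extracts it.

```shell
#!/bin/sh
# Print the active suspend-to-RAM variant: the entry inside [brackets].
active_mem_sleep() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

if [ -r /sys/power/mem_sleep ]; then
    mode="$(active_mem_sleep < /sys/power/mem_sleep)"
    echo "active mem_sleep mode: $mode"
    # Only "deep" corresponds to ACPI S3; s2idle and shallow do not.
    [ "$mode" = "deep" ] || echo "note: suspend will not use ACPI S3 here"
fi
```

As a cross-check after a real suspend, the kernel log should contain a line along the lines of `PM: suspend entry (deep)`, for example in the output of `journalctl -k`.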

But as mentioned, this is a subject I’ve only just started looking into. If I’m making any wrong assumptions, I would be glad to have them corrected.

I have fast startup disabled, mainly because it causes issues: it prevents write-mounting NTFS from Linux and messes up Wake on LAN. Also, as far as I know, fast startup only concerns powering off the computer, and reboots actually fully reboot the kernel.