Ryzen 3000, Booting Linux, and You

Fedora 30 (and Pop!_OS 19.04, and probably any Linux distro based on 5.x kernels) won’t boot on Ryzen 3000?! What the heck?

It’s true. AMD is aware of the issue and it is being worked on. In the slides from the E3 event, changes to virtualization and new instructions were mentioned. Not a lot is known about this yet, but I have been working on it as I have time.

In a pinch, you can boot older kernels like 4.14 or use a distro like Ubuntu 18.04.2 LTS or Debian 9.9; those seem to work fine.

What’s the error?

When booting Fedora 30, you get a lot of error messages from systemd. Same with Pop!_OS 19.04. Pop is especially useful for easily manipulating kernel parameters like maxcpus and noacpi and for disabling rdrand, but I haven’t found anything that actually works. When I change the init line, I get a kernel panic and something about offline CPUs. Last time I saw that error, it was just a bad ACPI table for Linux but… things are different now.

Here’s the messed up thing

So if you install Windows on bare metal, which works fine, then install VirtualBox and try to boot Fedora 30 in a VM, it dies in exactly the same way.

What? How on earth is that possible?

VirtualBox is using the SVM hardware virtualization extensions, but I am not sure why that would produce the same failure. Does it make sense to you?

Disabling NPT support in VirtualBox will allow the system to boot normally.

Any known workarounds?

I installed Debian 9.9 so I could experiment. It worked fine. I updated my kernel to 5.1.14 and that worked fine. When I rolled back to 5.0.9 it still worked fine BUT only for the first boot (!?!?!?!?!).

It seems to me that the system was left in a state by the newer kernel (or the older kernel!?) that would allow it to boot. I am not sure what this means, but it doesn’t seem good.

AMD is aware of the issue and investigating. I am really hoping this is a quick kernel tweak or firmware update. I will be surprised if it is a kernel bug, unless it is one that AMD introduced themselves a while back a la the encrypted-virtual-memory-can’t-boot firmware update bug.

Some boards have options for enabling/disabling SEV in UEFI. I played with those options, though not exhaustively, and didn’t make any progress.

It should be possible to update your system on another computer to get past the 5.0.9 problem on Fedora, but I’m not 100% sure about the exact steps here.

In two update scenarios on the 3900X, systemd took 4 of my 12 CPUs offline for mysterious reasons. Echoing 1 into online under /sys brought them right back.
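For anyone who wants to script that check, here is a minimal sketch. It assumes the usual Linux sysfs layout (/sys/devices/system/cpu/offline and a per-CPU online file); the function names are my own, and writing to sysfs needs root.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")  # usual sysfs location (assumed)


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "4-7,10" into [4, 5, 6, 7, 10]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are listed
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def offline_cpus(root=CPU_ROOT):
    """Return the CPU numbers the kernel currently reports as offline."""
    return parse_cpu_list((root / "offline").read_text())


def bring_online(cpu, root=CPU_ROOT):
    """Same effect as `echo 1 > .../cpuN/online`. Returns False if cpuN has no online file."""
    control = root / f"cpu{cpu}" / "online"
    if not control.exists():
        return False  # the offline list can include CPUs that are not present
    control.write_text("1")
    return True
```

Calling `bring_online(cpu)` for each entry in `offline_cpus()` as root does the same thing as the manual echo.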

Stay tuned, and watch this space for updates!

Big thanks to our Patrons for helping fund this work! patreon.com/level1

Update 7/8

Spooky journal entry: if you boot into Windows and then reboot into Fedora 30, it works fine. What CPU instructions are non-deterministic?

If you press the hard reset button, it still works.
If you turn the machine off for less than 30 seconds, it still works.

Turning it off for more than 30 seconds reproduces the Fedora issue.

random.trust_cpu=0 doesn’t seem to help the boot situation. Nor does nordrand.

However, I’m still suspicious of rdrand because it might not work if there isn’t enough entropy yet… I guess…

Workaround that works:

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1835809

Patch systemd and everything boots again. This also explains why in some scenarios it was possible to boot a bugged distro: rdrand had already been initialized.

Systemd is userland, so kernel params like nordrand are ignored. I feel like systemd should pay attention to nordrand… Hmm…
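To illustrate what "paying attention" could look like: a userland program can read the kernel command line from /proc/cmdline and check for the flag itself. This is only a sketch of the idea, not how systemd is implemented, and the function names are made up for the example. It splits on whitespace only, so quoted values containing spaces are not handled.

```python
from pathlib import Path


def kernel_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def rdrand_disabled(cmdline=None):
    """True if `nordrand` appears on the kernel command line."""
    if cmdline is None:
        # /proc/cmdline holds the parameters the running kernel was booted with
        cmdline = Path("/proc/cmdline").read_text()
    return "nordrand" in kernel_params(cmdline)
```

A program could call `rdrand_disabled()` once at startup and fall back to another randomness source when it returns True.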

Hi!
Have you tried one of the Fedora Respins?
The ISO, at this date, is dated 2019-06-28 with Kernel version 5.1.6. And almost as a rolling release, they rolled out 5.1.16 (“Craaaazy!!!”)
https://dl.fedoraproject.org/pub/alt/live-respins/

On Fedora 31 they will pull the kernel directly from Mr. Torvalds’s machine. :smile:

I’ll try, but it is weird, as it behaves intermittently.

I’m guessing this is on X570? What about older chipsets? Maybe it’s not the CPU?

Same on the Tomahawk B450.

I transferred my old drive running Fedora 30 with kernel version 5.1.16 to my new 3700X + Tomahawk B450 machine and it won’t boot with the same symptoms. I’m going to try to downgrade the kernel to 4.14 via chroot, I’ll report back with my results.

Guessing mode here:
Maybe your sample has some problems (I doubt that), as Steven from Hardware Unboxed’s probably had. His one died.
Or are the PCIe 4 instructions causing problems (?).
Have you tested on Fedora Silverblue instead of Workstation or one of the spins?
Sorry, these are just things that come to mind; I don’t want you to waste your time.

Since Proxmox is still on 4.18 (last I checked), is it OK to use with Ryzen 3000?

Should be fine, that is what Ubuntu 18.04 boots with.

I tried downgrading my Arch install to various kernels with no luck. What ended up working was downgrading systemd to version 237. I used the systemd-git AUR package and modified the PKGBUILD to point to the older git tag. Now Arch is booting.

Hopefully this helps as a stop-gap until it’s fixed.

Interesting. I will lol so hard if the root cause is systemd dividing by zero because the CPU is too fast.

Very interested in following this thread. I got a Crosshair VI Hero at launch for the very reason of upgrading later down the line.

The ASUS website doesn’t say anything about support for the 3000 series… yet.

I’ll be getting a 3900X tomorrow to run on an older X370 motherboard and will report any findings (I run Arch, so I assume I’ll see the same issues).

Will try the systemd downgrade as outlined above

Did you try the systemd git package without forcing the version? It could possibly be fixed in git already.

I’m still wondering why these X570 boards are all PWM monsters that can deliver 300+ W when, in all the tests, the new chips used less power :slight_smile:
I look forward to seeing how it turns out. I saw a few board makers say they will not support the new chips on old motherboards.

I have an ASUS X370 board which seems to have the power delivery needed.

So something like Alpine or Void Linux would work, right?

What the heck?!? Systemd IGNORES kernel parameters? It’s like the kernel ignoring C-state parameters from the BIOS…

There should definitely be a configuration toggle within systemd so that rdrand isn’t forced on regardless of which kernel arguments are used. It makes so much more sense to just have a damn toggle.

Can you try playing with the high_quality_required Boolean, @wendell? Apparently this was a change made in 2018 to systemd as reported by Phoronix:

https://www.phoronix.com/scan.php?page=news_item&px=Systemd-RdRand-Direct

And the forum thread is extremely confused as to what value the Boolean should be…

I think it’s just systemd.high_quality_required=false as a kernel parameter… That’s my speculation anyways.

That’s crazy though that rdrand on Ryzen 3000 returns negative integers, which is a BIG no-no.
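For what it’s worth, a userland sanity check for that kind of failure is simple to sketch. The code below assumes the symptom is a source that reports success but keeps handing back the same all-ones word (which reads as -1 as a signed integer); that is my reading of the bug report, not something verified here. Python has no direct rdrand access, so `stuck_source` and `urandom_source` are hypothetical stand-ins for the hardware.

```python
import os

ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF  # equals -1 when read as a signed 64-bit integer


def looks_broken(read_word, samples=8):
    """Return True if `read_word` keeps returning one value or ever returns all-ones."""
    values = [read_word() for _ in range(samples)]
    return len(set(values)) == 1 or ALL_ONES_64 in values


def stuck_source():
    """Stand-in for a faulty hardware RNG that always 'succeeds' with all-ones."""
    return ALL_ONES_64


def urandom_source():
    """Stand-in for a healthy source: 64 bits from the kernel via os.urandom."""
    return int.from_bytes(os.urandom(8), "little")
```

A caller would run `looks_broken(...)` once before trusting the source and fall back to something like `os.urandom` if it returns True.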