Fedora 30 (and Pop!_OS 19.04 and probably any Linux distro based on a 5.x kernel) won’t boot on Ryzen 3000?! What the heck?
It’s true. AMD is aware of the issue and it is being worked on. In the slides from the E3 event, changes to virtualization and new instructions were mentioned. Not a lot is known about this yet, but I have been working on it as I have time.
In a pinch, you can boot older kernels like 4.14 or use a distro like Ubuntu 18.04.2 LTS or Debian 9.9; those seem to work fine.
What’s the error?
When booting Fedora 30, you get a lot of error messages from systemd. Same with Pop!_OS 19.04. Pop is especially useful for easily manipulating kernel parameters like maxcpus and noacpi and disabling rdrand, but I haven’t found what actually works. When I change the init line, I get a kernel panic and something about offline CPUs. Last time I saw that error, it was just a bad ACPI table for Linux but… things are different now.
Here’s the messed up thing
So if you install Windows on bare metal, which works fine, then install VirtualBox, then try to boot Fedora 30 in a VM, it dies exactly the same way.
What? How on earth is that possible?
VirtualBox is using the SVM/hardware extensions. I am not sure why this would be. Does it make sense to you?
Disabling NPT (nested paging) support in VirtualBox will allow the system to boot normally.
Any known workarounds?
I installed Debian 9.9 so I could experiment. It worked fine. I updated my kernel to 5.1.14 and that worked fine. When I rolled back to 5.0.9 it still worked fine BUT only for the first boot (!?!?!?!?!).
It seems to me that the system was left in a state by the newer kernel (or the older kernel!?) that would allow it to boot. I am not sure what this means, but it doesn’t seem good.
AMD is aware of the issue and investigating. I am really hoping this is a quick kernel tweak or firmware update. I will be surprised if it is a kernel bug, unless it is one that AMD introduced themselves a while back a la the encrypted-virtual-memory-can’t-boot firmware update bug.
Some boards have options for enabling/disabling SEV in UEFI. I played with those options, though not exhaustively, and didn’t make any progress.
It should be possible to update your system in another computer to get past the 5.0.9 problem on Fedora, but I’m not 100% sure about the steps to reproduce here.
In two update scenarios on the 3900X, systemd took 4 of my 12 CPUs offline for mysterious reasons. Echoing 1 into online under /sys brought them right back.
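For anyone who wants to script the "echo 1 into online" step, here is a rough Python sketch. It assumes the usual sysfs layout of /sys/devices/system/cpu/cpuN/online; the function names are made up for illustration, and the directory is a parameter so it can be tried against a fake tree first.

```python
from pathlib import Path

# Assumed default location of the per-CPU sysfs directories.
DEFAULT_CPU_DIR = "/sys/devices/system/cpu"


def offline_cpus(sysfs_cpu_dir=DEFAULT_CPU_DIR):
    """Return the names of CPUs whose 'online' file reads 0, in numeric order."""
    cpu_dirs = [
        p for p in Path(sysfs_cpu_dir).glob("cpu[0-9]*")
        if p.name[3:].isdigit()
    ]
    cpu_dirs.sort(key=lambda p: int(p.name[3:]))

    offline = []
    for cpu_dir in cpu_dirs:
        online_file = cpu_dir / "online"
        # cpu0 often has no 'online' file because it cannot be taken offline.
        if online_file.exists() and online_file.read_text().strip() == "0":
            offline.append(cpu_dir.name)
    return offline


def bring_online(cpu_name, sysfs_cpu_dir=DEFAULT_CPU_DIR):
    """Write 1 into the CPU's 'online' file (needs root on a real system)."""
    (Path(sysfs_cpu_dir) / cpu_name / "online").write_text("1")
```

Calling offline_cpus() with no arguments reads the real sysfs tree. Looping over its result and calling bring_online() on each name is the scripted equivalent of the echo, and it needs root just like the echo does.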
Stay tuned, and watch this space for updates!
Big thanks to our Patrons for helping fund this work! patreon.com/level1
update 7/8
Spooky journal entry: if you boot into Windows then reboot into Fedora 30, it then works fine. What CPU instructions are non-deterministic?
If you press the hard reset button, it still works.
If you turn the machine off for less than 30 seconds, it still works.
Leaving it off for more than 30 seconds reproduces the Fedora issue.
random.trust_cpu=0 doesn’t seem to help the boot situation. Nor does nordrand.
However, I’m still suspicious of rdrand because it might not work if there isn’t enough entropy yet… I guess…
Workaround that works:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1835809
Patch systemd and everything boots again. This also explains why in some scenarios it was possible to boot a bugged distro: rdrand had already been initialized.
systemd is userland, so kernel parameters like nordrand are ignored. I feel like systemd should pay attention to nordrand… Hmm…
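To show what "paying attention to nordrand" could look like from userland, here is a small Python sketch that looks for the flag on the kernel command line. It assumes /proc/cmdline is readable, which it normally is on Linux. The function names and the policy are hypothetical; this is not how systemd does it, just an illustration of the idea.

```python
def kernel_param_set(name, cmdline=None):
    """Return True if a bare flag such as 'nordrand' is on the kernel command line."""
    if cmdline is None:
        # The running kernel exposes its boot parameters here.
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    # Compare whole whitespace-separated tokens so 'foo=nordrand' does not match.
    return name in cmdline.split()


def rdrand_allowed(cmdline=None):
    """Hypothetical policy: a userland program skips RDRAND if the kernel was told to."""
    return not kernel_param_set("nordrand", cmdline)
```

A program using this would fall back to another entropy source, such as reading /dev/urandom, whenever rdrand_allowed() returns False.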