I’ve been running my server with a Ryzen 1600 and a 2-port SFP+ card on an mITX board for some time now, but for whatever reason it has been crashing a lot since the more recent BIOS revisions. Now that I’ve switched to Proxmox it crashes roughly every 3 days… I’ve had this before and fixed it by using the following kernel cmdline arguments: processor.max_cstate=5 rcu_nocbs=0-11.
I will try that later, but the problem I meant to allude to is that any kind of crash is (more or less) unacceptable without a way to reset the machine remotely and see the boot screen from afar. I found the pikvm project, which is a really nice way to get a ghetto IPMI, but since I don’t have a GPU, I’m screwed…
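For anyone wanting to confirm that those two arguments actually made it onto the running kernel after a reboot, here is a minimal sketch. It only compares /proc/cmdline against the parameters quoted in the post; the `missing_params` helper and the `WANTED` list are names made up for illustration, not part of any tool.

```python
from pathlib import Path

# The two parameters quoted in the post; adjust to whatever you actually set.
WANTED = ["processor.max_cstate=5", "rcu_nocbs=0-11"]


def missing_params(cmdline, wanted=WANTED):
    """Return the wanted parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line the running kernel was booted with.
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        print("cannot read /proc/cmdline (not a Linux system?)")
    else:
        gone = missing_params(cmdline)
        print("all parameters active" if not gone else "missing: " + " ".join(gone))
```

If it reports something missing, the bootloader config was probably not regenerated after the edit.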
Is there any way for me to still get something similar? I mean, wouldn’t it be possible to get a USB-to-TTL adapter and connect it to my Pi? Does GRUB2 support outputting to a USB tty? I know that the kernel can. But getting GRUB to display to it would be a big win, especially when I use a custom kernel or boot arguments.
Does anyone know of a better way? I’ve seen some M.2 GPUs but they’re really hard to get and I bet also crazy expensive… BTW I unfortunately don’t have enough space for a 1-slot graphics card…
Depending on your workload, you might try getting your hands on a 2400G or a 3400G to replace the 1600. They have built-in Vega graphics, so you could use the HDMI port on your motherboard.
Alternatively, you could also try rolling the BIOS back to a version you know worked.
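On the Pi side of such a setup, capturing whatever the server prints is the easy part. Below is a rough sketch that timestamps every line arriving on the adapter so the last messages before a hang are preserved. It assumes the adapter shows up as /dev/ttyUSB0 and that the port was already put into raw mode at the right speed (for example with `stty -F /dev/ttyUSB0 115200 raw`); the device path, the baud rate and the function names are assumptions, not something from this thread.

```python
import time


def stamp_lines(stream, clock=time.time):
    """Yield (timestamp, text) for every non-empty line read from a binary stream."""
    for raw in stream:
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        if text:  # skip blank lines
            yield clock(), text


def log_console(device="/dev/ttyUSB0"):
    """Print timestamped console output from a serial device until it closes.

    The device path is a guess; the port must already be configured
    (speed, raw mode) before this is called.
    """
    with open(device, "rb", buffering=0) as port:
        for ts, text in stamp_lines(port):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts))
            print(stamp, text, flush=True)
```

Calling `log_console()` and redirecting its output to a file gives a persistent log. This only covers the receiving end; whether the bootloader can be made to talk to a USB adapter is a separate question.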
^^^ Now that’s the question: why does it crash? Linux is extremely stable and usually Proxmox is too, so there must be something in your config or hardware that triggers a crash so frequently. Turn off your services to see which one causes the crash.
Logs over usb serial, yes - but I’m not sure how helpful it’d be - USB isn’t exactly low level … and even with regular old school io port driven serial sometimes it gets stuck in a way that doesn’t provide output.
What motherboard do you have? (Asking because many boards still have RS-232 / 12 V serial COM ports on them in header form, which can be used for debugging.)
And in the meantime, look into enabling some kind of watchdog; at least then it should reboot and not stay stuck.
Sorry, but can’t tell as I only use linux and don’t have a spare drive to test with Windows.
Good idea, but as I’ve reinstalled Proxmox I don’t have any systemd services activated apart from the normal Proxmox ones. It’s been a problem on Fedora Server (kernel 5.8) and CentOS (kernel 4.18) as well.
Good to know, I’m using an AB350N Gaming WIFI from Gigabyte.
I’ll try out those watchdogs, thanks!
Hmmmm, unbelievably, there’s no COM port header mentioned in the manual. Intel has DCI, I don’t know about AMD. Some network gear uses RJ45 for serial on specially configured ports, so maybe there’s something undocumented. I guess you’re screwed in the long run anyway without remote access to BIOS/UEFI, since your only PCIe slot is populated by the NIC.
Wow this looks really interesting, never heard about that before. At least the Proxmox kernel comes with all the necessary kernel dependencies precompiled! Nice, now I only need a USB3 A to A cable…
I don’t yet understand all the watchdog things, but Proxmox seems to come with one, which makes it impossible to install a different one as that would remove necessary Proxmox packages… /dev/watchdog is already populated as well. Would you mind pointing me in the right direction?
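Before installing anything, it may help to see which watchdog devices the kernel has already registered. The sketch below reads a few attributes from /sys/class/watchdog. It assumes the kernel was built with watchdog sysfs support, so files such as `identity` and `timeout` may simply be missing; the `watchdog_info` helper is a made-up name for illustration.

```python
from pathlib import Path

# Attributes the kernel may expose per watchdog device (only with sysfs support).
ATTRS = ("identity", "timeout", "state", "nowayout")


def watchdog_info(base="/sys/class/watchdog"):
    """Map each watchdog device name to the sysfs attributes that are readable."""
    info = {}
    root = Path(base)
    if not root.is_dir():
        return info  # no watchdog class at all
    for dev in sorted(root.iterdir()):
        attrs = {}
        for name in ATTRS:
            try:
                attrs[name] = (dev / name).read_text().strip()
            except OSError:
                pass  # attribute not exposed by this kernel or driver
        info[dev.name] = attrs
    return info


if __name__ == "__main__":
    for dev, attrs in watchdog_info().items():
        print(dev, attrs)
```

The `identity` value should tell you whether you are looking at a software watchdog or a hardware one, which is the first thing to know before changing any configuration.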
The funniest thing is that it hasn’t crashed so far. Let’s hope it stays that way…
EDIT: Checked uptime, it restarted 5 hours ago, just like that. Wow, that’s pretty cool, but it sucks that it even comes to that.
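To catch silent restarts like this without checking by hand, one option is to work out the boot time from /proc/uptime and log it periodically. This is just a small sketch; `parse_uptime` and `boot_time` are hypothetical helper names.

```python
import time


def parse_uptime(text):
    """Return seconds since boot from the contents of /proc/uptime."""
    # The first field is the uptime in seconds; the second is aggregate idle time.
    return float(text.split()[0])


def boot_time(uptime_seconds, now=None):
    """Return the boot time as a Unix timestamp."""
    now = time.time() if now is None else now
    return now - uptime_seconds


if __name__ == "__main__":
    try:
        with open("/proc/uptime") as f:
            up = parse_uptime(f.read())
    except OSError:
        print("cannot read /proc/uptime (not a Linux system?)")
    else:
        booted = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(boot_time(up)))
        print(f"up {up / 3600:.1f} h, booted at {booted}")
```

If the boot time changes between two runs and nobody rebooted the box, something (for example a watchdog) reset it.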