We purchased this server at the beginning of last year. It has 2 EPYC 7452 processors and 512GB of RAM. We were kinda pinched for time, so the only thing I remember enabling in the BIOS was multithreading.
Anyway - the server has been running well. It currently has Proxmox installed and uses some older Dell servers for HA and replication. The Gigabyte server has rebooted a couple of times now. So far it seems to only happen on Saturday morning. Weird.
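One way to check whether the Saturday-morning pattern is real is to tally the reboot times by weekday and hour. Below is a rough Python sketch, assuming the timestamps were copied by hand from something like the output of `last reboot`. The dates in `REBOOTS` are made-up placeholders, not entries from this server's logs, and the function name is just for illustration.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical reboot times (placeholders, not real log entries).
REBOOTS = [
    "2021-03-06 04:12",
    "2021-03-13 04:09",
    "2021-03-20 04:15",
]

def tally_by_weekday_and_hour(stamps):
    """Count reboots per (weekday name, hour of day) bucket."""
    counts = Counter()
    for stamp in stamps:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        counts[(DAYS[when.weekday()], when.hour)] += 1
    return counts

if __name__ == "__main__":
    # Print the busiest buckets first.
    for (day, hour), n in tally_by_weekday_and_hour(REBOOTS).most_common():
        print(f"{day} {hour:02d}:00  x{n}")
```

If nearly everything lands in one bucket, a weekly scheduled event (a cron job, a UPS self-test, a building power cycle) would be a reasonable thing to suspect.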
I am wondering what I should start to look at. I have been through the logs and I don’t see anything. I thought it might be a power problem, but this server and one of the Dells are on the same UPSes…
I know this is vague - sorry… But I don’t know what I should start looking at.
Wendell!! So love your content. Thank you for all you do… Let me get more specific. There are 2 UPSes, one plugged into each PSU (so both would have to fail). I will go through the logs again and see if I spot anything. This is probably the 3rd time it has happened. (And I need to look at the IPMI logs…) Google is my friend…
also - This machine will go into
Command: /sbin/zpool import -N 'rpool'
Message: cannot import 'rpool': no such pool available
Error: 1
This is normally solved by adding rootdelay=10 to the GRUB line (it has fixed it on the other two older servers), but this machine still stops at initramfs…
Ugh - I have something going on and it is probably power related. Exactly 7 hours ago, all 4 Proxmox servers rebooted (2 are Dell, one is Gigabyte and 1 is just an HP workstation…). 3 are on the same 2 UPSes and one is on a totally different UPS…
OK - definitely not a Proxmox issue (yes, I was worrying about that). I found another unrelated server that also rebooted at the same time… (old Ubuntu running an mdadm array…)
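To confirm that the machines really went down together, each host's boot time can be worked out from its uptime and the results compared. Below is a minimal Python sketch, assuming the uptime in seconds is the first field of `/proc/uptime` on each Linux host. The function names and the 5-minute tolerance are my own choices for illustration, and the host names and uptimes in the example are placeholders.

```python
from datetime import datetime, timedelta

def local_uptime_seconds(path="/proc/uptime"):
    """Read this host's uptime in seconds (first field of /proc/uptime)."""
    with open(path) as fh:
        return float(fh.read().split()[0])

def boot_time(now, uptime_seconds):
    """Boot time is the current time minus the uptime."""
    return now - timedelta(seconds=uptime_seconds)

def rebooted_together(boot_times, tolerance=timedelta(minutes=5)):
    """True if every boot time falls within `tolerance` of the others."""
    times = list(boot_times.values())
    return max(times) - min(times) <= tolerance

if __name__ == "__main__":
    now = datetime.now()
    # Hypothetical uptimes in seconds, as collected from each host.
    uptimes = {"host-a": 25200.0, "host-b": 25110.0, "host-c": 25320.0}
    boots = {name: boot_time(now, secs) for name, secs in uptimes.items()}
    for name, when in sorted(boots.items()):
        print(name, when.strftime("%Y-%m-%d %H:%M:%S"))
    print("rebooted together:", rebooted_together(boots))
```

The tolerance is there because servers take different amounts of time to POST and boot, so the boot times will not match exactly even after a shared power event. Machines on separate UPSes going down within the same few minutes would point at something upstream of the UPSes.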