Gigabyte R282-Z91 questions

We purchased this server at the beginning of last year. It has 2 EPYC 7452 processors and 512GB RAM. We were kinda pinched for time, so the only thing I remember enabling in the BIOS was multithreading.

Anyway, the server has been running well. It currently has Proxmox installed and uses some older Dell servers for HA and replication. The Gigabyte server has rebooted a couple of times now. So far it seems to only happen on Saturday morning. Weird.

I am wondering what I should start to look at. I have been through the logs and I don’t see anything. I thought it might be a power problem, but this server and one of the Dells are on the same UPSes…

I know this is vague, sorry… But I don’t know what I should start looking at.

thanks
sam

Could still be the UPS. It has redundant PSUs, right? If so, connect one of the PSUs directly to mains power.

Is the UPS connected via USB? The default in some UPS software is a UPS self-check on Saturday AM… maybe related.

The Proxmox/Linux event logs may have more info as to what the system was doing before it went down.
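A rough sketch of how those logs could be pulled on a Debian-based Proxmox host. It assumes a persistent systemd journal (otherwise `journalctl -b -1` has nothing to show) and GNU `date`. The function names are made up for illustration, and `tally_weekdays` is just a quick way to check whether the reboots really cluster on Saturdays.

```shell
# List the boots the journal knows about (needs a persistent journal).
list_boots() { journalctl --list-boots --no-pager; }

# Show the last N lines (default 200) logged before the previous boot ended.
previous_boot_tail() { journalctl -b -1 -n "${1:-200}" --no-pager; }

# Reboot/shutdown records from wtmp. A "reboot" entry with no matching
# "shutdown" entry before it suggests the machine did not go down cleanly.
reboot_history() { last -x reboot shutdown | head -n "${1:-20}"; }

# Read timestamps from stdin (one per line, e.g. "2023-03-04 06:02")
# and count how many fall on each weekday. Requires GNU date.
tally_weekdays() {
  while IFS= read -r ts; do
    if [ -n "$ts" ]; then
      LC_ALL=C date -d "$ts" +%a
    fi
  done | sort | uniq -c | sort -rn
}
```

Feeding the boot times from `list_boots` or `reboot_history` into `tally_weekdays` should make a Saturday pattern obvious if there is one.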

Could up the fan speed manually in IPMI in case it’s thermal related.

Any other status/warnings in the IPMI logs?
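If the BMC web UI is awkward, the same information can usually be read from the host with `ipmitool`. This is only a sketch: it assumes `ipmitool` is installed and the IPMI kernel modules are loaded, and the wrapper names are invented for readability.

```shell
# Thin wrappers around ipmitool; run as root on the server itself.
bmc_event_log()      { ipmitool sel elist; }               # System Event Log with readable timestamps
bmc_restart_cause()  { ipmitool chassis restart_cause; }   # what the BMC recorded for the last restart
bmc_temperatures()   { ipmitool sdr type Temperature; }    # current temperature sensor readings
bmc_power_supplies() { ipmitool sdr type "Power Supply"; } # PSU sensor status
```

Event log entries around the reboot times (power supply events, thermal trips) and the reported restart cause are the things to compare against the Saturday reboots.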


Wendell!! So love your content. Thank you for all you do… Let me get more specific. There are 2 UPSes, one plugged into each PSU (so both would have to fail). I will go through the logs again and see if I spot anything. This is probably the 3rd time it has happened (and I need to look at the IPMI logs…). Google is my friend… ;)

Also, this machine will drop to initramfs at boot with:
Command: /sbin/zpool import -N 'rpool'
Message: cannot import 'rpool' : no such pool available
Error: 1

Adding rootdelay=10 to the grub line usually fixes this (it fixed it on the other two older servers), but this machine still stops at initramfs…

I have not tried option b) from here, though:
https://pve.proxmox.com/wiki/ZFS:_Tips_and_Tricks#Grub_boot_ZFS_problem
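For reference, a sketch of how the rootdelay workaround is typically applied on a GRUB-booted install. The `quiet` option is an assumption; keep whatever is already on that line.

```shell
# /etc/default/grub
# ("quiet" is assumed here; keep the options that are already present)
GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=10"

# Then regenerate the GRUB config and reboot:
#   update-grub
```

One thing worth checking: if this machine boots via systemd-boot rather than GRUB (common for ZFS root on UEFI), edits to /etc/default/grub have no effect. In that case the kernel command line lives in /etc/kernel/cmdline and is applied with `proxmox-boot-tool refresh`.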

sam

Anything helpful in the kernel messages (dmesg)?

Is there anything sensitive in dmesg? I could dump it to a text file and post it here. I have been scrolling through it but don’t have the experience…

Doubtful, but just post the last 100 lines or so; I doubt we need it all. Maybe IP addresses, if those are in there?
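A minimal sketch for grabbing those last 100 lines while masking anything that looks like an IPv4 address. `redact_ips` is a made-up helper and the output filename is arbitrary. The pattern is deliberately crude, so it will also mask any other four-part dotted number and does not touch IPv6 or MAC addresses.

```shell
# Mask anything that looks like an IPv4 address in text read from stdin.
redact_ips() {
  sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/x.x.x.x/g'
}

# Usage on the server (as root): last 100 kernel messages, IPs masked.
#   dmesg -T | tail -n 100 | redact_ips > dmesg-tail.txt
```

Give the resulting file a quick read before posting, since hostnames or serial numbers may still be in there.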

Ugh, I have something going on and it is probably power related. Exactly 7 hours ago, all 4 Proxmox servers rebooted. (2 are Dell, one is the Gigabyte and 1 is just an HP workstation…) 3 are on the same 2 UPSes and one is on a totally different UPS…

more research needed…

sam

OK, definitely not a Proxmox issue (yes, I was worrying about that). I found another unrelated server that also rebooted at the same time… (old Ubuntu running an mdadm array…)
