FreeNAS "unscheduled system reboot"

I have FreeNAS running on a Dell R710, and I finally realized it’s been restarting on me almost since the beginning. Occasionally my mapped drive to it would disconnect, which was the only symptom I ever really saw.

Finally one day Plex (running in a jail on it) wouldn’t respond. I logged into iDRAC and realized it was rebooting. It was a software-only reboot, though: no extra noise from the hardware spinning back up like you get from a cold boot.

Everything I see looks fine, except that in the debug I can find “unscheduled system reboot” and “The operating system successfully came back online”. I have been through the messages file at /var/log/messages and can’t find anything. Same in iDRAC. It all checks out as far as I can see.

So my questions are:
What part of the debug should I attach, or should I just attach the whole thing? (I haven’t been able to find anything other than what I stated above, but this log style is new and confusing to me.)
Any other suggestions on where to look for logs?
And any suggestions on troubleshooting?

You can have it send you an email before it reboots or when it comes back up from a reboot. That way you would know exactly when it happened and which part of the logs to check.

First thing I’d be doing is a memory test and/or other hardware tests.

If you have a second system, you may also want to consider directing its syslog to that, so that if the problem is storage related, you still get the log over the network.
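
If the second system has nothing listening for syslog yet, a throwaway receiver is enough for this kind of hunt. Here is a minimal sketch in Python, assuming plain UDP syslog; the port 5514, the output file name, and the function names are all made up for illustration. Point the FreeNAS box's remote syslog setting at the second machine's address and port. Note that a hard panic may still never make it onto the wire, so treat this as best effort.

```python
import socket


def parse_priority(line: str):
    """Split a leading syslog <PRI> tag into (facility, severity, rest).

    PRI is facility * 8 + severity. Lines without a tag are returned as-is.
    """
    if line.startswith("<") and ">" in line[:6]:
        end = line.index(">")
        digits = line[1:end]
        if digits.isdigit():
            pri = int(digits)
            return pri // 8, pri % 8, line[end + 1:]
    return None, None, line


def serve(host="0.0.0.0", port=5514, logfile="freenas-remote.log"):
    """Append every UDP syslog datagram received to logfile (runs forever)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    with open(logfile, "a", encoding="utf-8") as out:
        while True:
            data, (addr, _port) = sock.recvfrom(8192)
            text = data.decode("utf-8", errors="replace").rstrip()
            facility, severity, msg = parse_priority(text)
            out.write(f"{addr} fac={facility} sev={severity} {msg}\n")
            out.flush()  # flush each line so nothing is lost if this box dies too
```

Call serve() from a terminal on the second machine and leave it running. The standard syslog port 514 normally needs root to bind, which is why the sketch uses a high port instead.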

I’d definitely run hardware diags, though. In my experience FreeNAS is pretty solid; I’ve only seen reliability issues from hardware.

Ok, I’ll see about getting the logs directed away from that machine. That’s going to suck if it’s hardware.

I do have a debug, and going through it is where I found the unscheduled restart. I also think I saw something panic related, but whatever followed it was gibberish to me.

I used to get this when it did a scrub. It basically got too hot (2U chassis and crap fans). Try reapplying thermal paste and check your fans, as it may be thermal.

But yes, most likely hardware. FreeNAS doesn’t “just crash”.

As above, it could just be something like thermal paste, RAM not quite seated properly, etc. Not necessarily broken, but could simply be dislodged, overheating, etc.

One of the things that makes me hesitate on hardware is the lack of events in iDRAC. I figure if FreeNAS was getting hot enough to restart, iDRAC would have events too.

Either way, I will set up getting the logs and try to do some hardware testing.

Not suggesting this is your issue, but I have actually had a server with major problems, including failing to boot, due to a faulty Dell DRAC module (before they were called iDRAC).

We had to pull the DRAC out to get it to boot.

It could also be that you’re hitting some obscure hardware/driver bug. It would be worth googling for issues with your hardware chipset(s), particularly the SCSI controller, for both FreeNAS and FreeBSD.

So I found this after the last reboot.

<118>Sat Mar 14 18:04:24 PDT 2020
bridge0: Ethernet address: 02:86:b2:da:b4:00
epair0a: Ethernet address: 02:66:10:00:07:0a
epair0b: Ethernet address: 02:66:10:00:08:0b
<5>epair0a: link state changed to UP
<5>epair0b: link state changed to UP
<6>epair0a: changing name to 'vnet0:1'
<6>bce0: promiscuous mode enabled
<5>bridge0: link state changed to UP
<6>vnet0:1: promiscuous mode enabled
<6>arp: 172.16.123.51 moved from f0:9f:c2:c3:e4:12 to 00:0c:29:fa:c6:72 on bce0
MCA: Bank 5, Status 0xbe00000000800400
MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000004
MCA: Vendor "GenuineIntel", ID 0x106a5, APIC ID 16
MCA: CPU 0 UNCOR PCC internal timer error
MCA: Address 0x806577869
MCA: Misc 0x0
panic: Unrecoverable machine check exception
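
For anyone trying to make sense of the MCA lines, the Status value can be decoded by hand. Below is a minimal sketch in Python, assuming the standard x86 machine-check status layout from Intel's manuals (flag bits at the top of the 64-bit register, the MCA error code in the low 16 bits). The function and dictionary names are made up for illustration; this is not a FreeBSD tool.

```python
# Flag bits in the upper part of an x86 machine-check bank status register.
FLAGS = {
    63: "VAL (register holds a valid error)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_mca_status(status: int) -> dict:
    """Break a 64-bit MCA bank status value into flags and error codes."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 0-15
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 16-31
    }


if __name__ == "__main__":
    # Status value from the Bank 5 line in the log above.
    result = decode_mca_status(0xBE00000000800400)
    for flag in result["flags"]:
        print(flag)
    print(hex(result["mca_error_code"]), hex(result["model_specific_code"]))
```

For the value in the log, UC and PCC are both set and the low 16 bits come out as 0x0400, which Intel lists as an internal timer error. That lines up with the kernel's own "UNCOR PCC internal timer error" summary. It says nothing about whether the cause is the CPU, the socket, or microcode.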

It’s either something with the CPU or the socket. In my experience it’s usually the socket. Please reseat the CPU.

That, or try googling the error. Now that I think about it, that’s probably the best bet.

Most say it’s a hardware problem, but I did find an instance where someone said a microcode update packaged with the BIOS update fixed the problem. So I downloaded the BIOS update from Dell and applied it.

If that doesn’t work, I’ll see about pulling it out and reseating the CPU. It’s not easy to get it from its current physical location to a point where I can work on it.

And if that doesn’t work, I’ll probably pull that CPU and run with just one, since it is a dual-CPU system: relocate the second one to the primary socket and move the RAM.