Server becomes unresponsive

Mr_Pyro · May 19, 2020, 11:01pm

Hi,

I have a server hosting my website. It usually runs fine for about a week before it becomes unresponsive. When I try to view my website in a browser, it does not load. When I try to ssh into it, it does not respond at all; it just hangs (I have to Ctrl+C out of the ssh command). When I restart the server, it starts working again just fine. I created a system utilization logging script, and this is what it recorded (the server became unresponsive on 5/18/20 and was restarted today, 5/19/20):

[05/18/20|00:58:01]----------------------------
Memory Usage: 1466/7960MB (18.42%)
CPU Load: 0.00
Internet: Good!
-----------------------------------------------
[05/18/20|00:59:01]----------------------------
Memory Usage: 1466/7960MB (18.42%)
CPU Load: 0.06
Internet: Good!
-----------------------------------------------
[05/18/20|01:00:01]----------------------------
Memory Usage: 1470/7960MB (18.47%)
CPU Load: 0.05
Internet: Good!
-----------------------------------------------
[05/18/20|01:01:01]----------------------------
Memory Usage: 1470/7960MB (18.47%)
CPU Load: 0.02
Internet: Good!
-----------------------------------------------
[05/19/20|15:57:02]----------------------------
Memory Usage: 1509/7960MB (18.96%)
CPU Load: 1.49
Internet: Good!
-----------------------------------------------
[05/19/20|15:58:01]----------------------------
Memory Usage: 1508/7960MB (18.94%)
CPU Load: 0.58
Internet: Good!
-----------------------------------------------

It just seems to stop logging at the same time it becomes unresponsive, with no indication of a memory leak or an internet issue. Any ideas?
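
For a longer log, the point where entries stop can be located mechanically instead of by eye. A minimal sketch, assuming the log keeps the `[MM/DD/YY|HH:MM:SS]` header format shown above and normally gets one entry per minute; `find_gaps` and `max_gap` are hypothetical names, not part of the original logging script:

```python
import re
from datetime import datetime, timedelta

# Matches header lines such as "[05/18/20|01:01:01]----------------------------"
STAMP = re.compile(r"^\[(\d{2}/\d{2}/\d{2})\|(\d{2}:\d{2}:\d{2})\]")


def find_gaps(lines, max_gap=timedelta(minutes=2)):
    """Return (last_seen, next_seen, gap) for every pause longer than max_gap."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip the Memory/CPU/Internet lines and separators
        current = datetime.strptime(" ".join(match.groups()), "%m/%d/%y %H:%M:%S")
        if previous is not None and current - previous > max_gap:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# Header lines copied from the log excerpt above.
sample = [
    "[05/18/20|01:00:01]----------------------------",
    "[05/18/20|01:01:01]----------------------------",
    "[05/19/20|15:57:02]----------------------------",
]
for last_seen, next_seen, gap in find_gaps(sample):
    print(f"silent from {last_seen} to {next_seen} ({gap})")
# prints: silent from 2020-05-18 01:01:01 to 2020-05-19 15:57:02 (1 day, 14:56:01)
```

This only shows when logging stopped, not why; the silent period also includes the time the machine sat hung before it was restarted.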

Mr_Pyro · May 19, 2020, 11:04pm

OS: Pop!_OS 18.04 LTS x86_64
Kernel: 5.3.0-7648-generic
Uptime: 7 mins
Packages: 1924
Shell: bash 4.4.20
Terminal: /dev/pts/0
CPU: AMD Ryzen 5 1600 (12) @ 3.200GHz
GPU: NVIDIA GeForce GTX 1050 Ti
Memory: 1516MiB / 7960MiB

Also, I currently have it restarting every 2 hrs in an attempt to fix the issue. It does not fix the issue.

Eden · May 19, 2020, 11:05pm

Is it a VM?

What's the output of df -h?

Mr_Pyro · May 19, 2020, 11:07pm

No, it's my old desktop (not a VM).

Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           797M  1.7M  795M   1% /run
/dev/sdc3       102G   11G   86G  11% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sdc1       953M  174M  779M  19% /boot/efi
/dev/sda1       220G   60M  208G   1% /mnt/sda1
/dev/sdb1       1.8T  5.4G  1.7T   1% /home
tmpfs           797M   28K  796M   1% /run/user/118
tmpfs           797M     0  797M   0% /run/user/1000

SgtAwesomesauce · May 19, 2020, 11:43pm

I’d start by looking for kernel dumps or systemd logs.

Can you attach a display and monitor dmesg from a pty? I suspect something is going awry and you’ll catch it in the logs.

Are you patching the system?

regulareel · May 19, 2020, 11:57pm

Any particular reason why PopOS was chosen for the server? You’d think Ubuntu Server Minimal or Debian Server Minimal would be best if you want something Ubuntu/Debian-based.

Mr_Pyro · May 20, 2020, 12:09am

Nothing in journalctl:

May 18 01:01:01 pop-os CRON[2570]: pam_unix(cron:session): session opened for user root by (uid=0)
May 18 01:01:01 pop-os CRON[2572]: (root) CMD (/home/<UserName>/Web/SysUtilLogger/runLogger.sh)
May 18 01:01:01 pop-os CRON[2570]: pam_unix(cron:session): session closed for user root
-- Reboot --
May 19 15:56:20 pop-os kernel: Linux version 5.3.0-7648-generic (buildd@lcy01-amd64-012)

Not sure how to check for kernel dumps. “pty”? I can’t leave a monitor connected. I was planning on connecting a monitor while it was unresponsive, but I forgot, and I will have to wait a week before I get another chance. “Are you patching the system?” I ssh in and run apt every once in a while, if that is what you mean.

Mr_Pyro · May 20, 2020, 12:11am

Not really, I just prefer Pop. And at this point I’m not going to change it (it would be too much of a PITA). It is also not likely to be a Pop-specific issue.

SgtAwesomesauce · May 20, 2020, 12:19am

That’s exactly what I mean. Good.

I definitely missed that it’s running pop.

Pop isn’t meant to be a server OS. I can’t speak to the stability of its customizations for server workloads. That said, it shouldn’t be happening, regardless.

Pseudo terminal. The thing you get from Ctrl+Alt+F3, etc…

Ah, that’s frustrating. Maybe leave an ssh session open watching dmesg?
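
If nobody can watch dmesg live, the kernel messages from the boot that hung can be checked after the restart, since the journal here does keep entries from before the reboot. A rough sketch, assuming `journalctl -k -b -1` (kernel messages, previous boot) is available; `flag_lines`, `previous_boot_kernel_log`, and the `PATTERNS` list are hypothetical names and a judgment call, not an exhaustive filter:

```python
import subprocess

# Substrings that commonly show up in kernel messages around crashes and disk stalls.
PATTERNS = (
    "Call Trace",
    "Oops",
    "BUG:",
    "I/O error",
    "blocked for more than",
    "Out of memory",
)


def flag_lines(lines, patterns=PATTERNS):
    """Return the lines that contain any of the given substrings."""
    return [line.rstrip("\n") for line in lines if any(p in line for p in patterns)]


def previous_boot_kernel_log():
    """Kernel messages from the boot before the current one, via journalctl."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.splitlines()


# Usage (may need root or membership in the systemd-journal group):
#     for line in flag_lines(previous_boot_kernel_log()):
#         print(line)
```

An empty result does not rule out a kernel problem: if the disk stopped responding first, the last messages may never have been written.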

Mr_Pyro · May 20, 2020, 12:27am

Just realized that I might actually be able to have it hooked up to a monitor. I just need to find my DVI cable. My two monitors are currently connected to my desktop, but the DVI port on my second monitor is open.

zlynx · May 20, 2020, 1:24am

I’ve had systems with your symptoms and it was because of a root drive failure. It could be running along just fine for the most part until it hit a series of bad blocks. Then the drive would become so unresponsive that the entire system would fail. And of course it couldn’t log anything because the drive wasn’t talking.

Check the SMART data on all of your drives. See if you can get a remote network log going so logging doesn’t depend on the hard drive working.

Mr_Pyro · May 23, 2020, 8:07pm

Finally failed today. Nothing new to report (no display out). How do I check “SMART data” and set up a “remote network log”?

SgtAwesomesauce · May 23, 2020, 8:12pm

rsyslog comes to mind.

Other options include the very crude: tail -f /var/log/syslog | ssh $OTHER_MACHINE 'cat > ~/machine_name_log'
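
Another way to keep logging independent of the local disk is to have the monitoring script itself send each entry over the network. A minimal sketch using Python's standard `logging.handlers.SysLogHandler`, assuming the other machine runs a syslog daemon that accepts UDP on port 514 (rsyslog can be configured to do so); `make_remote_logger`, the logger name, and the example address are hypothetical:

```python
import logging
import logging.handlers
import socket


def make_remote_logger(host, port=514, name="sysutil"):
    """Build a logger that sends each record to a remote syslog server over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port), socktype=socket.SOCK_DGRAM
    )
    # Prefix each message with the logger name so it is easy to find on the receiver.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


# Usage (192.0.2.10 is a placeholder for the other machine's address):
#     log = make_remote_logger("192.0.2.10")
#     log.info("Memory Usage: 1466/7960MB CPU Load: 0.00 Internet: Good!")
```

UDP is fire-and-forget, so nothing blocks if the receiver is down, but messages can be lost silently. This forwards only what the script itself logs; kernel and system messages would still need rsyslog or the ssh pipe above.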

https://wiki.archlinux.org/index.php/S.M.A.R.T.

(yes, I know it’s not arch, but the concepts still apply)

I’d 100% do a SMART test just to be sure.
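
Besides the self-test, the attribute table printed by `smartctl -A` (from the smartmontools package) is worth a look. A small sketch, assuming ATA-style output where each attribute row starts with a numeric ID and has the raw value in the tenth column; the function names and the `WATCHED` list are hypothetical choices, and NVMe drives print a different layout that this does not handle:

```python
import subprocess

# Attributes commonly associated with failing sectors (a judgment call, not exhaustive).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Map attribute name -> raw value from the table printed by `smartctl -A`."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            values[fields[1]] = fields[9]
    return values


def suspicious_attributes(text):
    """Return watched attributes whose raw value is a non-zero integer."""
    attrs = parse_smart_attributes(text)
    return {
        name: raw
        for name, raw in attrs.items()
        if name in WATCHED and raw.isdigit() and int(raw) > 0
    }


def read_smart(device):
    """Run smartctl on a device such as /dev/sda (needs root) and return its output."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


# Usage, run as root for each drive:
#     print(suspicious_attributes(read_smart("/dev/sda")))
```

An empty dictionary means none of the watched counters is non-zero; it is not proof that the drive is healthy.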

Mr_Pyro · May 24, 2020, 9:28am

Ran the SMART tests on all of the drives. All of them “Completed without error”.