Cluster/Datacenter Statistics Packages for Ubuntu?

Windows7ge · March 14, 2022, 7:16pm

I have a number of Ubuntu/Debian-based network clients. Some boot diskless over the network; some host hypervisors.

What I’d like to do is find a software package that is relatively simple to use for monitoring the resources available to and in use by the cluster.

If that doesn’t make sense: I’m looking for something similar to what PROXMOX or VMware vCenter use to display online/offline nodes and cumulative resources, such as the network-wide number of CPUs, RAM, storage, etc.

I’m fine with CLI or WebUI solutions if any packages exist for either. Every node is running Ubuntu/Ubuntu Server, so I should be able to install client software on all of them and monitor their status from one location.

Thanks!

Dutch_Master · March 14, 2022, 7:49pm

I’m using Webmin, but that may not suit (all) your needs/requirements.

Windows7ge · March 14, 2022, 9:01pm

The least I can do is investigate it, but I think I can already see what you mean.

Windows7ge · March 17, 2022, 12:42am

Yeah, Webmin was a little outside the scope of what I was looking for. I found something a little better, though I’ll admit it’s still not exactly what I wanted: Cockpit. Would you happen to be familiar with it?

I remember learning about it when I was researching VFIO and GPU pass-through. It doesn’t display a “datacenter”-wide resource count, but it does provide graphs of each client’s utilization and its online/offline status. It’s similar to Webmin but a little closer to what I was looking for. I’ll likely stick with it, since the setup was fairly easy apart from a handful of hiccups.

ThatGuyB · March 17, 2022, 2:40am

Do you plan to monitor their resources individually on each system? Then something like Monit, or other local monitoring software like traffic_totals or vnstat, may be worth checking out.

My recommendation would be to install the Prometheus node-exporter (node_exporter) on the clients and set up a Prometheus + Grafana server to see all the stats in a nice GUI. It takes about 1 hour if you don’t know what you’re doing (and about 15 minutes if you do). Here’s what a graph for a single server looks like:

In the graph above, you can select multiple servers to be shown side by side.

(Ignore the fact that the pfSense node_exporter doesn’t show the RAM usage.)

And this one is another graph for the first server, but with more data being shown:

Only a few steps are needed for the above:

1. Set up a Linux server on which you install the Prometheus server and Grafana
2. Install node-exporter on the clients (and maybe on the server itself too)
3. Edit the Prometheus config to point to each client’s hostname and port (default 9100)
4. Restart the Prometheus service
5. Open Grafana
6. Add a data source (Prometheus)
7. Import the 2 dashboards (Prometheus Node Exporter Full and Node Exporter Server Metrics)
8. Point them to the Prometheus server as the source
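The config-editing step above boils down to one scrape job in Prometheus’ configuration file that lists every client’s node_exporter. Below is a minimal Python sketch that renders such a job, assuming hypothetical hostnames (node01, node02, pve01) and node_exporter’s default port 9100. On Ubuntu’s prometheus package the file is usually /etc/prometheus/prometheus.yml. Merge the output into its existing scrape_configs section rather than overwriting the file. If a job named node already exists there, add the targets to it instead, since Prometheus rejects duplicate job names.

```python
def render_scrape_config(hosts, port=9100, job_name="node"):
    """Render a Prometheus scrape_configs block that scrapes node_exporter on every host.

    hosts:    hostnames or IPs of the clients (the ones below are hypothetical).
    port:     node_exporter listens on 9100 by default.
    job_name: label Prometheus attaches to these targets (assumed name "node").
    """
    if not hosts:
        raise ValueError("need at least one host to scrape")
    # Each target is "host:port"; single quotes keep YAML from misreading anything.
    targets = ", ".join(f"'{host}:{port}'" for host in hosts)
    return (
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        "    static_configs:\n"
        f"      - targets: [{targets}]\n"
    )


# Hypothetical client hostnames; replace with your own nodes.
print(render_scrape_config(["node01", "node02", "pve01"]))
```

A single static list is the simplest option for a handful of machines. For a larger, changing set of nodes, Prometheus also supports file-based service discovery, so targets can be edited without touching the main config.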

And you’re up and running.
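Once Prometheus is scraping the clients, the cluster-wide numbers asked about at the start of the thread (online/offline nodes, total CPUs, total RAM) can also be pulled straight from Prometheus’ HTTP API (/api/v1/query) without going through Grafana. This is a minimal sketch using only the Python standard library. It assumes the server is reachable at a hypothetical http://prometheus.lan:9090 (9090 is Prometheus’ default port) and that the clients sit in a scrape job named node. The function and constant names are made up for illustration. The CPU and RAM sums only cover nodes that are currently being scraped, because a node that goes down drops out of these results once its series go stale.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical server address; 9090 is Prometheus' default listen port.
PROMETHEUS_URL = "http://prometheus.lan:9090"

# PromQL queries; assumes the clients are in a scrape job named "node".
UP_QUERY = 'up{job="node"}'  # 1 = target scraped OK, 0 = unreachable
CPU_QUERY = 'count(node_cpu_seconds_total{job="node",mode="idle"})'  # one series per logical CPU
RAM_QUERY = 'sum(node_memory_MemTotal_bytes{job="node"})'


def vector_values(payload):
    """Turn an instant-vector API response into a list of (labels, float value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each sample's "value" is [timestamp, "string value"].
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


def instant_query(promql, base_url=PROMETHEUS_URL, timeout=5.0):
    """Run an instant query against /api/v1/query and return (labels, value) pairs."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return vector_values(json.load(resp))


def cluster_summary(run=instant_query):
    """Collect node status and cluster-wide CPU/RAM totals; `run` executes one PromQL query."""
    up = run(UP_QUERY)
    cpus = run(CPU_QUERY)
    ram = run(RAM_QUERY)
    return {
        "online": sorted(labels["instance"] for labels, v in up if v == 1),
        "offline": sorted(labels["instance"] for labels, v in up if v == 0),
        "cpus": int(cpus[0][1]) if cpus else 0,
        "ram_gib": round(ram[0][1] / 2**30, 1) if ram else 0.0,
    }
```

Calling print(cluster_summary()) on a machine that can reach the Prometheus server prints a dict with the online and offline instances plus the CPU and RAM totals. The query runner is passed in as a parameter so the summary logic can be tried against canned responses without a live server.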

ThatGuyB · March 17, 2022, 2:42am

Cockpit is more like a management interface than anything else. You can have graphs and statistics, but it’s not really its purpose.

You want an NMS (network monitoring system). Prometheus combined with Grafana is a good choice. Alternatives would be Zabbix or Centreon, but they are a bit harder to set up than Prometheus and Grafana, IMO.

Windows7ge · April 2, 2022, 4:17pm

Sorry, I’ve been very busy and haven’t had the time to come back to this project.

The desire is indeed to have a single centralized interface where I can monitor the status and statistics of multiple nodes (or clients).

PROXMOX has a simple, easy-to-understand interface for it, but I don’t want to install hypervisors on everything: that would double the number of IPs I have to juggle, and it would mean installing local storage instead of booting them all off the network, which I like quite a bit more.

When I do find the time, I’ll investigate Prometheus + Grafana. I’ve never heard of or worked with them before, so I’ll make some LXC containers and test it out before committing to implementing it. I may follow up with some questions if I run into issues.