This seemed like something good to post in here, and I hadn’t seen a recent topic on it specifically. I have the normal “A/C power has been lost/restored” notifications set up, as well as high resource usage alerts on our hosts and some network notifications for different scenarios. These work great and have saved us from a lot of pain, especially after hours and on weekends, but we’ve had HVAC issues in our main server room and I want to know what solutions others have used.
I feel like I just talked myself out of posting this, realizing I might end up with a slew of off-the-shelf options that work really well and that I’ve already looked at. Anyway, I’d like opinions from some other internet IT folk. I’m open to cheap Amazon/China options and larger, more accurate enterprise solutions. Ideally I’d like email or text notifications for at least 3 people when a temp threshold is breached.
Most servers, UPSes, SANs, managed switches, etc., have ambient temperature sensors built-in which can be monitored via SNMP or other agents.
Normally you’d just configure those to be watched by your existing network monitoring system (Nagios, Checkmk, Zabbix, etc.) with appropriate temperature WARN / CRIT thresholds. NMSes are smart about e-mailing without sending spurious alerts, and they can be set up with several levels of contacts to escalate to if an issue is not acknowledged within a time limit, and more.
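To make the WARN / CRIT idea concrete, here is a rough sketch of how a temperature reading could be mapped to a Nagios-style plugin result. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the `label=value;warn;crit` perfdata layout follow the usual Nagios plugin convention; the function names and the 27 °C / 32 °C defaults are just placeholders I made up, so pick thresholds that suit your room.

```python
# Nagios-style plugin states (conventional exit codes).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(temp_c, warn_c, crit_c):
    """Map a temperature in Celsius to a plugin state."""
    if temp_c is None:
        return UNKNOWN  # sensor gave no reading
    if temp_c >= crit_c:
        return CRITICAL
    if temp_c >= warn_c:
        return WARNING
    return OK


def plugin_output(temp_c, warn_c=27.0, crit_c=32.0):
    """Return (state, status line). Default thresholds are placeholders."""
    state = classify(temp_c, warn_c, crit_c)
    if state == UNKNOWN:
        return state, "TEMP UNKNOWN - no reading"
    # Text before the pipe is for humans; perfdata after it is for graphing.
    text = (
        f"TEMP {LABELS[state]} - {temp_c:.1f} C"
        f" | temp={temp_c:.1f};{warn_c:.1f};{crit_c:.1f}"
    )
    return state, text
```

A wrapper script would print the status line and exit with the state code; the NMS then handles notifications, acknowledgement and escalation on its own.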
So, do you think you need stand-alone temperature/humidity (environmental) sensors for some reason, or will monitoring your existing equipment be adequate?
For servers, you may just want to enable SNMP on the LOM / OBM, or you may want to install the management software (Dell OpenManage, HP OneView, Supermicro Server Manager) so the system SNMP or other agents running on the host OS will see hardware health info.
For “home prod” I am also using one of these: https://www.amazon.com/Innolage-Thermometer-Temperature-Sensor-Recorder/dp/B0785JHNWB/ (note that this is a random link; there are likely 50 quintillion companies selling these) to monitor the temperature of the room. Some quick Python can export the data to your monitoring solution so you can see if the HVAC has kicked on in the last hour or so.
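As a rough idea of what the “has the HVAC kicked on” check could look like, here is a minimal sketch. It assumes you already have timestamped readings from the sensor (how you read the device depends on the model, so that part is left out), and `hvac_kicked_on`, the one-hour window and the 1 °C drop are all made-up names and numbers for illustration.

```python
from datetime import datetime, timedelta


def hvac_kicked_on(readings, now, window=timedelta(hours=1), min_drop_c=1.0):
    """Guess whether cooling ran recently.

    readings: iterable of (datetime, temp_c) tuples in any order.
    Returns True if, within the window ending at `now`, the temperature
    fell by at least min_drop_c from an earlier peak.
    """
    recent = sorted((t, c) for t, c in readings if now - window <= t <= now)
    if len(recent) < 2:
        return False  # not enough data to see a trend
    peak = recent[0][1]
    for _, temp in recent[1:]:
        if peak - temp >= min_drop_c:
            return True
        peak = max(peak, temp)  # track the highest reading so far
    return False
```

Feed it whatever the logger script collects, then push the boolean (or the raw temperature) to your monitoring system as a regular check.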
Ah, this is exactly what I needed. I don’t have long to write a detailed reply now, but all 3 of your replies were super helpful. I had forgotten that we had thermal alerts set up on our Nimble, but they may be set up incorrectly or not working for another reason. I’m going to look into this when I get a moment. The Innolage was something I had thrown around many times as a simple but dirty solution, along with some less secure and less professional options. The Vertiv Watchdog 15-P is the nice solution I was looking for and actually looks better than a lot of the products that I’ve run across when researching options.
But the project was canceled before equipment was ordered, so I don’t have any experience with them… could be sketchy. Be very careful to avoid the two “10BASE-T / 10 Mbps” models, which won’t even link up on a 10GbE port.