17 February 2016
Nagios is extremely versatile and can monitor just about anything. I first tasted Nagios when I worked as an Operations Intern for a Drupal services company.
In today’s post, I’m going to share some of my accumulated knowledge in using Nagios to monitor the infrastructure we manage through Barred Owl Web. In December 2015, I gave a presentation to the ChaDevOps Meetup Group on a Basic Introduction to Nagios. You can view all of my workshops & presentations at https://barredowlweb.com/knowledge-base/
Until recently, I only used Nagios to monitor public services (namely, whether a URL loads properly and whether the server responds to ICMP pings). Within the last two months, I’ve expanded my basic Nagios implementation to use NRPE for monitoring server load, memory usage, and Postfix mail queues on various servers.
The Setup
As of this blog post, I run all of my infrastructure on CentOS. Most of the servers I manage run either CentOS 6 or 7, although I still have a couple of legacy CentOS 5 machines under my control. Instead of compiling Nagios from source (who wants to maintain that?), I’ve opted to use the packages from the EPEL repository.
Here’s my setup:
- EPEL Repo (For CentOS 7, you can install it with `rpm -iUvh http://ftp.linux.ncsu.edu/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm`)
- After you do a `yum install nagios nagios-plugins-all nagios-nrpe`, you can find the relevant Nagios files as follows:
  - The main config and the conf.d directory are in /etc/nagios/
  - Plugins are located in /usr/lib64/nagios/plugins
  - The NRPE config is at /etc/nagios/nrpe.cfg
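
On each monitored server, the NRPE config file also has to allow the monitoring server to connect before any remote check will work. Here is a minimal sketch of the relevant lines, assuming a hypothetical monitoring server address of 192.0.2.10 (a documentation-only IP; substitute your own):

```cfg
# /etc/nagios/nrpe.cfg on the monitored server (fragment)
# 192.0.2.10 is a placeholder for the monitoring server's address.
allowed_hosts=127.0.0.1,192.0.2.10
# Default NRPE port; the monitoring server must be able to reach it.
server_port=5666
# Leave remotely supplied command arguments disabled unless you need them.
dont_blame_nrpe=0
```

Restart the NRPE service after editing the file so the change takes effect.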
The Monitoring
Here are some of the things that I’m monitoring:
- Checking for correct DNS values on various hosts
  - Usage: `check_dns -H host [-s server] [-a expected-address] [-A] [-t timeout] [-w warn] [-c crit]` (http://nagios-plugins.org/doc/man/check_dns.html)
  - This doesn’t require NRPE, and is a simple check from the monitoring server. Here’s my service definition:

        define service{
            host_name               ns1.developcents.com
            service_description     DNS Check
            check_command           check_dns!ns1.developcents.com
            contact_groups          admins
            max_check_attempts      3
            check_interval          10
            retry_interval          5
            check_period            24x7
            notification_interval   30
            notification_period     24x7
        }

- Checking to see if server load is reasonable
  - Usage: `check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15` (http://nagios-plugins.org/doc/man/check_load.html)
  - This does require NRPE. Here’s my service definition on the monitoring server:

        define service{
            host_name               mail.developcents.com
            service_description     Server Load
            contact_groups          admins
            check_command           check_nrpe!check_load
            check_interval          4
            retry_interval          1
            max_check_attempts      3
            check_period            24x7
            notification_period     24x7
        }

  - And here’s my NRPE command (found in nrpe.cfg) on the server that is being monitored:

        command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

- Checking the mail queue to make sure it’s not clogged
  - This is a third-party plugin that is not included in the nagios-plugins-all package provided by EPEL. The plugin information is at https://exchange.nagios.org/directory/Plugins/Email-and-Groupware/Postfix/check_postfix_queue/details.
  - Here’s my service definition on the monitoring server:

        define service{
            host_name               mail.developcents.com
            service_description     Mail Queue
            contact_groups          admins
            check_command           check_nrpe!check_queue
            check_interval          4
            retry_interval          1
            max_check_attempts      3
            check_period            24x7
            notification_period     24x7
        }

  - And here’s my NRPE command (again, note that this goes into nrpe.cfg on the server that is actually being monitored):

        command[check_queue]=/usr/lib64/nagios/plugins/check_postfix_queue -w 15 -c 30

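
The service definitions above refer to two commands, check_dns and check_nrpe, by name. If your commands file doesn’t already define them, something like the following sketch would be needed on the monitoring server. The argument layout is an assumption inferred from how the services call the commands (a single argument after the `!`), not necessarily the definitions I run:

```cfg
# On the monitoring server, e.g. in a file under /etc/nagios/conf.d/
# $USER1$ is the plugin directory macro set in resource.cfg.

define command{
    command_name    check_dns
    command_line    $USER1$/check_dns -H $ARG1$
}

define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
```

With these in place, `check_nrpe!check_load` asks the NRPE daemon on the monitored host to run whatever its config file maps to `command[check_load]`.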
I hope that this information is useful to someone! You can also find some of my Nagios-related questions & answers on ServerFault and StackOverflow:
- My question and answer on how to monitor URLs: http://stackoverflow.com/questions/9246557/monitoring-urls-with-nagios/
- My question and answer on how to monitor hosts with check_ping: http://stackoverflow.com/questions/26746404/nagios-monitoring-hosts-with-check-ping
- My answer on how to run a check from the CLI: http://serverfault.com/questions/339968/how-can-i-manually-run-a-nagios-check-from-the-command-line/339969#339969
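
Running a check by hand mostly comes down to executing the plugin and reading its exit status: by Nagios plugin convention, 0 means OK, 1 WARNING, 2 CRITICAL, and anything else is treated as UNKNOWN. Here is a rough sketch of a wrapper that makes this visible. The `run_check` name is hypothetical and not part of Nagios or any plugin package:

```shell
# Hypothetical helper: run any plugin command line and report which
# Nagios state its exit code maps to.
run_check() {
    rc=0
    # Capture the plugin's output; remember a non-zero exit code.
    output=$("$@") || rc=$?
    case $rc in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        *) state=UNKNOWN ;;
    esac
    echo "$state (exit $rc): $output"
    return "$rc"
}
```

On a monitored server you could then try something like `run_check /usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20` to see which state the thresholds produce right now.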
Want to share some of your Nagios knowledge? Leave a comment.
Want me to help you with your Nagios – or other sysadmin – needs? Contact us today.