Monitoring Linux Systems with Nagios

Feb 17, 2016 | Barred Owl Web, Technical

Nagios is extremely versatile and can monitor just about anything. I first got a taste of Nagios when I worked as an Operations Intern for a Drupal services company.

In today’s post, I’m going to share some of my accumulated knowledge in using Nagios to monitor the infrastructure we manage through Barred Owl Web. In December 2015, I gave a presentation to the ChaDevOps Meetup Group on a Basic Introduction to Nagios. You can view all of my workshops & presentations at https://barredowlweb.com/knowledge-base/

Until recently, I only used Nagios to monitor public services (namely, whether a URL loads properly and whether the server responds to ICMP pings). Within the last two months, I’ve expanded my basic Nagios implementation to use NRPE for monitoring server load, memory usage, and Postfix mail queues on various servers.

The Setup

As of this blog post, I run all of my infrastructure on CentOS. Most of the servers I manage are running either CentOS 6 or 7, although I still have a couple of legacy CentOS 5 machines under my control. Instead of compiling Nagios from source (who wants to maintain that?), I’ve opted to install it from the EPEL repository.

Here’s my setup:

  • EPEL Repo (For CentOS 7, you can install it with `rpm -iUvh http://ftp.linux.ncsu.edu/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm`)
  • After you do a `yum install nagios nagios-plugins-all nagios-nrpe`, you can find the relevant Nagios files as follows:
    • Main config and conf.d directory is in /etc/nagios/
    • Plugins are located in /usr/lib64/nagios/plugins
    • NRPE config is at /etc/nagios/nrpe.cfg

The Monitoring

Here are some of the things that I’m monitoring:

  • Checking for correct DNS values on various hosts
    • check_dns -H host [-s server] [-a expected-address] [-A] [-t timeout] [-w warn] [-c crit] — http://nagios-plugins.org/doc/man/check_dns.html
    • This doesn’t require NRPE, and is a simple check from the monitoring server. Here’s my service definition:

      define service{
          host_name               ns1.developcents.com
          service_description     DNS Check
          check_command           check_dns!ns1.developcents.com
          contact_groups          admins
          max_check_attempts      3
          check_interval          10
          retry_interval          5
          check_period            24x7
          notification_interval   30
          notification_period     24x7
      }

  • Checking to see if server load is reasonable
    • check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15 — http://nagios-plugins.org/doc/man/check_load.html
    • This does require NRPE. Here’s my service definition on the monitoring server:

      define service{
          host_name               mail.developcents.com
          service_description     Server Load
          contact_groups          admins
          check_command           check_nrpe!check_load
          check_interval          4
          retry_interval          1
          max_check_attempts      3
          check_period            24x7
          notification_period     24x7
      }

    • And here’s my NRPE command (found in nrpe.cfg) on the server that is being monitored:

      command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20

  • Checking the Mail Queue to make sure it’s not clogged
    • This is a third-party plugin that is not included in the default nagios-plugins-all package provided by EPEL. The plugin information is at https://exchange.nagios.org/directory/Plugins/Email-and-Groupware/Postfix/check_postfix_queue/details.
    • Here’s my service definition on the monitoring server:

      define service{
          host_name               mail.developcents.com
          service_description     Mail Queue
          contact_groups          admins
          check_command           check_nrpe!check_queue
          check_interval          4
          retry_interval          1
          max_check_attempts      3
          check_period            24x7
          notification_period     24x7
      }

    • And here’s my NRPE command (again, note that this goes into nrpe.cfg on the server that is actually being monitored):

      command[check_queue]=/usr/lib64/nagios/plugins/check_postfix_queue -w 15 -c 30
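
To make the -w and -c values in the two NRPE commands above a bit more concrete, here is a rough sketch of how warning and critical thresholds turn into Nagios states. It is only an illustration, not the source of check_load or check_postfix_queue. The function names threshold_state and load_state are made up for this example, and I’m assuming a value has to be strictly greater than a threshold to trip it; the real plugins may behave differently right at the boundary.

```shell
# Illustration only: mimics how -w/-c thresholds map to Nagios states.
# threshold_state and load_state are hypothetical helpers, not Nagios tools.

# threshold_state VALUE WARN CRIT
# Prints the state for a single number, such as a mail queue length.
# Assumption: a threshold is tripped only when VALUE is strictly greater.
threshold_state() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 > c + 0)      print "CRITICAL"
        else if (v + 0 > w + 0) print "WARNING"
        else                    print "OK"
    }'
}

# load_state "L1,L5,L15" "W1,W5,W15" "C1,C5,C15"
# Compares each load average with its own threshold and prints the worst
# state found. The body runs in a subshell so its variables stay local.
load_state() (
    worst="OK"
    for i in 1 2 3; do
        load=$(echo "$1" | cut -d, -f"$i")
        warn=$(echo "$2" | cut -d, -f"$i")
        crit=$(echo "$3" | cut -d, -f"$i")
        state=$(threshold_state "$load" "$warn" "$crit")
        case "$state" in
            CRITICAL) worst="CRITICAL"; break ;;
            WARNING)  worst="WARNING" ;;
        esac
    done
    echo "$worst"
)

# Examples using the thresholds from the NRPE commands in this post:
# threshold_state 20 15 30                      prints WARNING
# load_state "3,4,6" "15,10,5" "30,25,20"       prints WARNING
# load_state "31,4,6" "15,10,5" "30,25,20"      prints CRITICAL
```

With the thresholds from my check_load command, a 15-minute load of 6 is enough to raise a WARNING even when the 1-minute and 5-minute averages look fine.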

I hope that this information is useful to someone! You can also find some of my Nagios-related questions & answers on ServerFault and StackOverflow:

  • My question and answer on how to monitor URLs: http://stackoverflow.com/questions/9246557/monitoring-urls-with-nagios/
  • My question and answer on how to monitor hosts with check_ping: http://stackoverflow.com/questions/26746404/nagios-monitoring-hosts-with-check-ping
  • My answer on how to run a check from the CLI: http://serverfault.com/questions/339968/how-can-i-manually-run-a-nagios-check-from-the-command-line/339969#339969
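
All of the plugins mentioned in this post can be run by hand, which is handy when a service definition is not behaving the way you expect. Plugins report their result through the exit code (0 is OK, 1 is WARNING, 2 is CRITICAL, 3 is UNKNOWN). Below is a small wrapper sketch that runs a plugin and prints the matching state name. Note that run_check and state_name are hypothetical helper names I made up for this example; they are not part of Nagios.

```shell
# Sketch of a manual check runner. run_check and state_name are
# hypothetical helpers for illustration, not Nagios commands.

# state_name CODE
# Translates a plugin exit code into the Nagios state name.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNDEFINED" ;;   # not a standard plugin return code
    esac
}

# run_check COMMAND [ARGS...]
# Runs the plugin, prints its output, then prints the state it reported.
# Returns the plugin's own exit code so callers can act on it.
run_check() {
    code=0
    output=$("$@") || code=$?
    echo "$output"
    echo "state: $(state_name "$code") (exit code $code)"
    return "$code"
}

# Example, using the plugin path and host from earlier in this post:
# run_check /usr/lib64/nagios/plugins/check_dns -H ns1.developcents.com
```

Running the plugin as the same user that Nagios or NRPE runs as is usually the closest match to what the daemon sees.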

Want to share some of your Nagios knowledge? Leave a comment.

Want me to help you with your Nagios – or other sysadmin – needs? Contact us today.
