Span Nagios to other VLAN

Introduction

Nagios can be configured to monitor a great variety of hosts and services. This section covers the configuration needed for Nagios to monitor the systems on the network.
 
The main configuration files for nagios are:
 
Main configuration file: The main configuration file is located at /usr/local/nagios/etc/nagios.cfg. It is read by both the Nagios daemon and the CGIs and affects how they operate.
Resource file: The resource file is located at /usr/local/nagios/etc/resource.cfg. It can be used to store user-defined macros and sensitive information such as passwords without making them available to the CGIs.
Object definition files: Object definition files are located in the /usr/local/nagios/etc/objects/ folder. They define hosts, services, host groups, contacts, contact groups, commands, and so on. Every device that needs monitoring should be defined here.
CGI configuration file: The CGI configuration file is located at /usr/local/nagios/etc/cgi.cfg. It contains several directives that affect the operation of the CGIs, and it contains a reference to the main configuration file so that the CGIs know how the Nagios daemon is configured.
 

Define Hosts, Services and Contacts

All hosts and services that need monitoring must be defined in Nagios. This is done by defining them in the object definition files. If Nagios was installed as described above, a number of sample definition files should already be in the /usr/local/nagios/etc/objects/ folder. They can be used as templates for defining various devices and services. The contact information for administrators is also defined in object files. Unless otherwise specified, all files mentioned in this section are located in the /usr/local/nagios/etc/objects/ folder.
 
See the Nagios documentation for detailed configuration guidelines.

Define local hosts and services

In the test setup, local hosts and services are defined in the localhost.cfg file. In fact, they can be defined in any file that is referenced from the main configuration file, which allows the definitions to be organized in several ways.
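
As an illustration, object files are pulled into nagios.cfg with the cfg_file and cfg_dir directives. The fragment below is a sketch that assumes the default install prefix used in this guide; the servers directory is a hypothetical name, and sophia.cfg is assumed to be a per-host file kept in the objects folder.

```cfg
# /usr/local/nagios/etc/nagios.cfg (excerpt)

# Load individual object files
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/sophia.cfg

# Or load every *.cfg file found under a directory (hypothetical path)
cfg_dir=/usr/local/nagios/etc/servers
```

Either style works; one file per host tends to be easier to maintain once more than a few machines are monitored.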



The Hosts and Services defined in sample configuration

In the following sample configuration, the Nagios instance installed on drrobbins is configured to monitor the remote host sophia and its SSH service. Definitions can inherit from templates through the "use" directive; refer to templates.cfg for the complete definitions of the template objects. Note that if a directive appears in both the definition and the template, the value in the definition takes precedence.

define host{
        use  linux-server       ; Name of host template to use
                                ; This host definition will inherit all variables that are defined
                                ; in (or inherited by) the linux-server host template definition.
        host_name  sophia.blueprint.org
        alias      sophia
        address    sophia.blueprint.org
        hostgroups linux-servers
        }



define service{
    use            generic-service
    host_name        sophia.blueprint.org
    service_description    SSH
    check_command        check_ssh
    }
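
Before restarting Nagios after editing object files, it is usually worth running its built-in configuration check (the -v option). The following is a minimal sketch that assumes the install prefix used in this guide; the function name verify_nagios_config and the NAGIOS_BIN and NAGIOS_CFG variables are illustrative, not part of Nagios.

```shell
#!/bin/sh
# Assumed paths from this guide's install prefix; override via the environment.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# verify_nagios_config: run the Nagios pre-flight check (-v) quietly and
# report whether the configuration was accepted.
verify_nagios_config() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        echo "config OK"
    else
        echo "config has errors; run '$NAGIOS_BIN -v $NAGIOS_CFG' to see them"
        return 1
    fi
}
```

A typical use would be to call verify_nagios_config and restart the nagios service only if it succeeds.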

Monitor remote linux machines

Sometimes the status of a remote machine, such as its disk usage, needs to be monitored. There are two ways to do that: over an SSH connection from the Nagios server, or by installing the NRPE daemon on the machine that needs monitoring. The following demonstrates how to use NRPE to monitor the status of a remote machine.
 
Configure remote machine
First, the Nagios plugins must be installed on the remote host so that the check commands work on it.
 
useradd nagios
passwd nagios
./configure
make
make install
chown nagios:nagios /usr/local/nagios
chown nagios:nagios /usr/local/nagios/libexec
 
Next, install xinetd using yum.

Then install NRPE with the following commands:
 
tar xzf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd
 
Edit the /etc/xinetd.d/nrpe file and add the IP address of the monitoring server to the only_from directive.
 
only_from        = 10.0.11.5
 
Add the following entry for the NRPE daemon to the /etc/services file.
 
nrpe            5666/tcp                #NRPE
 
Now restart the xinetd service:
 
service xinetd restart
 
If everything worked, netstat -at | grep nrpe should output the following:
 
tcp            0            0    *:nrpe    *:*            LISTEN
 
Now open the firewall for the NRPE service:

iptables -I INPUT -p tcp -m tcp --dport 5666 -j ACCEPT

or, if using CentOS/Red Hat/Fedora:

iptables -I RH-Firewall-1-INPUT -p tcp -m tcp --dport 5666 -j ACCEPT
 
Note: Once the only_from directive is set for xinetd, the nrpe plugin run locally on the remote machine can no longer be used to inspect that machine's own settings.
 
Configure the nagios server
 
On the nagios server the nrpe plugin needs to be installed.
 
tar xzf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
 
Now check that NRPE is working by running:

/usr/local/nagios/libexec/check_nrpe -H sophia

If everything works, the NRPE version is printed.
 
The next step is to define the check_nrpe command in the object file commands.cfg:

define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c  $ARG1$
    }
 
Sample commands that NRPE can run, as defined in the nrpe.cfg file on the remote machine, are:

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
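
Each of these commands can be tried by hand from the Nagios server before a service is defined for it. The sketch below assumes check_nrpe is installed under the prefix used in this guide; plugin_state and run_nrpe_check are hypothetical helper names. It relies only on the standard plugin exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# plugin_state CODE: translate a plugin exit code into its state name.
# Codes outside 0-2 are reported here as UNKNOWN.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# run_nrpe_check HOST COMMAND: ask the NRPE daemon on HOST to run COMMAND
# (a name defined in its nrpe.cfg), then print the resulting state name.
# CHECK_NRPE may be set to override the assumed plugin path.
run_nrpe_check() {
    rc=0
    "${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}" -H "$1" -c "$2" || rc=$?
    plugin_state "$rc"
}
```

For example, run_nrpe_check sophia.blueprint.org check_load would print the plugin output followed by the state name.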

For security reasons, NRPE should not be configured to allow command arguments unless absolutely necessary.

The following sample services were added in the test setup to monitor various aspects of the remote machine sophia.

define service{
    use            generic-service
    host_name        sophia.blueprint.org
    service_description    SSH
    check_command        check_ssh
    }

define service{
    use            generic-service
    host_name        sophia.blueprint.org
    service_description    CPU Load
    check_command        check_nrpe!check_load
    }

define service{
    use            generic-service
    host_name        sophia.blueprint.org
    service_description    Current Users
    check_command        check_nrpe!check_users
    }

define service{
    use            generic-service
    host_name        sophia.blueprint.org
    service_description    /dev/hda1 Free Space
    check_command        check_nrpe!check_hda1
    }

define service{
    use            generic-service
    host_name        sophia.blueprint.org
    service_description    Total Processes
    check_command        check_nrpe!check_total_procs
    }

define service{
    use            generic-service
    host_name        sophia.blueprint.org
    service_description    Zombie Processes
    check_command        check_nrpe!check_zombie_procs
    }
 

Monitor remote windows machines

Windows machines can be monitored through Nagios by installing the NSClient++ monitoring daemon on them.

Install NSClient++ on Windows machine

NSClient++ can be downloaded from http://trac.nakednuns.org/nscp/. To install NSClient++, extract the zip file to C:\NSClient++ and type:

nsclient++ /install
 
Open the services manager and make sure the NSClientpp service is allowed to interact with the desktop (see the ’Log On’ tab of the services manager). If it isn’t already allowed to interact with the desktop, check the box to allow it to.
 
Edit the NSC.INI file (located in the C:\NSClient++ directory) and make the following changes:
  • Uncomment all the modules listed in the [modules] section, except for CheckWMI.dll and RemoteConfiguration.dll
  • Optionally require a password for clients by changing the ’password’ option in the [Settings] section.
  • Uncomment the ’allowed_hosts’ option in the [Settings] section. Add the IP address of the Nagios server to this line, or leave it blank to allow all hosts to connect.
  • Make sure the ’port’ option in the [NSClient] section is uncommented and set to ’12489’ (the default port).
Now start the NSClient++ service:
nsclient++ /start
To configure the Nagios server, include windows.cfg in the Nagios configuration and edit windows.cfg, using the definitions it already contains as a template.
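
As a sketch of what the edited windows.cfg might contain, the fragment below defines one Windows host and two services. It assumes the windows-server host template and the check_nt command from the sample configuration files are in place; the host name winserver and the address 192.168.1.2 are hypothetical placeholders.

```cfg
# nagios.cfg: make Nagios read the Windows definitions
cfg_file=/usr/local/nagios/etc/objects/windows.cfg

# windows.cfg: host and services (host name and address are placeholders)
define host{
    use         windows-server
    host_name   winserver
    alias       My Windows Server
    address     192.168.1.2
    }

define service{
    use                 generic-service
    host_name           winserver
    service_description NSClient++ Version
    check_command       check_nt!CLIENTVERSION
    }

define service{
    use                 generic-service
    host_name           winserver
    service_description CPU Load
    check_command       check_nt!CPULOAD!-l 5,80,90
    }
```

If a password was set in NSC.INI, the check_nt command definition in commands.cfg would also need a matching -s option.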
 
 

Monitor routers and switches

If a router or switch is visible on the network (i.e. it has an IP address), its status can be monitored by Nagios. A sample configuration file (switch.cfg) has already been installed in the objects folder.

The most basic way to monitor a router is the check_ping command, which reports information such as packet loss and round-trip time. In addition, if the router supports SNMP, more advanced monitoring, such as port status and bandwidth, can be achieved. In the following example Nagios is configured to monitor a Linksys WRT350 router (ecklie) running dd-wrt.

1. SNMP must be enabled on the router. For ecklie, go to Services->Services and, in the SNMP section, click Enable. Additional fields will appear. Change the Name to ecklie (or the name of your router), click Save at the bottom of the page, and then click Apply.

2. On the Nagios server, the Nagios plugins must be configured with the net-snmp and net-snmp-utils packages present so that the check_snmp plugin gets installed. Also install the mrtg package for monitoring bandwidth.

Configure mrtg:
mrtg (oss.oetiker.ch/mrtg) is a plotter for network usage. In CentOS 5.1 mrtg can be installed through yum.
mrtg is invoked as mrtg <mrtgcfgfile>, where mrtgcfgfile is the configuration file for mrtg. A configuration generation tool called cfgmaker is installed with mrtg and can be used to generate that configuration file.
 
cfgmaker --global 'WorkDir: /home/nagios/mrtg' --global 'Options[_]: growright,bits' --ifref=ip public@10.0.11.1 > mrtg.cfg
 
The --ifref parameter is important because it sets the criterion used to distinguish the ports; possible values include nr, ip, eth, descr, name, and type.
 
Copy the generated configuration file to the /home/nagios/mrtg folder and change its owner to nagios.
 
mrtg has to be run periodically. Add the following line to the crontab of the user nagios (crontab -e) so that mrtg runs every 3 minutes:
 
*/3 * * * * env LANG=C /usr/bin/mrtg /home/nagios/mrtg.cfg --logging /var/log/mrtg.log
 
After the above configuration, once mrtg has been executed, log files for the bandwidth of the different router ports are generated in the /home/nagios/mrtg folder. The file names have the form <router address>_<port address>.log. These files can be used by the check_local_mrtgtraf command to check the bandwidth of individual ports.
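
Since the bandwidth check reads these log files, it can help to confirm that cron is really refreshing them. The following is a small sketch; mrtg_log_path and mrtg_log_is_fresh are hypothetical helper names, and the freshness test relies on the -mmin option of find, which is available in GNU find but is not part of POSIX.

```shell
#!/bin/sh
# mrtg_log_path WORKDIR ROUTER PORT: build the log file path using the
# <router address>_<port address>.log naming described above.
mrtg_log_path() {
    printf '%s/%s_%s.log\n' "$1" "$2" "$3"
}

# mrtg_log_is_fresh FILE MINUTES: succeed only if FILE exists and was
# modified less than MINUTES minutes ago.
mrtg_log_is_fresh() {
    [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}
```

For example, mrtg_log_is_fresh "$(mrtg_log_path /home/nagios/mrtg 10.0.11.1 1)" 6 would fail if mrtg had missed two consecutive 3-minute runs.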
 
 
3. Define the router and its services in the switch.cfg file and make Nagios read this file by referencing it in nagios.cfg. The configuration used here is the following:

define host{
    use        generic-switch        ; Inherit default values from a template
    host_name    ecklie        ; The name we're giving to this switch
    alias        Linksys WRT350 Switch    ; A longer name associated with the switch
    address        10.0.11.1        ; IP address of the switch
    hostgroups    switches        ; Host groups this switch is associated with
    }
 
define hostgroup{
    hostgroup_name    switches        ; The name of the hostgroup
    alias        Network Switches    ; Long name of the group
    }
define service{
    use            generic-service    ; Inherit values from a template
    host_name        ecklie    ; The name of the host the service is associated with
    service_description    PING        ; The service description
    check_command        check_ping!200.0,20%!600.0,60%    ; The command used to monitor the service
    normal_check_interval    5        ; Check the service every 5 minutes under normal conditions
    retry_check_interval    1        ; Re-check the service every minute until its final/hard state is determined
    }
define service{
    use            generic-service    ; Inherit values from a template
    host_name        ecklie
    service_description    Uptime   
    check_command        check_snmp!-C public -o sysUpTime.0
    }
define service{
    use            generic-service    ; Inherit values from a template
    host_name        ecklie
    service_description    Port 1 Link Status
    check_command        check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
    }
define service{
    use            generic-service    ; Inherit values from a template
    host_name        ecklie
    service_description    Port 1 Bandwidth Usage
    check_command        check_local_mrtgtraf!/var/lib/mrtg/10.0.11.1_1.log!AVG!1000000,1000000!5000000,5000000!1
    }

Monitor services

Nagios supports monitoring of a wide variety of network services through the use of plugins. This section provides a guide for setting up Nagios to monitor common services in the lab:

LDAP service

LDAP is implemented in the lab using OpenLDAP, which provides central authentication for the machines in the lab. To monitor the status of the LDAP service, the check_ldap plugin is used. The options for check_ldap are:

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 -H, --hostname=ADDRESS
    Host name, IP Address, or unix socket (must be an absolute path)
 -p, --port=INTEGER
    Port number (default: 389)
 -4, --use-ipv4
    Use IPv4 connection
 -6, --use-ipv6
    Use IPv6 connection
 -a [--attr]
    ldap attribute to search (default: "(objectclass=*)"
 -b [--base]
    ldap base (eg. ou=my unit, o=my org, c=at
 -D [--bind]
    ldap bind DN (if required)
 -P [--pass]
    ldap password (if required)
 -T [--starttls]
    use starttls mechanism introduced in protocol version 3
 -S [--ssl]
    use ldaps (ldap v2 ssl method). this also sets the default port to 636
 -2 [--ver2]
    use ldap protocol version 2
 -3 [--ver3]
    use ldap protocol version 3
    (default protocol version: 2)
 -w, --warning=DOUBLE
    Response time to result in warning status (seconds)
 -c, --critical=DOUBLE
    Response time to result in critical status (seconds)
 -t, --timeout=INTEGER
    Seconds before connection times out (default: 10)
 -v, --verbose
    Show details for command-line debugging (Nagios may truncate output)

The command for checking LDAP is defined in the commands.cfg file as follows:

define command{
    command_name    check_ldap
    command_line    $USER1$/check_ldap -H $HOSTADDRESS$ -3 -4 -T -b $ARG1$ -w $ARG2$ -c $ARG3$
    }

Then add the following to sophia.cfg:

define service{
    use            generic-service
    host_name        sophia.blueprint.org
    service_description    check the status of ldap service
    check_command        check_ldap!dc=blueprint,dc=org!3!10
    }

After the configuration, the status of ldap service can be monitored.

Setting up notifications

Automatically notifying administrators when something goes wrong is an important feature of Nagios. The thresholds at which Nagios generates a notification are usually given as parameters to the check command. For details on the parameters of individual check commands, refer to the Nagios documentation.

Contacts and contact groups are defined in the contacts.cfg file. Contact groups are useful for defining the scope of a notification. Many communication methods can be used to receive Nagios notifications, such as email, pager, and instant message. The following guide sets up an email server so that Nagios can send out notifications by email.
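
As an illustration, a contact and a contact group in contacts.cfg might look like the fragment below. It is modeled on the sample contacts.cfg and assumes the generic-contact template from templates.cfg; the email address is a placeholder.

```cfg
define contact{
    contact_name    nagiosadmin         ; Short name of the contact
    use             generic-contact     ; Inherit notification settings from the template
    alias           Nagios Admin
    email           admin@example.org   ; Placeholder address
    }

define contactgroup{
    contactgroup_name   admins
    alias               Nagios Administrators
    members             nagiosadmin
    }
```

Hosts and services then refer to the group through their contact_groups directive, typically inherited from a template.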

By default Nagios uses the BSD mail command to send email, which relies on a mail server such as sendmail or postfix. In CentOS 5.1 the sendmail server is set up by default. Only one line in /etc/mail/sendmail.mc needs to change. Change the line:
 
DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1,Name=MTA')dnl
to:
DAEMON_OPTIONS(`Port=smtp,Name=MTA')dnl
After this change, rebuild the sendmail.cf file by running make in the /etc/mail folder, and restart the sendmail service. Nagios will then be able to send notifications to local machines and NUS email addresses.
 
To enable sendmail to send notification email to public email addresses, more configuration is needed. See the "Setting up SMTP relay for blueprint.org on centos" page for details.