Introduction
Nagios can be configured to monitor a great variety of hosts and services. This section focuses on configuring Nagios to monitor the systems on the network.
The main configuration files for nagios are:
Main configuration file: The main configuration file is located at /usr/local/nagios/etc/nagios.cfg. It is read by both the Nagios daemon and the CGIs and affects how they operate.
Resource file: The resource file is located at /usr/local/nagios/etc/resource.cfg. It can be used to store user-defined macros and sensitive information such as passwords without making them available to the CGIs.
Object definition files: Object definition files are located under the /usr/local/nagios/etc/objects/ folder. The object files define hosts, services, host groups, contacts, contact groups, commands and so on. All the devices that need monitoring should be defined here.
CGI configuration file: The CGI configuration file is located at /usr/local/nagios/etc/cgi.cfg. It contains several directives that affect the operation of the CGIs, and it contains a reference to the main configuration file so that the CGIs know how the Nagios daemon is configured.
Define Hosts, Services and Contacts
All the hosts and services that need monitoring must be defined in Nagios. This is done by defining them in the object definition files. If Nagios was installed in the way described above, a number of sample definition files should already be in the /usr/local/nagios/etc/objects/ folder. They can be used as templates for defining various devices and services. The contact information for administrators is also defined in object files. Unless otherwise specified, all the files mentioned in this section are located in the /usr/local/nagios/etc/objects/ folder.
See the Nagios documentation for detailed configuration guidelines.
Define local hosts and services
In the test setup, local hosts and services are defined in the localhost.cfg file. In fact, they can be defined in any file that is referenced from the main configuration file (through a cfg_file or cfg_dir directive). This allows the definitions to be organized in several different ways.
The Hosts and Services defined in sample configuration
In the following sample configuration, the Nagios instance installed on drrobbins is configured to monitor the remote host sophia and its SSH service. The definitions inherit from templates through the "use" directive. Refer to templates.cfg for the complete definitions of the template objects. Note that if a directive appears in both the definition and the template, the one in the definition takes precedence.
define host{
use linux-server ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name sophia.blueprint.org
alias sophia
address sophia.blueprint.org
hostgroups linux-servers
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description SSH
check_command check_ssh
}
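The precedence rule for "use" can be illustrated with a small Python sketch. This is a simplified model, not how Nagios itself is implemented: the template contents are made-up values (the real ones are in templates.cfg), resolve() is a hypothetical helper, and only a single template per "use" directive is handled.

```python
# Simplified model of Nagios "use" inheritance (illustration only).
# The template contents below are assumed values, not the real templates.cfg.
TEMPLATES = {
    "generic-host": {"notifications_enabled": "1", "max_check_attempts": "5"},
    "linux-server": {"use": "generic-host", "check_period": "24x7",
                     "max_check_attempts": "10"},
}


def resolve(definition, templates):
    """Return the effective directives of an object definition.

    Directives set in the definition itself win over its template,
    which in turn wins over the template's own parent.
    """
    parent_name = definition.get("use")
    inherited = {}
    if parent_name is not None:
        inherited = resolve(templates[parent_name], templates)
    effective = dict(inherited)
    # Local directives override inherited ones; "use" itself is not inherited.
    effective.update({k: v for k, v in definition.items() if k != "use"})
    return effective


host = {
    "use": "linux-server",
    "host_name": "sophia.blueprint.org",
    "alias": "sophia",
    "max_check_attempts": "3",  # hypothetical local override of the template
}
effective = resolve(host, TEMPLATES)
print(effective["max_check_attempts"])  # value from the host definition
print(effective["check_period"])        # value inherited from linux-server
```

In this model the host keeps its own max_check_attempts while still picking up check_period and notifications_enabled from the template chain.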
Monitor remote linux machines
Sometimes the status of a remote machine, such as its disk usage, needs to be monitored. There are two ways to do that: over an SSH connection from the Nagios server, or by installing the NRPE daemon on the machine that needs monitoring. The following demonstrates how to use NRPE to monitor the status of a remote machine.
Configure remote machine
As a first step, the Nagios plugins must be installed on the remote host so that all the check commands work there. Create the nagios user, then build and install the plugins from the unpacked plugin source directory:
useradd nagios
passwd nagios
./configure
make
make install
chown nagios:nagios /usr/local/nagios
chown nagios:nagios /usr/local/nagios/libexec
Next, install xinetd using yum (yum install xinetd).
Then install NRPE with the following commands:
tar xzf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd
Edit the /etc/xinetd.d/nrpe file and add the IP address of the monitoring server to the only_from directive.
only_from = 10.0.11.5
Add the following entry for the NRPE daemon to the /etc/services file.
nrpe 5666/tcp #NRPE
Now restart the xinetd service:
service xinetd restart
If everything worked, netstat -at | grep nrpe should output the following:
tcp 0 0 *:nrpe *:* LISTEN
Now open the firewall for the NRPE service:
iptables -I INPUT -p tcp -m tcp --dport 5666 -j ACCEPT
or, if using CentOS/Red Hat/Fedora:
iptables -I RH-Firewall-1-INPUT -p tcp -m tcp --dport 5666 -j ACCEPT
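Note: Once only_from is restricted to the monitoring server, the check_nrpe plugin can no longer query the NRPE daemon from the remote machine itself (for example, a local test against localhost is refused) unless 127.0.0.1 is also added to only_from.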
Configure the nagios server
On the Nagios server, the check_nrpe plugin needs to be installed:
tar xzf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
Now check whether NRPE is working by running:
/usr/local/nagios/libexec/check_nrpe -H sophia
If everything works, the NRPE version is printed.
The next step is to define the check_nrpe command in the object file commands.cfg:
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Sample commands that are defined in the NRPE configuration file (nrpe.cfg) on the remote machine and can be called through check_nrpe are:
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
For security reasons, NRPE should not be configured to allow command arguments unless absolutely necessary.
The following sample services were added in the test setup to monitor various aspects of the remote machine sophia.
define service{
use generic-service
host_name sophia.blueprint.org
service_description SSH
check_command check_ssh
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description CPU Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description Current Users
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description /dev/hda1 Free Space
check_command check_nrpe!check_hda1
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description Zombie Processes
check_command check_nrpe!check_zombie_procs
}
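To make the link between check_command values such as check_nrpe!check_load and the command definition in commands.cfg concrete, here is a minimal Python sketch of the macro substitution. It is an illustration, not Nagios code: expand() is a hypothetical helper, and $USER1$ is assumed to point at /usr/local/nagios/libexec, as in the sample resource.cfg.

```python
# Sketch of how a check_command is turned into a command line.
# Assumption: $USER1$ is the plugin directory, as in the sample resource.cfg.
COMMANDS = {
    "check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$",
}
USER_MACROS = {"$USER1$": "/usr/local/nagios/libexec"}


def expand(check_command, host_address):
    """Split 'name!arg1!arg2...' and substitute macros in the command line."""
    name, *args = check_command.split("!")
    line = COMMANDS[name]
    macros = dict(USER_MACROS)
    macros["$HOSTADDRESS$"] = host_address
    # The first value after the command name becomes $ARG1$, and so on.
    for index, value in enumerate(args, start=1):
        macros[f"$ARG{index}$"] = value
    for macro, value in macros.items():
        line = line.replace(macro, value)
    return line


command_line = expand("check_nrpe!check_load", "sophia.blueprint.org")
print(command_line)
# /usr/local/nagios/libexec/check_nrpe -H sophia.blueprint.org -c check_load
```

The name passed as $ARG1$ (check_load here) must match one of the command[...] entries configured for NRPE on the remote machine.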
Monitor remote windows machines
Windows machines can be monitored by Nagios by installing the monitoring daemon NSClient++ on them.
Install NSClient++ on the Windows machine:
nsclient++ /install
Open the services manager and make sure the NSClientpp service is allowed to interact with the desktop (see the ’Log On’ tab of the services manager). If it isn’t already allowed to interact with the desktop, check the box to allow it to.
Edit the NSC.INI file (located in the C:\NSClient++ directory) and make the following changes:
- Uncomment all the modules listed in the [modules] section, except for CheckWMI.dll and RemoteConfiguration.dll
- Optionally require a password for clients by changing the ’password’ option in the [Settings] section.
- Uncomment the ’allowed_hosts’ option in the [Settings] section. Add the IP address of the Nagios server to this line, or leave it blank to allow all hosts to connect.
- Make sure the ’port’ option in the [NSClient] section is uncommented and set to ’12489’ (the default port).
Now start the NSClient++ service:
nsclient++ /start
To configure the Nagios server, include windows.cfg in the Nagios configuration and edit windows.cfg, using the sample definitions in it as a template.
Monitor routers and switches
If a router/switch is visible in the network (i.e. it has an IP address), its status can be monitored by Nagios. A sample configuration file (switch.cfg) has already been installed in the objects folder.
The most basic way to monitor a router is the check_ping command, which reports information such as packet loss and round-trip time. In addition, if the router supports SNMP, more advanced monitoring, such as port status and bandwidth, can be achieved. In the following example Nagios is configured to monitor a Linksys WRT350 router (ecklie) running DD-WRT.
1. SNMP should be enabled on the router. For ecklie, go to Services->Services and, in the SNMP section, click Enable. Additional fields then appear. Change the Name to ecklie (or the name of the router), click Save at the bottom of the page, and then click Apply.
2. On the Nagios server, the Nagios plugins must be configured and built with the net-snmp and net-snmp-utils packages installed so that the check_snmp plugin gets installed. Also install the mrtg package for monitoring the bandwidth.
Configure mrtg:
mrtg (oss.oetiker.ch/mrtg) is a plotter for network usage. In CentOS 5.1 mrtg can be installed through yum.
mrtg is invoked as mrtg <mrtgcfgfile>, where mrtgcfgfile is the configuration file for mrtg. A configuration generation tool called cfgmaker is installed with mrtg and can be used to generate this file:
cfgmaker --global 'WorkDir: /home/nagios/mrtg' --global 'Options[_]: growright,bits' --ifref=ip public@10.0.11.1
The --ifref parameter is important because it sets the criterion that differentiates the ports. Possible values include nr, ip, eth, descr, name and type.
Save the generated configuration to a file, copy it to the /home/nagios/mrtg folder and change its owner to nagios.
mrtg has to be run periodically. Add the following line to the crontab of user nagios (crontab -e) so that mrtg runs every 3 minutes:
*/3 * * * * env LANG=C /usr/bin/mrtg /home/nagios/mrtg.cfg --logging /var/log/mrtg.log
After the above configuration, once mrtg has been executed, log files for the bandwidth of the different ports of the router are generated in the /home/nagios/mrtg folder. The file names have the format <router address>_<port address>.log. These files can be used by the check_local_mrtgtraf plugin to check the bandwidth of individual ports.
3. Define the router and its services in the switch.cfg file and make Nagios read this file by referencing it in nagios.cfg. The configuration used in this setup is the following:
define host{
use generic-switch ; Inherit default values from a template
host_name ecklie ; The name we're giving to this switch
alias Linksys WRT350 Switch ; A longer name associated with the switch
address 10.0.11.1 ; IP address of the switch
hostgroups switches ; Host groups this switch is associated with
}
define hostgroup{
hostgroup_name switches ; The name of the hostgroup
alias Network Switches ; Long name of the group
}
define service{
use generic-service ; Inherit values from a template
host_name ecklie ; The name of the host the service is associated with
service_description PING ; The service description
check_command check_ping!200.0,20%!600.0,60% ; The command used to monitor the service
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined
}
define service{
use generic-service ; Inherit values from a template
host_name ecklie
service_description Uptime
check_command check_snmp!-C public -o sysUpTime.0
}
define service{
use generic-service ; Inherit values from a template
host_name ecklie
service_description Port 1 Link Status
check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}
define service{
use generic-service ; Inherit values from a template
host_name ecklie
service_description Port 1 Bandwidth Usage
check_command check_local_mrtgtraf!/var/lib/mrtg/10.0.11.1_1.log!AVG!1000000,1000000!5000000,5000000!1
}
Monitor services
Nagios supports monitoring of a wide variety of network services through the use of plugins. This section provides a guide for setting up Nagios to monitor common services in the lab:
LDAP service
LDAP is implemented in the lab using OpenLDAP, which provides central authentication for the machines in the lab. To monitor the status of the LDAP service, the check_ldap plugin is used. The options for check_ldap are:
Options:
-h, --help
Print detailed help screen
-V, --version
Print version information
-H, --hostname=ADDRESS
Host name, IP Address, or unix socket (must be an absolute path)
-p, --port=INTEGER
Port number (default: 389)
-4, --use-ipv4
Use IPv4 connection
-6, --use-ipv6
Use IPv6 connection
-a [--attr]
ldap attribute to search (default: "(objectclass=*)"
-b [--base]
ldap base (eg. ou=my unit, o=my org, c=at
-D [--bind]
ldap bind DN (if required)
-P [--pass]
ldap password (if required)
-T [--starttls]
use starttls mechanism introduced in protocol version 3
-S [--ssl]
use ldaps (ldap v2 ssl method). this also sets the default port to 636
-2 [--ver2]
use ldap protocol version 2
-3 [--ver3]
use ldap protocol version 3
(default protocol version: 2)
-w, --warning=DOUBLE
Response time to result in warning status (seconds)
-c, --critical=DOUBLE
Response time to result in critical status (seconds)
-t, --timeout=INTEGER
Seconds before connection times out (default: 10)
-v, --verbose
Show details for command-line debugging (Nagios may truncate output)
The command defined for checking LDAP in the commands.cfg file is:
define command{
command_name check_ldap
command_line $USER1$/check_ldap -H $HOSTADDRESS$ -3 -4 -T -b $ARG1$ -w $ARG2$ -c $ARG3$
}
Then add the following to sophia.cfg:
define service{
use generic-service
host_name sophia.blueprint.org
service_description check the status of ldap service
check_command check_ldap!dc=blueprint,dc=org!3!10
}
After the configuration, the status of ldap service can be monitored.
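The two numbers at the end of check_ldap!dc=blueprint,dc=org!3!10 are passed as -w and -c, the response-time thresholds in seconds. A rough Python sketch of how such thresholds map onto the standard plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follows. classify() is a hypothetical helper, and treating a value as exceeding a threshold only when it is strictly greater is an assumption; the exact boundary handling in the real plugin may differ.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(response_seconds, warning, critical):
    """Map a measured response time onto a plugin state.

    Assumption: a threshold is exceeded only when the value is strictly
    greater than it. None stands for "no measurement could be taken".
    """
    if response_seconds is None:
        return UNKNOWN
    if response_seconds > critical:
        return CRITICAL
    if response_seconds > warning:
        return WARNING
    return OK


# Thresholds from the sample service: warning at 3 s, critical at 10 s.
for seconds in (0.2, 4.5, 12.0, None):
    print(seconds, classify(seconds, warning=3, critical=10))
```

Nagios uses the returned state to decide whether the service has a problem, which in turn drives the notifications described in the next section.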
Setting up notifications
Automatically notifying administrators when something goes wrong is an important feature of Nagios. The thresholds at which Nagios generates a notification are usually defined as parameters of the check command. For more information on the parameters of individual check commands, refer to the Nagios documentation.
Contacts and contact groups are defined in the contacts.cfg file. Contact groups are useful for defining the scope of a notification. Many communication methods can be used to receive Nagios notifications, such as email, pager and instant message. The following guide sets up an email server so that Nagios can send notifications by email.
By default, Nagios uses the BSD mail command to send email, which relies on a mail server such as sendmail or postfix. In CentOS 5.1 the sendmail server is set up by default. Only one line needs to be changed in /etc/mail/sendmail.mc. Change the line:
DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1,Name=MTA')dnl
to:
DAEMON_OPTIONS(`Port=smtp,Name=MTA')dnl
After this change, regenerate the sendmail.cf file by running make in the /etc/mail folder, and restart the sendmail service. Nagios is then able to send notifications to local machines and NUS email addresses.