Monitor remote Linux machines
Sometimes the status of remote machines, such as their disk usage, needs to be monitored. There are two ways to do this: the Nagios server can connect to the machine over SSH, or the NRPE daemon can be installed on the machine that needs monitoring. The following demonstrates how to use NRPE to monitor the status of a remote machine.
Configure the remote machine
First, the Nagios plugins must be installed on the remote host so that all the check commands can run on it. Create the nagios user, then build and install the plugins from the unpacked nagios-plugins source directory:
useradd nagios
passwd nagios
./configure
make
make install
chown nagios:nagios /usr/local/nagios
chown -R nagios:nagios /usr/local/nagios/libexec
Next, install xinetd using yum:
yum install xinetd
Then unpack, build, and install NRPE with the following commands:
tar xzf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd
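Before configuring xinetd, it can help to confirm that the make targets above put their files in place. The following is a minimal sketch, assuming the default /usr/local/nagios prefix; check_nrpe_install is a hypothetical helper, and its optional argument is a root directory to prepend (for checking a staged tree instead of the live system).

```shell
#!/usr/bin/env bash
# Report which files from the NRPE install targets are present or missing.
# Assumes the default /usr/local/nagios prefix (hypothetical helper).
check_nrpe_install() {
  local root=${1:-} missing=0 f
  for f in \
    /usr/local/nagios/libexec/check_nrpe \
    /usr/local/nagios/bin/nrpe \
    /usr/local/nagios/etc/nrpe.cfg \
    /etc/xinetd.d/nrpe; do
    if [ -e "$root$f" ]; then
      echo "ok      $f"
    else
      echo "missing $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it without arguments on the remote host: `check_nrpe_install`.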
Edit the /etc/xinetd.d/nrpe file and add the IP address of the monitoring server to the only_from directive.
only_from = 10.0.11.5
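The only_from edit can also be scripted. This sketch assumes the directive is written as `only_from = addr ...` with spaces around the equals sign, as in the default file; add_only_from is a hypothetical helper that appends the Nagios server's address and leaves any addresses already listed (such as 127.0.0.1) in place.

```shell
#!/usr/bin/env bash
# Append an address to the only_from line of an xinetd service file,
# unless it is already listed. Other addresses are left untouched.
add_only_from() {
  local file=$1 addr=$2 tmp
  tmp=$(mktemp) || return 1
  awk -v addr="$addr" '
    $1 == "only_from" && $2 == "=" {
      listed = 0
      for (i = 3; i <= NF; i++) if ($i == addr) listed = 1
      if (!listed) $0 = $0 " " addr
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
  local status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example: `add_only_from /etc/xinetd.d/nrpe 10.0.11.5`. Running it a second time does not add the address again.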
Add the following entry for the NRPE daemon to the /etc/services file.
nrpe 5666/tcp #NRPE
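A sketch that adds this entry only if it is not already present, so it is safe to run more than once; add_nrpe_service_entry is a hypothetical name, and the target file defaults to /etc/services.

```shell
#!/usr/bin/env bash
# Add the NRPE entry to a services file unless an "nrpe 5666/tcp" line exists.
add_nrpe_service_entry() {
  local file=${1:-/etc/services}
  if grep -Eq '^nrpe[[:space:]]+5666/tcp' "$file"; then
    return 0   # already present; nothing to do
  fi
  printf 'nrpe            5666/tcp                # NRPE\n' >> "$file"
}
```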
Now restart the xinetd service:
service xinetd restart
If everything worked, netstat -at | grep nrpe should output the following:
tcp 0 0 *:nrpe *:* LISTEN
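To script this check, the hypothetical nrpe_listening function below reads netstat output on standard input and succeeds if a TCP socket for nrpe (or the numeric port 5666, as printed by netstat -n) is in the LISTEN state. It assumes the column layout shown above, with the local address in the fourth column.

```shell
#!/usr/bin/env bash
# Succeed if the netstat output on stdin shows NRPE listening on TCP.
nrpe_listening() {
  awk '$1 ~ /^tcp/ && $NF == "LISTEN" && ($4 ~ /:nrpe$/ || $4 ~ /:5666$/) { found = 1 }
       END { exit !found }'
}
```

Usage: `netstat -at | nrpe_listening && echo "NRPE is listening"`.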
Now open the firewall for the NRPE service:
iptables -I INPUT -p tcp -m tcp --dport 5666 -j ACCEPT
or, if using CentOS/Red Hat/Fedora:
iptables -I RH-Firewall-1-INPUT -p tcp -m tcp --dport 5666 -j ACCEPT
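Note: The NRPE daemon only accepts connections from the addresses listed in the only_from directive. If only the Nagios server's address is listed, check_nrpe can no longer be run on the remote host itself against the local daemon (for example, for testing); keep 127.0.0.1 in the list if that is needed.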
For systems without a build environment, RPM packages can be built on a system that has one and then installed. A simple method is to use the rpmbuild utility.
Put the nagios-plugins and nrpe source packages in one folder and use the following commands to build the respective RPM packages:
rpmbuild -tb nrpe*
rpmbuild -tb nagios-plugins*
Building the nagios-plugins RPM package requires the perl-Net-SNMP package, which can be found in the DAG repository. By default, the built packages are placed in the /usr/src/redhat/RPMS/i386 folder if they are compiled for i386 systems; for 64-bit systems, the folder is /usr/src/redhat/RPMS/x86_64.
After building the packages, NRPE can be installed with the rpm -i command. In this case the configuration file is located in the /etc/nagios folder instead of /usr/local/nagios/etc.
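The built packages can be located with a small helper. This sketch assumes the default /usr/src/redhat top directory and that the architecture subdirectory is named exactly as passed (for example i386 or x86_64; on an i686 machine, pass i386 explicitly, since uname -m would report i686). find_built_rpms is hypothetical.

```shell
#!/usr/bin/env bash
# List the RPMs that rpmbuild produced for one package ("nrpe" or
# "nagios-plugins"). Assumes the CentOS/RHEL 5 default top directory
# /usr/src/redhat; a different one can be passed as the third argument.
find_built_rpms() {
  local pkg=$1 arch=${2:-$(uname -m)} top=${3:-/usr/src/redhat}
  local f status=1
  for f in "$top/RPMS/$arch/$pkg"-[0-9]*.rpm; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf '%s\n' "$f"
    status=0
  done
  return "$status"
}
```

For example, on the target machine: `rpm -i $(find_built_rpms nrpe i386)`.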
Configure the Nagios server
On the Nagios server, the check_nrpe plugin needs to be installed:
tar xzf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make all
make install-plugin
Now check whether NRPE is working by running:
/usr/local/nagios/libexec/check_nrpe -H sophia
If everything works, the version of NRPE installed on the remote host is printed (for example, NRPE v2.12).
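Like other Nagios plugins, check_nrpe reports its result through its exit code: 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. The hypothetical run_check wrapper below prints that state next to the plugin's output, which makes manual tests like the one above easier to read. Treating every code above 2 as UNKNOWN is this sketch's own simplification.

```shell
#!/usr/bin/env bash
# Translate a Nagios plugin exit code into its state name.
plugin_state() {
  case $1 in
    0) echo OK ;;
    1) echo WARNING ;;
    2) echo CRITICAL ;;
    *) echo UNKNOWN ;;   # 3, and (in this sketch) any other code
  esac
}

# Run a plugin, print "STATE: output", and pass its exit code through.
run_check() {
  local out rc
  out=$("$@")
  rc=$?
  printf '%s: %s\n' "$(plugin_state "$rc")" "$out"
  return "$rc"
}
```

Usage: `run_check /usr/local/nagios/libexec/check_nrpe -H sophia`.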
The next step is to define the check_nrpe command in the object file commands.cfg:
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
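To see what Nagios actually runs for a service, the macros can be expanded by hand. This sketch mimics the substitution for a check_command such as check_nrpe!check_load, where the text after the first ! becomes $ARG1$. expand_check_nrpe is a hypothetical helper, and $USER1$ is assumed to be /usr/local/nagios/libexec, the value set in resource.cfg by a default source install.

```shell
#!/usr/bin/env bash
# Illustration only: expand "check_nrpe!<command>" the way the definition
# "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$" would.
expand_check_nrpe() {
  local check_command=$1 host_address=$2
  local user1=${3:-/usr/local/nagios/libexec}   # assumed value of $USER1$
  local name=${check_command%%!*}               # text before the first "!"
  local rest=${check_command#*!}                # text after the first "!"
  local arg1=${rest%%!*}                        # $ARG1$ ends at the next "!"
  if [ "$name" != check_nrpe ]; then
    echo "not a check_nrpe command: $name" >&2
    return 1
  fi
  if [ "$rest" = "$check_command" ] || [ -z "$arg1" ]; then
    echo 'missing $ARG1$' >&2
    return 1
  fi
  printf '%s/check_nrpe -H %s -c %s\n' "$user1" "$host_address" "$arg1"
}
```

With a hypothetical host address, `expand_check_nrpe 'check_nrpe!check_load' 10.0.11.20` prints `/usr/local/nagios/libexec/check_nrpe -H 10.0.11.20 -c check_load`.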
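The commands that check_nrpe can request are defined on the remote host in the NRPE configuration file (nrpe.cfg). Sample command definitions are: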
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
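The names in brackets are what the Nagios server passes with -c. A sketch for listing the names defined in an nrpe.cfg, skipping commented-out lines; list_nrpe_commands is hypothetical, and its default path /usr/local/nagios/etc/nrpe.cfg assumes a source install (an RPM install uses /etc/nagios/nrpe.cfg).

```shell
#!/usr/bin/env bash
# Print the name of every active command[...] definition in an nrpe.cfg.
list_nrpe_commands() {
  sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "${1:-/usr/local/nagios/etc/nrpe.cfg}"
}
```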
For security reasons, NRPE should not be configured to accept command arguments (dont_blame_nrpe=1 in nrpe.cfg) unless absolutely necessary.
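Whether NRPE accepts arguments is controlled by the dont_blame_nrpe option in nrpe.cfg. This hypothetical check succeeds only when the option is set to 1, so it can flag a configuration that accepts arguments unintentionally; the default path again assumes a source install.

```shell
#!/usr/bin/env bash
# Succeed if nrpe.cfg enables command arguments (dont_blame_nrpe=1).
nrpe_allows_args() {
  grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1([[:space:]]|$)' \
    "${1:-/usr/local/nagios/etc/nrpe.cfg}"
}
```

For example: `nrpe_allows_args && echo "warning: NRPE accepts command arguments"`.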
The following sample services were added during testing to monitor various aspects of the remote machine sophia:
define service{
use generic-service
host_name sophia.blueprint.org
service_description SSH
check_command check_ssh
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description CPU Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description Current Users
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description /dev/hda1 Free Space
check_command check_nrpe!check_hda1
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use generic-service
host_name sophia.blueprint.org
service_description Zombie Processes
check_command check_nrpe!check_zombie_procs
}
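Because the NRPE-based definitions above differ only in the description and the NRPE command name, they can be generated. A minimal sketch; gen_nrpe_services is hypothetical and takes the host name followed by description=command pairs, using the same generic-service template.

```shell
#!/usr/bin/env bash
# Print one Nagios service definition per "description=nrpe_command" pair.
gen_nrpe_services() {
  local host=$1 pair desc cmd
  shift
  for pair in "$@"; do
    desc=${pair%%=*}   # text before the first "="
    cmd=${pair#*=}     # text after the first "="
    printf 'define service{\n'
    printf '    use                 generic-service\n'
    printf '    host_name           %s\n' "$host"
    printf '    service_description %s\n' "$desc"
    printf '    check_command       check_nrpe!%s\n' "$cmd"
    printf '}\n'
  done
}
```

For example: `gen_nrpe_services sophia.blueprint.org 'CPU Load=check_load' 'Current Users=check_users'`, with the output appended to whichever object file holds the host's services.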
Monitor remote Windows machines
Windows machines can be monitored by Nagios by installing the NSClient++ monitoring daemon on them.
Install NSClient++ on the Windows machine
nsclient++ /install
Open the Services manager and make sure the NSClientpp service is allowed to interact with the desktop (see the 'Log On' tab of the Services manager). If it is not already allowed to, check the box to allow it.
Edit the NSC.INI file (located in the C:\NSClient++ directory) and make the following changes:
- Uncomment all the modules listed in the [modules] section, except for CheckWMI.dll and RemoteConfiguration.dll
- Optionally require a password for clients by changing the 'password' option in the [Settings] section.
- Uncomment the 'allowed_hosts' option in the [Settings] section. Add the IP address of the Nagios server to this line, or leave it blank to allow all hosts to connect.
- Make sure the 'port' option in the [NSClient] section is uncommented and set to '12489' (the default port).
Now start the NSClient++ service:
nsclient++ /start
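To confirm from the Nagios server that port 12489 on the Windows machine is reachable (allowed_hosts and any Windows firewall permitting), a bash-only probe can be used. port_open is a hypothetical helper; it relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to HOST:PORT can be opened within SECS seconds.
port_open() {
  local host=$1 port=$2 secs=${3:-5}
  # Inside "bash -c", $0 and $1 are the host and port passed after the script.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

With a hypothetical address: `port_open 10.0.11.30 12489 && echo "NSClient++ port reachable"`. The same helper works for the NRPE port 5666.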
To configure the Nagios server, include windows.cfg in the Nagios configuration (enable its cfg_file line in nagios.cfg) and adapt the sample host and service definitions in windows.cfg to the Windows machine.
Monitor routers and switches
If a router or switch is reachable on the network (i.e. it has an IP address), its status can be monitored by Nagios. A sample configuration file (switch.cfg) has already been installed in the objects folder.
The most basic way to monitor a router is the check_ping command, which reports packet loss and round-trip time and therefore shows whether the router is up. In addition, if the router supports SNMP, more advanced monitoring such as port status and bandwidth can be achieved. In the following example, Nagios is configured to monitor a Linksys WRT350 router (ecklie) running DD-WRT.
1. Enable SNMP on the router. For ecklie, go to Services -> Services and click Enable in the SNMP section. Additional fields then appear. Change the Name to ecklie (or the name of the router), click Save at the bottom of the page, and then click Apply.
2. On the Nagios server, the net-snmp and net-snmp-utils packages must be installed before the Nagios plugins are configured, so that the check_snmp plugin gets built. Also install the mrtg package for monitoring bandwidth.
Configure mrtg:
MRTG (oss.oetiker.ch/mrtg) logs and plots network usage. In CentOS 5.1, mrtg can be installed through yum:
yum install mrtg
mrtg is run as mrtg <mrtgcfgfile>, where <mrtgcfgfile> is the mrtg configuration file. A configuration generator called cfgmaker is installed with mrtg and can be used to generate this file:
cfgmaker --global 'WorkDir: /home/nagios/mrtg' --global 'Options[_]: growright,bits' --ifref=ip --output=mrtg.cfg public@10.0.11.1
The --ifref parameter is important because it sets the criterion that distinguishes the ports; its possible values are nr, ip, eth, descr, name, and type.
Copy the generated file to the /home/nagios/mrtg folder and change its owner to nagios.
mrtg has to be executed periodically. Edit the crontab of user nagios so that mrtg runs every minute:
crontab -u nagios -e
Then add the following line in the editor that opens:
*/1 * * * * env LANG=C /usr/bin/mrtg /home/nagios/mrtg/mrtg.cfg --logging /var/log/mrtg.log
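Instead of editing interactively, the entry can be added non-interactively. This sketch assumes the cron line above is stored in a shell variable; add_cron_line is a hypothetical filter that reads a crontab on standard input and appends the line only if an identical one is not already there.

```shell
#!/usr/bin/env bash
# Read a crontab on stdin and write it to stdout with LINE appended,
# unless an identical line is already present (so reruns are harmless).
add_cron_line() {
  local line=$1 existing
  existing=$(cat)
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}
```

With `$line` holding the cron entry: `crontab -u nagios -l 2>/dev/null | add_cron_line "$line" | crontab -u nagios -`.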
Once mrtg has run, log files recording the bandwidth of the router's ports are generated in the /home/nagios/mrtg folder. Each file name has the format <router address>_<port address>.log, and these files can be used by the check_local_mrtgtraf command to check the bandwidth of each port.
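Two small helpers can make these log files easier to inspect. This is a sketch under stated assumptions: the WorkDir is /home/nagios/mrtg as passed to cfgmaker, and the mrtg log layout has the raw counter readings on line 1 followed by lines of "timestamp avg_in avg_out max_in max_out", newest first. mrtg_log_path and mrtg_latest_avg are hypothetical names.

```shell
#!/usr/bin/env bash
# Build the path <workdir>/<router address>_<port address>.log.
mrtg_log_path() {
  local router=$1 port=$2 workdir=${3:-/home/nagios/mrtg}
  printf '%s/%s_%s.log\n' "$workdir" "$router" "$port"
}

# Print the most recent average in/out values (second line of the log).
mrtg_latest_avg() {
  awk 'NR == 2 { print $2, $3; found = 1; exit }
       END { exit !found }' "$1"
}
```

For example: `mrtg_latest_avg "$(mrtg_log_path 10.0.11.1 1)"`. The path used by check_local_mrtgtraf must point at the same file.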
3. Define the router and its services in the switch.cfg file and enable this file in nagios.cfg so that Nagios reads it. The configuration used here is the following:
define host{
use generic-switch ; Inherit default values from a template
host_name ecklie ; The name we're giving to this switch
alias Linksys WRT350 Switch ; A longer name associated with the switch
address 10.0.11.1 ; IP address of the switch
hostgroups switches ; Host groups this switch is associated with
}
define hostgroup{
hostgroup_name switches ; The name of the hostgroup
alias Network Switches ; Long name of the group
}
define service{
use generic-service ; Inherit values from a template
host_name ecklie ; The name of the host the service is associated with
service_description PING ; The service description
check_command check_ping!200.0,20%!600.0,60% ; The command used to monitor the service
normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined
}
define service{
use generic-service ; Inherit values from a template
host_name ecklie
service_description Uptime
check_command check_snmp!-C public -o sysUpTime.0
}
define service{
use generic-service ; Inherit values from a template
host_name ecklie
service_description Port 1 Link Status
check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}
define service{
use generic-service ; Inherit values from a template
host_name ecklie
service_description Port 1 Bandwidth Usage
check_command check_local_mrtgtraf!/var/lib/mrtg/10.0.11.1_1.log!AVG!1000000,1000000!5000000,5000000!1
}
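A missing closing brace in an object file stops Nagios from loading its configuration. The authoritative check is `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg`; the hypothetical check_braces function below is only a crude first pass that counts braces after stripping ";" comments and "#" comment lines.

```shell
#!/usr/bin/env bash
# Crude sanity check for a Nagios object file: report unbalanced braces.
# No substitute for "nagios -v", which validates the whole configuration.
check_braces() {
  awk -v file="$1" '
    { sub(/^[ \t]*#.*/, ""); sub(/;.*/, "") }   # drop comments
    {
      opens += gsub(/[{]/, "{")
      closes += gsub(/[}]/, "}")
      if (closes > opens) bad = 1   # a "}" appeared before its "{"
    }
    END {
      if (bad || opens != closes) {
        printf "%s: unbalanced braces (%d open, %d close)\n", file, opens, closes
        exit 1
      }
    }' "$1"
}
```

For example: `check_braces /usr/local/nagios/etc/objects/switch.cfg`.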