Nagios is hands-down the best monitoring tool to monitor host and network equipments. Using Nagios plugins you can monitor pretty much monitor anything.
I use Nagios intensively and it gives me peace of mind knowing that I will get an alert on my phone, when there is a problem. More than that, if warning levels are setup properly, Nagios will proactively alert you before a problem becomes critical.
In this article, I’ll explain how to configure Nagios to monitor network switch and it’s active ports. Uncomment the switch.cfg line in /usr/local/nagios/etc/nagios.cfg as shown below. Add the following switches hostgroup to the /usr/local/nagios/etc/objects/switch.cfg file. In this example, I’ve defined a host to monitor the core switch in the /usr/local/nagios/etc/objects/switch.cfg file. Change the address directive to your switch ip-address accordingly. Displaying the uptime of the switch and verifying whether switch is alive are common services for all switches. So, define these services under the switches hostgroup_name as shown below.
check_local_mrtgtraf uses the Multil Router Traffic Grapher – MRTG. So, you need to install MRTG for this to work properly. The *.log file mentioned below should point to the MRTG log file on your system. Use check_snmp to monitor the specific port as shown below. The following two services monitors port#1 and port#5. To add additional ports, change the value ifOperStatus.n accordingly. i.e n defines the port#.
Sometimes you may need to monitor the status of multiple ports combined together. i.e Nagios should send you an alert, even if one of the port is down. In this case, define the following service to monitor multiple ports. Verify the nagios configuration to make sure there are no warnings and errors. Restart the nagios server to start monitoring the VPN device. Verify the status of the switch from the Nagios web UI: http://{nagios-server}/nagios as shown below: Issue1: Nagios GUI displays “check_mrtgtraf: Unable to open MRTG log file” error message for the Port bandwidth usage Solution1: make sure the *.log file defined in the check_local_mrtgtraf service is pointing to the correct location. Solution2: Make sure both net-snmp and net-snmp-util packages are installed. In my case, I was missing the net-snmp-utils package and installing it resolved this issue as shown below.1. Enable switch.cfg in nagios.cfg
[nagios-server]# grep switch.cfg /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/switch.cfg2. Add new hostgroup for switches in switch.cfg
define hostgroup{
hostgroup_name switches
alias Network Switches
}3. Add a new host for the switch to be monitered
define host{
use generic-switch
host_name core-switch
alias Cisco Core Switch
address 192.168.1.50
hostgroups switches
}4. Add common services for all switches
# Service definition to ping the switch using check_ping
define service{
use generic-service
hostgroup_name switches
service_description PING
check_command check_ping!200.0,20%!600.0,60%
normal_check_interval 5
retry_check_interval 1
}
# Service definition to monitor switch uptime using check_snmp
define service{
use generic-service
hostgroup_name switches
service_description Uptime
check_command check_snmp!-C public -o sysUpTime.0
}5. Add service to monitor port bandwidth usage
define service{
use generic-service
host_name core-switch
service_description Port 1 Bandwidth Usage
check_command check_local_mrtgtraf!/var/lib/mrtg/192.168.1.11_1.log!AVG!1000000,2000000!5000000,5000000!10
}6. Add service to monitor an active switch port
# Monitor status of port number 1 on the Cisco core switch
define service{
use generic-service
host_name core-switch
service_description Port 1 Link Status
check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}
# Monitor status of port number 5 on the Cisco core switch
define service{
use generic-service
host_name core-switch
service_description Port 5 Link Status
check_command check_snmp!-C public -o ifOperStatus.5 -r 1 -m RFC1213-MIB
}7. Add services to monitor multiple switch ports together
# Monitor ports 1 - 6 on the Cisco core switch.
define service{
use generic-service
host_name core-switch
service_description Ports 1-6 Link Status
check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB, -o ifOperStatus.2 -r 1 -m RFC1213-MIB, -o ifOperStatus.3 -r 1 -m RFC1213-MIB, -o ifOperStatus.4 -r 1 -m RFC1213-MIB, -o ifOperStatus.5 -r 1 -m RFC1213-MIB, -o ifOperStatus.6 -r 1 -m RFC1213-MIB
}8. Validate configuration and restart nagios
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.
# /etc/rc.d/init.d/nagios start
Starting nagios: done.
9. Troubleshooting
Issue2: Nagios UI displays “Return code of 127 is out of bounds – plugin may be missing” error message for Port Link Status.[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2
[nagios-server]# rpm -ivh net-snmp-utils-5.1.2-11.EL4.10.i386.rpm
Preparing... ########################################### [100%]
1:net-snmp-utils ########################################### [100%]
[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2
net-snmp-utils-5.1.2-11.EL4.10