How to monitor BGP sessions with Nagios

I originally wrote this tutorial for xmodulo.com.
As a versatile network monitoring tool, Nagios can monitor the health of various network elements such as routers and switches. Among other things, Nagios can be configured to monitor the BGP sessions of a router and generate alerts whenever something goes wrong with an existing BGP session.
Several BGP monitoring plugins exist for Nagios. They are useful for monitoring a router's critical BGP peers and generating alerts for them. This tutorial explains how to monitor BGP neighbors using Nagios plugins.

Tested Platforms

  • Tested and working perfectly with Cisco ASR 1000 series routers.
  • Tested and working perfectly with Cisco 3700 series layer-3 switches.
  • Should work with other Cisco devices as well.
  • Tested and does NOT work with Quagga routers.
  • Not tested on JunOS.

Tested Topology

Nagios will monitor all the BGP sessions of router-1. The BGP neighbors in AS 50 and AS 70 are vital for operation, so a separate script will be used to monitor these specific neighbors and generate alerts if they become unreachable.

Create a Host for Router-1 in Nagios

First, we define a host for router-1 as follows.
# vim /etc/nagios3/conf.d/bgp-host.cfg
define host{
        use                     generic-host            
        host_name               router-1
        alias                   router-1
        address                 100.100.100.100
        }

Monitor All BGP Sessions of Router-1

Nagios comes with a built-in plugin called check_bgpstate that checks the BGP state of a router using SNMP. We can run it manually against router-1 by supplying the SNMP community and the router's address.
# /usr/lib/nagios/plugins/check_bgpstate -c myCommunity 100.100.100.100
OK: host '100.100.100.100', sessions up: 4, down: 0, shutdown: 0
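When running a plugin by hand like this, keep in mind that Nagios determines the service state from the plugin's exit code, not from the text it prints: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The helper below is a hypothetical convenience function (not part of Nagios or the Nagios plugins) that labels an exit code the same way, so you can see what Nagios would make of a manual run.

```shell
#!/bin/sh
# Hypothetical helper: label a Nagios plugin exit code with the service
# state defined by the plugin API (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).
# Values outside 0-3 are not valid plugin results and are flagged as such.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT-OF-RANGE($1)" ;;
    esac
}

# Example (needs the plugin and SNMP access to the router):
#   /usr/lib/nagios/plugins/check_bgpstate -c myCommunity 100.100.100.100
#   nagios_state $?
```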
Next, we define a Nagios command that uses this tool to monitor all the BGP sessions of a host.
# vim /etc/nagios-plugins/config/bgp.cfg
define command{
        command_name    check_all_bgp
        command_line    /usr/lib/nagios/plugins/check_bgpstate '$HOSTADDRESS$' -C '$ARG1$'
}
Now that the command is ready, we will define a new service which applies the command to router-1.
# vim /etc/nagios3/conf.d/bgp-service.cfg
define service {
        host_name  router-1
        service_description  All BGP session summary
        check_command  check_all_bgp!myCommunity
        check_interval  1
        use   generic-service
        notification_interval 0 ; set > 0 if you want to be re-notified
}
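In check_all_bgp!myCommunity, Nagios treats the text before the first "!" as the command name and each following "!"-separated field as $ARG1$, $ARG2$, and so on, while $HOSTADDRESS$ is taken from the host's address directive. The sketch below imitates that substitution purely for illustration; expand_check is a hypothetical name, and the sketch ignores the many other macros Nagios supports as well as arguments containing characters that are special to sed (/, &, \).

```shell
#!/bin/sh
# Hypothetical illustration of how $HOSTADDRESS$ and $ARGn$ get filled in
# a command_line. This is not Nagios code; other macros are left untouched.
#   $1: command_line template from the command definition
#   $2: the host's address
#   $3: the part of check_command after the command name,
#       e.g. "myCommunity" or "myCommunity!50.50.50.50"
expand_check() {
    out=$(printf '%s\n' "$1" | sed 's/\$HOSTADDRESS\$/'"$2"'/g')
    rest=$3
    n=1
    while [ -n "$rest" ]; do
        arg=${rest%%!*}              # field up to the next "!"
        case "$rest" in
            *!*) rest=${rest#*!} ;;  # drop that field and its "!"
            *)   rest= ;;            # last field consumed
        esac
        # Replace $ARGn$ (the trailing "$" keeps $ARG1$ from matching $ARG10$).
        out=$(printf '%s\n' "$out" | sed 's/\$ARG'"$n"'\$/'"$arg"'/g')
        n=$((n + 1))
    done
    printf '%s\n' "$out"
}

# Example:
#   expand_check 'check_bgp.pl -H $HOSTADDRESS$ -C $ARG1$ -p $ARG2$' \
#       100.100.100.100 'myCommunity!50.50.50.50'
```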
If more routers need to be monitored, all we need to do is define hosts for router-2, router-3 and so on, and apply the check_all_bgp command to each of them, just as for router-1.
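With many routers, writing those definitions by hand gets repetitive. The sketch below prints a host definition and the matching summary service for one router, copied from the router-1 definitions above; bgp_router_config is a hypothetical helper, and the generic-host and generic-service templates are assumed to be available, as they are for router-1.

```shell
#!/bin/sh
# Hypothetical generator: print a host definition plus the
# "All BGP session summary" service for one router, mirroring router-1.
#   $1: host name (e.g. router-2)
#   $2: router address
#   $3: SNMP community, passed to check_all_bgp as $ARG1$
bgp_router_config() {
    name=$1
    address=$2
    community=$3
    cat <<EOF
define host{
        use                     generic-host
        host_name               $name
        alias                   $name
        address                 $address
        }

define service {
        host_name  $name
        service_description  All BGP session summary
        check_command  check_all_bgp!$community
        check_interval  1
        use   generic-service
        notification_interval 0 ; set > 0 if you want to be re-notified
}
EOF
}
```

For example, bgp_router_config router-2 192.0.2.2 myCommunity (192.0.2.2 is only a placeholder documentation address) prints both blocks; review the output before appending it to a file under /etc/nagios3/conf.d/ and running the dry run.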
Before restarting the Nagios service, verify that everything is configured properly by doing a dry run as follows.
# nagios3 -v /etc/nagios3/nagios.cfg
If everything looks good, restart the Nagios service.
# service nagios3 restart
If all goes well, Nagios should now be checking the summary of all the BGP sessions, which can be verified in the Nagios web portal (http://serverIP/nagios3 > Services).

Monitor Particular BGP Neighbor(s) of Router-1

The check_bgpstate tool above alerts only on the BGP session summary (e.g., 3 up, 1 down). When a broken BGP session is detected, you need to log in to the router manually to find out which session is down.
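Logging in is not the only option: per-peer state is also exposed over SNMP in the standard BGP4-MIB, where bgpPeerState (OID 1.3.6.1.2.1.15.3.1.2) is indexed by peer address and the value 6 means established. As a sketch, assuming Net-SNMP's snmpwalk is installed and the router exposes BGP4-MIB for its IPv4 peers, the hypothetical filter below reads snmpwalk output and prints only the peers that are not established.

```shell
#!/bin/sh
# Hypothetical filter for a BGP4-MIB bgpPeerState walk, e.g. from:
#   snmpwalk -v2c -c myCommunity -Oen 100.100.100.100 1.3.6.1.2.1.15.3.1.2
# (-On prints numeric OIDs, -Oe prints enum values as numbers).
# Expected input lines look like:
#   .1.3.6.1.2.1.15.3.1.2.50.50.50.50 = INTEGER: 6
# The OID suffix after bgpPeerState is the peer address.
bgp_peers_not_established() {
    awk '
        NF < 4 { next }                       # skip blank or malformed lines
        {
            peer = $1
            sub(/^\.?1\.3\.6\.1\.2\.1\.15\.3\.1\.2\./, "", peer)
            state = $NF
            gsub(/[^0-9]/, "", state)         # "established(6)" -> "6"
            if (state != "6") print peer " state=" state
        }'
}
```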
For critical BGP neighbors (such as AS 50 and AS 70 in the topology diagram), you may want to monitor each neighbor individually. To let Nagios monitor and alert on particular BGP neighbors, we use a separate script called check_bgp.

Prepare check_bgp Script

The check_bgp script (version 0.4 at the time of writing) can be downloaded from its official site. After downloading the script, we copy it to the Nagios plugin directory, renaming it to check_bgp.pl, and make it executable.
# cp check_bgp.0.4.pl /usr/lib/nagios/plugins/check_bgp.pl
# chmod +x /usr/lib/nagios/plugins/check_bgp.pl
Before using it in Nagios, we do a test run to check the script's output.
# cd /usr/lib/nagios/plugins/
# ./check_bgp.pl -H 100.100.100.100 -C myCommunity -p 50.50.50.50
OK - 50.50.50.50 (AS50) state is established(6). Established for Duration.
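The number in parentheses is the peer state as encoded by BGP4-MIB's bgpPeerState, whose values follow the BGP finite-state machine. If you post-process output like this in your own scripts, a lookup such as the hypothetical bgp_state_name below translates the numeric code back into the state name.

```shell
#!/bin/sh
# Hypothetical lookup: BGP4-MIB bgpPeerState value -> state name.
# Only established(6) means the session is up; note that "active"
# means the router is still trying to connect, not that the peer is up.
bgp_state_name() {
    case "$1" in
        1) echo idle ;;
        2) echo connect ;;
        3) echo active ;;
        4) echo opensent ;;
        5) echo openconfirm ;;
        6) echo established ;;
        *) echo unknown ;;
    esac
}
```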
The script works as expected, so we can go ahead and configure Nagios.

Preparing Nagios

First, we define a Nagios command that checks a particular BGP peer.
# vim /etc/nagios-plugins/config/
define command{
        command_name    check_bgp
        command_line    /usr/lib/nagios/plugins/check_bgp.pl -H '$HOSTADDRESS$' -C '$ARG1$' -p '$ARG2$'
}
Now that the command is set up, we define two new services which apply the command to AS 50 and AS 70, respectively.
# vim /etc/nagios3/conf.d/bgp-service.cfg
define service {
        host_name                router-1
        service_description      BGP check with AS 50
        check_command            check_bgp!myCommunity!50.50.50.50
        check_interval           1
        use                      generic-service
        notification_interval    0 ; set > 0 if you want to be re-notified
}

define service {
        host_name                router-1
        service_description      BGP check with AS 70
        check_command            check_bgp!myCommunity!70.70.70.70
        check_interval           1
        use                      generic-service
        notification_interval    0 ; set > 0 if you want to be re-notified
}
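With more than a couple of critical peers, the per-neighbor service blocks can be generated instead of copied. A minimal sketch, assuming each peer is given as AS:address; bgp_peer_services is a hypothetical helper that reproduces the two definitions above for any list of peers.

```shell
#!/bin/sh
# Hypothetical generator for per-peer check_bgp services.
#   $1: host name, $2: SNMP community, remaining args: AS:address pairs
# Everything after the first ":" is used as the address.
bgp_peer_services() {
    host=$1
    community=$2
    shift 2
    for peer in "$@"; do
        asn=${peer%%:*}   # text before the first ":"
        ip=${peer#*:}     # text after the first ":"
        cat <<EOF
define service {
        host_name                $host
        service_description      BGP check with AS $asn
        check_command            check_bgp!$community!$ip
        check_interval           1
        use                      generic-service
        notification_interval    0 ; set > 0 if you want to be re-notified
}

EOF
    done
}
```

Running bgp_peer_services router-1 myCommunity 50:50.50.50.50 70:70.70.70.70 prints the two services defined above; as before, run the dry run after adding generated definitions.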
Before restarting the Nagios service, verify that everything is configured properly by doing a dry run as follows.
# nagios3 -v /etc/nagios3/nagios.cfg
If everything looks good, restart the Nagios service.
# service nagios3 restart
Nagios should now start monitoring the two BGP neighbors of router-1, AS 50 and AS 70. Their status can be checked in the Nagios web portal (http://serverIP/nagios3 > Services), and Nagios will raise an alert if something goes wrong with either BGP session.
If all goes well, the following screenshot should be visible in the Nagios web portal.
Hope this helps.
