TheGeekery

The Usual Tech Ramblings

Using Nagios to monitor webpages

After my last Nagios post, I noticed an increase in hits from search terms about web scraping and using Nagios to monitor web pages. This post covers several methods of monitoring pages with Nagios, from the basic page check to more complicated user interaction.

As Nagios has a very simple plugin architecture, expanding it from basic checks to full-blown user tests is usually just a case of increasing the capability of the plugins, or using new ones. Nagios Exchange has a large number of plugins that expand on the basics, providing more features.

check_http

The first on the list is the basic check_http plugin. This plugin comes bundled as part of the standard Nagios Plugins package, and is available as a package for most distributions; Debian, for example, names it nagios-plugins-standard. If you’re using a package system, installing the plugins from this package will make your life easier, as most of the command definitions will be handled for you. Let’s throw some simple examples together:

The following test will make sure the server is alive and responding with an “ok” HTTP status code. An “ok” HTTP status code is anything in the 2xx or 3xx range, for example 200, 302, etc.

./check_http -H google.com
HTTP OK: HTTP/1.1 301 Moved Permanently - 530 bytes in 0.093 second response time |time=0.093018s;;;0.000000 size=530B;;;0

The next test checks the page for a specific string. Requesting google.com returns a 301, which tells us our request is being redirected, so we’ll tell the plugin to follow redirects.

./check_http -H google.com -f follow -s "About Google"
HTTP OK: HTTP/1.1 200 OK - 9315 bytes in 0.187 second response time |time=0.186770s;;;0.000000 size=9315B;;;0

Let’s now test that we can get Google to search, and that a specific result comes up…

./check_http -H google.com -f follow -u "/search?q=nagios" -s "www.nagios.org"
HTTP OK: HTTP/1.1 200 OK - 45298 bytes in 0.132 second response time |time=0.132258s;;;0.000000 size=45298B;;;0
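
On the Nagios side, hanging one of these checks off a host is just a command and a service definition. Here’s a minimal sketch, assuming the plugins live in $USER1$ and that a host object named google and a generic-service template already exist (those names are placeholders for whatever your configuration uses):

define command{
        command_name    check_http_string
        command_line    $USER1$/check_http -H $HOSTADDRESS$ -f follow -s "$ARG1$"
        }

define service{
        use                     generic-service
        host_name               google
        service_description     Google front page content
        check_command           check_http_string!About Google
        }

If you installed the plugins from a package, a plain check_http command definition is probably already provided; the custom one above just adds the redirect and string arguments.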

What happens if we want to log in to a website and check multiple pages? That’s where check_http stops being useful, and we need to upgrade to the next step. We could write our own scripts using cURL, but that gets complicated fairly quickly.

cURL

cURL, as described by its site, is a command line tool for transferring data with URL syntax. It’s not really a good tool for complex monitoring, but it can be used for quick scripts or a couple of pages. It can’t do the testing on its own, so we’ll need to tie in other utilities like grep. Below is a quick shell script that does the same as the check_http test above. As a side note, Google doesn’t like you scraping their site with curl (you get a 403), but I’ll continue to use it as an example, assuming Google did respond1.

#!/bin/sh
# Check that a Google search for "nagios" returns a page mentioning www.nagios.org
URL="http://www.google.com/search?q=nagios"
TMPFILE=`mktemp /tmp/google_watch.XXXXXX`

curl -s -o "${TMPFILE}" "${URL}" 2>/dev/null
if [ "$?" -ne "0" ];
then
	echo "Unable to connect to ${URL}"
	rm -f "${TMPFILE}"
	exit 2
fi

grep -qi "www.nagios.org" "${TMPFILE}"
if [ "$?" -ne "0" ];
then
	echo "String www.nagios.org not found in ${URL}"
	rm -f "${TMPFILE}"
	exit 1
fi

rm -f "${TMPFILE}"
echo "String found"
exit 0

This is a lot of code just to check for a simple search string. If you try expanding it to include multiple pages, cookie handling, and advanced form data, you can see how it will quickly become unmaintainable. This is where better tools come into play. Next on the roster is Webinject…

Webinject

Webinject is a Perl-based website/HTTP testing tool. While it probably wasn’t originally designed with Nagios in mind, there is a reporting option built into Webinject that produces Nagios plugin compatible output. There are two versions of Webinject floating around: the original, and an updated version. For the most part they behave exactly the same, the latter having additional bug fixes and a slightly different way to execute. Without going into the differences in execution or bug fixes, let’s create a couple of test cases. They will all use the same base configuration:

<globalhttplog>yes</globalhttplog>
<baseurl>http://www.google.com</baseurl>
<reporttype>nagios</reporttype>

Duplicating the above, let’s do a basic page load check…

<testcases repeat="0">
<case id="1"
  description="Open Google"
  method="get"
  url="{BASEURL}"
  verifypositive="About Google"
  errormessage="Unable to open Google" />
</testcases>

When executing this we get the following:

./webinject.pl -c test.xml test_case.xml 
WebInject OK - All tests passed successfully in 0.196 seconds|time=0.196;0;0;0;0 case1=0.114;0;0;0;0

Okay, the results look good, and we get some nice performance data too: test execution took 0.196 seconds, 0.114 of which was in case1, with some overhead elsewhere. Now let’s add some extra work: load the main page, then submit a search request as we did before. We’ll add the following case to the test file above:

<case id="2"
  description="Search Nagios"
  method="get"
  url="{BASEURL}/search?q=nagios"
  verifypositive="www.nagios.org"
  errormessage="Unable to search on Google" />

And again, the results look good…

./webinject.pl -c test.xml test_case.xml 
WebInject OK - All tests passed successfully in 0.361 seconds|time=0.361;0;0;0;0 case1=0.097;0;0;0;0 case2=0.102;0;0;0;0

Now, if you have the logging options enabled and take a look at http.log, you’ll notice that Webinject kept the cookies as it moved from case 1 to case 2. This is good because it allows us to move around a site with session cookies without much extra work.
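
That cookie handling is what makes the earlier login question practical. As a rough sketch only, assuming a hypothetical site with a login form at /login that takes username and password fields and an /account page behind it (the paths, field names, and strings here are all made up for illustration), a two-case test might look like this:

<testcases repeat="0">
<case id="1"
  description="Log in"
  method="post"
  url="{BASEURL}/login"
  postbody="username=monitor&amp;password=secret"
  verifypositive="Welcome"
  errormessage="Unable to log in" />
<case id="2"
  description="Check account page"
  method="get"
  url="{BASEURL}/account"
  verifypositive="Account summary"
  errormessage="Account page not found after login" />
</testcases>

Because the session cookie from case 1 is carried into case 2, the second request runs as the logged-in user.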

cucumber-nagios

cucumber-nagios is a Ruby-based tool that lets you write checks in natural language. It uses webrat and mechanize to do all the heavy lifting, while keeping the front end as simple as possible. Using the same example as our Webinject test above, we’ll write a feature for cucumber-nagios to execute.

Feature: google.com
  It should be up
  And I should be able to search for things

  Scenario: Searching for things
    When I go to "http://www.google.com/"
    And I fill in "q" with "nagios"
    And I press "Google Search"
    Then I should see "www.nagios.org"

Once a feature/scenario is setup, execution is simple…

$ cucumber-nagios features/google.com/search.feature
Critical: 0, Warning: 0, 4 okay | value=4.0000;;;;

The good thing about cucumber-nagios is the natural language used to configure it, which makes it ideal for other people, like the developers or QA team that wrote and tested the site, to write the conditions to be monitored.
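
Wiring cucumber-nagios into Nagios is the same exercise as any other plugin. A minimal sketch, assuming the binary sits at /usr/local/bin/cucumber-nagios and the feature files live under /etc/nagios/features (both paths are assumptions, not anything the project dictates):

define command{
        command_name    check_cucumber
        command_line    /usr/local/bin/cucumber-nagios /etc/nagios/features/$ARG1$
        }

define service{
        use                     generic-service
        host_name               google
        service_description     Google search scenario
        check_command           check_cucumber!google.com/search.feature
        }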

Selenium

Selenium goes a step beyond the rest, and actually does its testing using a real browser. Selenium is a suite of tools for automating web application testing, supporting multiple browsers and platforms. It doesn’t directly support Nagios, but there is a plugin on the Nagios Exchange site that interacts with Selenium itself. I’ve not yet played with Selenium2, but the check_selenium author wrote a pretty good document on the plugin page detailing how to set up Nagios and Selenium checks.

One thing to note with Selenium is that it does launch a browser to do the checks. Too many checks may overload a host, requiring you to look at Selenium Grid to distribute the checks across multiple machines.

Summary

Because Nagios is so extensible, and the only requirement for getting information back to it is an exit code and a line of status text, web checks can be as simple as a single command or as complex as end-user browser testing. There are lots of alternatives, each with different requirements, fitting different environments.
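
And if none of the above quite fits, the plugin contract itself is small enough that rolling your own stays easy. A minimal sketch of that contract, using a placeholder URL: print one line of status text and exit 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN.

#!/bin/sh
# Minimal sketch of the Nagios plugin contract: one line of status
# text plus an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
if curl -sf -o /dev/null "http://www.example.com/"; then
	echo "OK - site responded"
	exit 0
else
	echo "CRITICAL - site did not respond"
	exit 2
fi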

Do you have an alternative method to monitoring websites using Nagios? What do you think of the above?

  1. There are ways around this restriction, but I’m not going to educate people on how to do this. 

  2. It’s on my list of things to look at. 
