In addition to the HTTP status code, URL monitors must capture the request latency: a site may be responding, but slowly enough to impact customers. The following code outlines a URL monitor in Perl, using LWP::UserAgent to request the URL and Time::HiRes to measure latency:
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent ();
use Time::HiRes qw(gettimeofday tv_interval);
my $url = shift || die "Usage: $0 url\n";
my %output;
# LWP's default timeout is 180 seconds; 10 is an arbitrary shorter
# limit so a hung server cannot stall the monitor for minutes.
my $ua = LWP::UserAgent->new( timeout => 10 );
# Time the request with sub-second resolution.
my $start    = [gettimeofday];
my $response = $ua->get($url);
$output{latency} = sprintf '%.2f', tv_interval($start);
$output{status}  = $response->code;
# TODO template %output as demanded by
# the monitoring system
for my $key ( sort keys %output ) {    # sorted for stable output
    print "$key=$output{$key}\n";
}
Metrics & Alerts
Graph the HTTP status code along with the latency. Graphing the raw code avoids the information loss produced by mapping codes onto arbitrary “good” or “bad” values. The alerting framework should translate the code into an alert as appropriate for the URL: for example, a status of 400 or higher pages someone, while a 3xx status only warns about the unexpected redirect.
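A minimal sketch of that translation, in the same Perl as the monitors. The level names OK, WARNING and CRITICAL and the optional `$redirect_ok` flag (for a URL where a redirect is expected) are assumptions; adapt them to whatever the alerting framework expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map an HTTP status code to an alert level: 4xx and 5xx page someone,
# 3xx only warns unless this URL is known to redirect.
# Level names and the $redirect_ok flag are hypothetical.
sub alert_level {
    my ( $status, $redirect_ok ) = @_;
    return 'CRITICAL' if $status >= 400;                    # error: page
    return 'WARNING'  if $status >= 300 && !$redirect_ok;   # unexpected redirect
    return 'OK';
}

# Classify a few sample codes.
for my $code ( 200, 301, 404, 503 ) {
    printf "%d => %s\n", $code, alert_level($code);
}
```

Keeping the raw code in the graph and applying a function like `alert_level` only at alert time lets the policy differ per URL without losing history. Note that LWP::UserAgent reports connection failures as an internally generated 500 response, so those page as well.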
The latency graph provides clues to the problem. Assuming a 200 HTTP status code in each case, a ledge at N seconds indicates a timeout of some sort (check for DNS problems), while a scattered graph points to network packet loss or a similar load-induced problem. Watch the average latency over time, then set an appropriate alert threshold, perhaps five seconds. Alternatively, alert on a sharp increase in latency, perhaps one or two standard deviations above a moving average. This catches sudden latency increases, but will not fire if latency rises slowly past the Service Level Agreement (SLA) threshold, because the moving average climbs along with it.
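The two strategies can be combined, as in the following sketch, which assumes the monitor keeps a history of recent latency samples. The constants STATIC_LIMIT, STDDEV_K and WINDOW and the sample data are illustrative assumptions, and the standard deviation is the population form.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max sum);

# Assumed thresholds, not taken from any real system.
use constant STATIC_LIMIT => 5.0;    # seconds: fixed ceiling
use constant STDDEV_K     => 2;      # standard deviations above the mean
use constant WINDOW       => 10;     # samples in the moving average

# Return the reasons to alert on the newest sample, given earlier
# samples (oldest first). An empty list means no alert.
sub latency_alerts {
    my ( $latest, @history ) = @_;
    my @reasons;
    push @reasons,
      sprintf( 'latency %.2fs over static limit %.2fs', $latest, STATIC_LIMIT )
      if $latest > STATIC_LIMIT;

    # Only the last WINDOW samples feed the moving average.
    my @window = @history[ max( 0, $#history - WINDOW + 1 ) .. $#history ];
    return @reasons if @window < 2;    # too little data for a deviation

    my $mean  = sum(@window) / @window;
    my $sd    = sqrt( sum( map { ( $_ - $mean )**2 } @window ) / @window );
    my $limit = $mean + $sd * STDDEV_K;
    push @reasons,
      sprintf( 'latency %.2fs above moving limit %.2fs', $latest, $limit )
      if $latest > $limit;
    return @reasons;
}

my @history = ( 0.41, 0.38, 0.45, 0.40, 0.43, 0.39, 0.42, 0.44, 0.40, 0.41 );
for my $latest ( 0.44, 1.90, 6.20 ) {
    my @why = latency_alerts( $latest, @history );
    printf "%.2fs: %s\n", $latest, @why ? join( '; ', @why ) : 'no alert';
}
```

Because the moving limit tracks the recent mean, a gradual rise never exceeds it, which is why the static ceiling stays in place. A perfectly flat history yields a deviation near zero, so a production version may want a minimum deviation to avoid alerting on jitter.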
If possible, negotiate the SLA in advance, so the monitoring can be designed to match it. Also consider how far below the SLA threshold alerts must be set to allow time for triage before a system breaches the SLA.
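One way to build in that headroom is to derive warning and paging thresholds as fractions of the SLA limit. A sketch, in which the three-second limit and the 60% and 85% fractions are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical SLA and headroom fractions; substitute the negotiated values.
my %sla = (
    limit   => 3.0,     # seconds allowed by the SLA (assumed)
    warn_at => 0.60,    # warn at 60% of the limit
    crit_at => 0.85,    # page at 85% of the limit, before the breach
);

# Classify a latency against thresholds set below the SLA limit, so
# someone is already triaging before the SLA is actually broken.
sub sla_level {
    my ( $latency, $sla ) = @_;
    return 'CRITICAL' if $latency >= $sla->{limit} * $sla->{crit_at};
    return 'WARNING'  if $latency >= $sla->{limit} * $sla->{warn_at};
    return 'OK';
}

printf "%.2fs => %s\n", $_, sla_level( $_, \%sla ) for 1.20, 2.00, 2.90;
```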
Check Multiple URLs
A single script can check multiple related URLs as a plugin for Nagios or a comparable monitoring system. Do not number the URL arguments in the output; instead, assign a short alias to each monitored URL. Numbered URLs force the question “well, which URL is actually in error?” while aliases give a hint without exceeding any length limits on monitoring output. If the monitoring system allows, include the full URL in the output, so a reader can copy or click on the URL directly from the alert message.
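#!/usr/bin/perl -w
use strict;
use LWP::UserAgent ();
use Time::HiRes qw(gettimeofday tv_interval);
die "Usage: $0 alias.url [.. alias.url]\n"
  unless @ARGV;
my @results;
# LWP's default timeout is 180 seconds; 10 is an arbitrary shorter limit.
my $ua = LWP::UserAgent->new( timeout => 10 );
for my $target (@ARGV) {
    my %output;    # fresh hash per URL, so each reference is distinct
    # Split "alias.url" on the first dot only, so dots in the URL
    # survive (the alias itself must not contain a dot).
    ( $output{alias}, $output{url} ) = split /\./, $target, 2;
    my $start    = [gettimeofday];
    my $response = $ua->get( $output{url} );
    $output{latency} = sprintf '%.2f', tv_interval($start);
    $output{status}  = $response->code;
    push @results, \%output;
}
# One line per URL: alias:status:latency
for my $result (@results) {
    print join( ':', map { $result->{$_} } qw(alias status latency) ), "\n";
}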
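When the script runs as a Nagios plugin, its output must follow the plugin conventions: a single status line, optional performance data after a `|`, and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below assumes result hashes shaped like those the script collects; `nagios_report`, the `URL` service prefix and the sample data are hypothetical. It names failing URLs by alias and includes the full URL, per the advice above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nagios plugin return codes.
my %EXIT = ( OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 );

# Build one Nagios-style status line, plus the exit code, from result
# hashes with alias, url, status and latency keys.
sub nagios_report {
    my @results = @_;
    my $worst   = 'OK';
    my ( @problems, @perf );
    for my $r (@results) {
        my $level =
            $r->{status} >= 400 ? 'CRITICAL'
          : $r->{status} >= 300 ? 'WARNING'
          :                       'OK';
        $worst = $level if $EXIT{$level} > $EXIT{$worst};
        # Failing URLs: alias, code, and the full URL to copy or click.
        push @problems, "$r->{alias}=$r->{status} $r->{url}"
          unless $level eq 'OK';
        # Performance data: 'label'=value with unit s (seconds).
        push @perf, "'$r->{alias}'=$r->{latency}s";
    }
    my $summary = @problems ? join( ', ', @problems ) : 'all URLs OK';
    return ( "URL $worst - $summary | " . join( ' ', @perf ), $EXIT{$worst} );
}

# Made-up results standing in for the script's @results.
my ( $line, $code ) = nagios_report(
    { alias => 'home',  url => 'http://www.example.com/',      status => 200, latency => '0.42' },
    { alias => 'login', url => 'http://www.example.com/login', status => 503, latency => '1.37' },
);
print "$line\n";
```

In the real plugin, call `nagios_report(@results)` and finish with `exit $code;` so Nagios records the correct state.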
Consider Deeper Content Checks
If necessary, also set up content checks that verify websites contain the expected content. A site may return 200 status codes within the SLA, yet serve no data if a software bug or caching problem omits some or all of the page content.
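A minimal content check can reuse the response the monitor already fetched and search its body for markers that only appear when the page renders completely. In this sketch `content_problems` and the patterns are hypothetical; `decoded_content` is the HTTP::Response method that returns the body with any content encoding and charset undone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response ();

# Return one problem string per expected pattern missing from the body.
# The patterns are hypothetical: choose markers (a footer, a product
# list, a copyright line) that only appear on a fully rendered page.
sub content_problems {
    my ( $response, @patterns ) = @_;
    my $body    = $response->decoded_content // '';
    my @missing = grep { $body !~ $_ } @patterns;
    return map { "missing content: $_" } @missing;
}

# A 200 response with an empty page, built locally for illustration.
my $empty = HTTP::Response->new( 200, 'OK', [ 'Content-Type' => 'text/html' ],
    '<html><body></body></html>' );
my @problems = content_problems( $empty, qr/<footer/i, qr/Copyright/ );
print "$_\n" for @problems;
```

In a monitor, pass the result of `$ua->get($url)` and treat any returned problem like an error status.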