F5, iRules, and content injection

Recently I’ve been working with one of our business units on content tracking. They’ve been trying to track how our site is used, and how popular certain features are. They had started rolling out an appliance that literally sniffed the traffic and tracked the results. This is okay up to a point, but it still leaves a lot of hard work to figure out how users are actually using the system. This is where Google Urchin comes in…

Urchin is like Google Analytics on speed, giving you more reports, a better feature set, and just generally being better all around (not that Google Analytics is anything to sneeze at). It works in the same way: a small bit of JavaScript is added to the page content, along with a call to a function, and the Urchin code does its work. It goes a little further by requesting a small transparent GIF with some encoded arguments. This allows Urchin to work out how people are navigating around the site.

So how does this fit in with the F5? Originally the business unit was pushing to have the development team write the JavaScript code into the pages. It was initially assumed that all the pages on the site shared a common header and footer. Preliminary testing showed that only a few sections of the site used a common framework, which turned a 2-3 day project (including testing) into a 6+ week project. This obviously doesn’t make business units happy, as it means other projects have to be bumped. This is how the F5 fits in…

A few weeks before all this came up, I was reading DevCentral, an F5 community-driven site, and stumbled across a post called “Automated Gomez Performance Monitoring”. It sat in my brain as an idea I’d like to try out, maybe by deploying Google Analytics on the production site for some testing. It wasn’t too long before it was needed.

So this is what we ended up with…

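when HTTP_REQUEST {
  # Keep HTTP/1.1 keep-alive connections explicitly marked as such.
  if { [HTTP::version] eq "1.1" } {
    if { [HTTP::header is_keepalive] } {
      HTTP::header replace "Connection" "Keep-Alive"
    }
  }
}

when HTTP_RESPONSE {
  # Stream processing stays off unless this response qualifies.
  STREAM::disable
  if { ([HTTP::header Content-Type] starts_with "text/html") &&
       ([HTTP::status] == 200) } {
    # Urchin tracking snippet to inject.
    set urc {<script type="text/javascript" src="/urchin.js"></script>
      <script type="text/javascript">
        urchinTracker();
      </script>}
    # Format is @search@replace@: swap </body> for the snippet plus </body>.
    set stream_expression "@</body>@$urc</body>@"
    STREAM::expression $stream_expression
    STREAM::enable
  }
}

when STREAM_MATCHED {
  # One injection per response is enough.
  STREAM::disable
}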

So what is this, and what is it doing? This is an iRule, written in the cut-down Tcl-based rule language that runs on the F5 load balancers. There are three triggers, or events, in use here. The first fires when an HTTP request is made (the initial client request), the second when the server sends back its HTTP response, and the last when the filter set up in the second finds a match. The important stuff is in the HTTP_RESPONSE section. We only want this to apply to successful HTML pages¹, so we check the content type and the status code, then set a variable with the Urchin code we need to inject. Now for the important bit, the STREAM::expression code. This is basically a regular expression search and replace, in the form @search@replace@. In my case, I’m looking for the closing </body> tag, as this appears at the end of the page, and replacing it with the Urchin code followed by a new </body> tag. The STREAM_MATCHED code kicks in when the stream engine gets a match, and disables the stream engine. This is so that we only do one replacement, just in case there are multiple </body> tags in the content.
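
To get a feel for the replacement without touching the F5, here is a rough stand-alone Tcl sketch (run it with tclsh, it is not an iRule). It assumes the search target is the closing </body> tag, uses regsub as a stand-in for the stream engine, and the sample page in html is made up for illustration. Without the -all switch, regsub replaces only the first match, which is roughly the effect the STREAM_MATCHED event gives us.

```tcl
# Stand-alone Tcl (tclsh), not an iRule: regsub stands in for the stream engine.
# The sample page below is hypothetical.
set html "<html><body><p>Hello</p></body></html>"

# Same tracking snippet idea as the iRule, shortened to one line.
# It contains no & or backslash, which regsub would treat specially
# in a replacement string.
set urc {<script type="text/javascript" src="/urchin.js"></script><script type="text/javascript">urchinTracker();</script>}

# No -all switch, so only the first </body> is replaced.
regsub {</body>} $html "${urc}</body>" result

puts $result
```

The snippet ends up immediately before the closing body tag, and the rest of the page is untouched.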

This is all great, but there are some caveats. The first is that stream searches will not work on compressed content. It looks like the author of the Gomez injection rules ran into this too: in the last edition of the series, he explicitly removes the Accept-Encoding header from the request, which is the header that tells the server the client supports compression. This seems to affect data going back through the load balancer as well, and stops the F5 from compressing the content using profiles. We handled it by disabling compression support on the servers themselves (in our case IIS).
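
For reference, the request-side approach mentioned above can be sketched like this. It is a minimal sketch of the idea rather than the Gomez author’s exact rule: with the Accept-Encoding header removed, the web server never sees that the client supports compression, so the response reaches the stream engine uncompressed. In practice this line would be merged into the existing HTTP_REQUEST block.

```tcl
when HTTP_REQUEST {
  # Strip the client's compression offer so the server replies uncompressed
  # and the stream engine can search the response body.
  HTTP::header remove "Accept-Encoding"
}
```

We did not go this route ourselves, since in our setup it also seemed to stop the F5 from compressing responses through its profiles.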

The second caveat is that the page should be relatively well-formed HTML. Some bad HTML isn’t a disaster, but leaving out tags entirely, like the one we’re searching for, will obviously cause the injection to fail. During testing we found several pages that had no HTML or BODY tags at all, so these were sent to the development team as bugs. Another issue was a badly formatted page that had JavaScript code after the <body> tags. This seemed to affect only some browsers, where it made the JavaScript fail.

The last caveat I can remember: you must have a stream profile enabled on your virtual server. You won’t be able to apply this iRule to the virtual server without one, even if it is just the generic stream profile.

So in the end, we created a new HTTP profile which did content compression. This was required as we’d removed compression from the server side. The new HTTP profile, a stream profile, and the iRule were then all assigned on the F5. Data started flowing to the Urchin server as soon as we finished the rollout.

Now the business unit is happy, as we turned what could potentially have been a 6-week project into a 15-minute fix, calling into play some of the more powerful parts of the F5 load balancers which we had yet to use in this part of the application.

It’s worth reading the entire DevCentral Gomez injection series. Joe Pruitt does an excellent job of explaining the rules and how they work, then expanding on the basic project to track more detailed information.

¹ You don’t want JavaScript being injected into CSS files, for example.

TheGeekery
