August 31, 2009

So you dare to compare…

If you’re thinking of trying to compare web analytics results across vendors, or even against the original log files, you’re in for a very tough time.  Just thinking about it is a mistake.

Genie + bottle + uncork = Long time, not good time

The problem is that while there are basically two different methods of data collection (via server logfiles or via JavaScript tags), the variables associated with both, in a real world environment, make it almost impossible to compare results.

And therein lies your genie out of bottle, and you scrabbling around trying to justify the results.  You’re best off not even trying.

Server Logfiles

Every web server writes the requests made of it to a log file (providing logging is turned on).  Programs like AWStats, and many others out there, read through all of the entries and sort the requests into general reports, such as page views, visitors, visits etc.  They can also generally weed out search engine spiders, so you get a little closer to unique “human” counts.
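To give a feel for what these programs do, here is a minimal sketch in Python.  It is not how AWStats itself works.  It assumes the Apache “combined” log format, and the bot keywords, asset suffixes and sample log lines are all made up for illustration; real tools keep far longer lists and use smarter visitor matching.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Other servers and configs differ.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short lists for illustration only.
BOT_HINTS = ("bot", "spider", "crawl", "slurp")
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def summarize(lines):
    """Return (page views per path, rough unique visitor count)."""
    page_views = Counter()
    visitors = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        if any(hint in m["agent"].lower() for hint in BOT_HINTS):
            continue  # weed out search engine spiders
        path = m["path"].split("?", 1)[0]
        if path.lower().endswith(ASSET_SUFFIXES):
            continue  # CSS, JS and images are requests, not page views
        page_views[path] += 1
        # Crude visitor key: IP address plus user agent string.
        visitors.add((m["ip"], m["agent"]))
    return page_views, len(visitors)


# Made-up sample entries.
sample = [
    '203.0.113.5 - - [31/Aug/2009:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U)"',
    '203.0.113.5 - - [31/Aug/2009:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows; U)"',
    '198.51.100.7 - - [31/Aug/2009:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '203.0.113.5 - - [31/Aug/2009:10:02:00 +0000] "GET /about.html HTTP/1.1" 200 3000 "-" "Mozilla/5.0 (Windows; U)"',
]

views, unique_visitors = summarize(sample)
print(dict(views), unique_visitors)
```

Four requests hit the server here, but only two count as “human” page views, from one visitor.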

But there still remains a big problem with this method, and that is “caching”.  If a person re-visits a page, quite often the second and subsequent requests will be served from the browser’s memory, or cache, and no request will go back to the web server to be logged.  The same thing happens with network caching servers: the page is served from the network’s copy and your server never sees the request.  So, you end up with under-counts.

JavaScript tagging

Enter JS tagging.  Small bits of JavaScript are added to the page, typically within the body section.  After the page loads, the JS sends a request to the vendor’s logging server, passing information such as the page name, time and date, information about the visitor etc.  Nowadays, a cookie is also generally set, to spot repeat visitors.

The benefit of this method is that it typically defeats caching, as a new request is made to the vendor each time the code is executed.  The other benefit is that extra information can be set and sent, such as custom events, campaign codes etc, which is very important for customizing your reporting.
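In the browser this is all done in JavaScript, but the shape of the request is easy to sketch in Python.  Everything named here is an assumption for illustration: the collection URL, the parameter names and the helper function are hypothetical and do not match any particular vendor.  The random “rnd” value is one common way of making every request unique so that caches cannot answer it.

```python
import random
import time
from urllib.parse import urlencode

# Hypothetical collection endpoint, not any real vendor's URL.
COLLECTOR = "https://stats.example.com/collect"


def build_beacon_url(page, visitor_id, campaign=None):
    """Build the URL a tag might request to record one page view."""
    params = {
        "page": page,                # page name
        "vid": visitor_id,           # value read from the visitor cookie
        "ts": int(time.time()),      # time and date of the view
        "rnd": random.randint(0, 2**31 - 1),  # cache-buster (assumed)
    }
    if campaign:
        params["cmp"] = campaign     # optional campaign code
    return COLLECTOR + "?" + urlencode(params)


print(build_beacon_url("/index.html", "abc123", campaign="summer"))
```

Because the query string changes on every call, each page view produces a fresh request to the logging server, even if the page itself came out of a cache.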

The downside is that tags generally require the user to have JavaScript enabled (which most do), and their accuracy depends on where the code sits on the page and whether it executes before the user has clicked through to another page.

Genie, bottle, out

Imagine trying to compare fuel consumption on two similar vehicles over an identical distance.  While the vehicles might be identical, the way the vehicle is driven, the amount of traffic, the number of times you stop at lights, the temperature and humidity etc, all affect fuel efficiency.

You get a similar challenge when comparing vendor results; the variables in this case, though, are the ways users interact with your site.  And unfortunately, the disparity in results often leads to dissatisfaction with the vendor’s solution, rather than an understanding of how the differences can occur.

Let’s say you use Google Analytics, Omniture and web server log files.  If you browse to a page and let it fully load, you’ll no doubt have the same counts across all three.  1 visit, 1 page view.  The log file will also show all of the other associated requests, such as CSS, JS, images and the like.  But this is not a real world comparison.

The problems start to creep in when users start to browse your pages.

Some will click links before the page has completely loaded (JS will not record activity).  Some may have JS turned off (JS will not record activity).  Some may be with an ISP that utilizes caching servers (log files won’t record activity).  If the Google JS is at the top of the page and the Omniture JS is at the bottom, Google may record a page view, but if the user doesn’t let the page fully load, Omniture possibly won’t.  First party and third party cookies depend on user settings and may not be set, affecting visitor counts.  And the differences continue.
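A toy simulation makes the point.  The sketch below is Python with made-up data, not output from any real tool, and it simplifies heavily: it assumes a tag at the top of the page always fires when JS is on, a tag at the bottom only fires if the user waits for the full load, and a cached page never reaches your server log.

```python
from dataclasses import dataclass


@dataclass
class PageRequest:
    """One hypothetical page view and the conditions around it."""
    js_enabled: bool = True
    served_from_cache: bool = False      # browser or ISP cache answered
    left_before_full_load: bool = False  # user clicked away mid-load


def count_page_views(requests):
    """Count the same traffic three ways, under the stated simplifications."""
    counts = {"server_log": 0, "tag_at_top": 0, "tag_at_bottom": 0}
    for r in requests:
        if not r.served_from_cache:
            counts["server_log"] += 1     # the request reached the web server
        if r.js_enabled:
            counts["tag_at_top"] += 1     # assumed to fire even on an early exit
            if not r.left_before_full_load:
                counts["tag_at_bottom"] += 1
    return counts


# Five made-up page views covering the cases described above.
requests = [
    PageRequest(),                            # the "ideal" view
    PageRequest(served_from_cache=True),
    PageRequest(js_enabled=False),
    PageRequest(left_before_full_load=True),
    PageRequest(served_from_cache=True, left_before_full_load=True),
]

print(count_page_views(requests))
# {'server_log': 3, 'tag_at_top': 4, 'tag_at_bottom': 2}
```

Five page views, three different answers, and none of the three methods is “broken”.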

But over the years, vendors have strived to get as close to the truth as possible, using very sophisticated JavaScript – and they have to; it’s a multi-million, if not billion, dollar industry, which continues to grow as site owners demand more flexibility and insight into user behaviour.

If you are looking for a solution where you can customize your insight, then the JS tagging option is your best bet, as it provides more flexibility.  You can track clicks that wouldn’t be recorded by a server log file; you can track Flash interactions; you can track campaign activity; you can track shopping cart volumes, revenue and products.  And, with certain providers such as Omniture, you can target content based on the user’s previous activity across your site, using products like Omniture Test and Target (saving that for another post).

So, the thing to do is to try not to let that genie out of the bottle, otherwise you’ll spend the rest of your time trying to put it back in.

At best, you should expect (and explain to your stakeholders) that differences will occur and provide a rationale for the differences.  As long as you’ve implemented your tagging correctly, you should be pretty close to the truth.

And remember, web analytics is not about the absolute numbers…
