March 5, 2012

The Google sampling effect

We’ve just been working with a client that uses Google Analytics as their measurement platform, and as part of our overall strategic plan we’ve done some supporting analysis for them, reviewing the last year with respect to a specific segment or two.

They get a surprisingly large amount of traffic for a WA business, which is especially significant when it comes to using Google Analytics.  On average, they get about 25 million page views per month – so, over the course of the year, that’s somewhere around 300 million page views.

During our supporting analysis, Google announced a new feature – controlling the sampling rate.  According to their blog post:

Control your report calculation
One way we speed up the serving of data is through what we call “fast-access mode”, which applies to reports generated from large data sets. In the coming weeks, we will be peeling back the curtain on how “fast-access mode” works and letting you control the number of visits used to calculate reports.

Out with the old: fast-access mode
If a report requires calculation on more than 250,000 visits, we select a statistically random sample of 250,000 visits and estimate the report results based on that data. This makes reports faster to load, and our testing indicates that the data returned is highly accurate.

In with the new: control your report calculation
Now you will have the ability to control the number of visits used to calculate your reports, and we inform you of exactly how many visits are used in report calculation.

Out of interest in this new feature, I decided to play around with the settings and compare the reports and conversions at different sampling rates (the ultimate reason for the feature being speed) – and I was quite surprised by the results.

Using their slider, I changed the sampling rate from the default, which is in the middle, moving it towards the right, or Higher Precision.  Each time, I switched over to the conversion reports to see what the results were.

On the default setting, the Paid Non Branded search conversion rate was 10.32%, but at the extreme right, it was 4.95%.  Interestingly, while Paid Search conversion deteriorated, Organic Non Branded improved.

Likewise, the other conversion rates all “tightened up” as I progressed from the default to the higher precision setting.

Notice the size of the sample set, though: at the default it’s using 2.6% of visits; at higher precision it’s still only using 5.15% of visits (500,000 out of 9.7 million) – and it was this that caused me some concern.
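To get a feel for why such small sample fractions matter, here is a minimal simulation. All the numbers in it are hypothetical – a segment assumed to be 2% of traffic with an assumed true conversion rate of 5% – but it shows how the spread of a conversion-rate estimate depends on how many visits land in the sample:

```python
import random
import statistics

random.seed(42)

# Hypothetical figures, chosen only to illustrate the effect of
# sample size on the spread of a conversion-rate estimate.
TRUE_RATE = 0.05        # assumed true conversion rate for one segment
SEGMENT_SHARE = 0.02    # assumed: the segment is 2% of all traffic

def sampled_estimate(sample_size):
    """Estimate the segment's conversion rate from one random sample of visits."""
    segment_visits = int(sample_size * SEGMENT_SHARE)
    conversions = sum(random.random() < TRUE_RATE for _ in range(segment_visits))
    return conversions / segment_visits

for n in (250_000, 500_000):   # default vs. higher-precision sample sizes
    estimates = [sampled_estimate(n) for _ in range(1_000)]
    print(f"sample={n:>7,}: mean={statistics.mean(estimates):.2%}, "
          f"spread (stdev)={statistics.pstdev(estimates):.3%}")
```

Doubling the sample roughly halves the variance of the estimate, but a small segment still only contributes a few thousand visits to the calculation, so individual report figures can swing noticeably between samples.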

When I pushed it all the way to the left, Faster Processing, the results were useless.  They were based on <0.01% of traffic; there were no conversions for Organic Non Branded, or Paid (both Org and NB).  So, while super quick, it was super inaccurate.

What are the real rates?

If we’re seeing such variance in conversion rates between samples of 2.6% and 5.15% of overall traffic, I wonder what the actual conversion rates are without sampling applied.  Unfortunately, as far as I can tell, we have no way of knowing.

While sampling certainly helps Google return results quickly, I think they should provide the opportunity to see what “actual” is, so that at the very least you can determine your margin of error for any report.  There’s absolutely no point in having really quick reports when the results are misleading.
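In the absence of the unsampled figures, you can at least approximate your own margin of error if a report tells you how many sampled visits a segment contributed. This is a sketch using the standard normal approximation for a proportion; the visit and conversion counts are hypothetical, not taken from the report above:

```python
import math

def conversion_rate_ci(conversions, sampled_visits, z=1.96):
    """Approximate 95% confidence interval for a conversion rate
    estimated from a random sample of visits (normal approximation)."""
    p = conversions / sampled_visits
    se = math.sqrt(p * (1 - p) / sampled_visits)
    return p, p - z * se, p + z * se

# Hypothetical segment: 5,000 sampled visits, 250 conversions
p, lo, hi = conversion_rate_ci(250, 5_000)
print(f"estimate {p:.2%}, 95% CI {lo:.2%} .. {hi:.2%}")
# → estimate 5.00%, 95% CI 4.40% .. 5.60%
```

Even a CI this rough makes it obvious when two sampled figures (say 10.32% vs. 4.95%) can’t both be plausible estimates of the same underlying rate.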

In the above conversion report, Paid Non Branded converts at 10% using the default setting, whereas at the highest precision available it converts at less than half that rate.

I would certainly recommend that if you are using Google Analytics, you look at the impact this sampling effect has on your reports, especially if you are optimising for conversions.
