Tuesday, May 02, 2006

Lies, Damned Lies and Web Stats

Since it is run by computers, you'd imagine that the internet is a very measurable entity, where most things can be calculated and computed. The truth is far from it.

Let me explain using the concept of 'unique users'. If a website has no registration, a viewer is generally identified in one of two ways. The first is the address allocated to the user's computer, called the IP address; the second is cookies: small files dropped on the user's computer.

You then measure everything that this user does. This works well within a single session. However, the IP address can be dynamically allocated by the network the user is on, so it may change from session to session, and the same IP address may later be reused by another user. Since this is the norm rather than the exception on both domestic and many corporate networks, the likelihood of recognising the user a second time is very low. Cookies are somewhat more dependable, but users frequently clear them, and the unique identifier is lost.

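The net result is that unique users are almost always heavily overstated by web metrics packages: each new IP address or fresh cookie is counted as yet another 'unique' visitor, even when it is the same person returning.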
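To make the mechanism concrete, here is a minimal Python sketch of the two identification methods described above. It assumes a made-up session log in which each record carries the real person (something a metrics package cannot see), the IP address and the cookie value; the names `Visit` and `count_uniques`, and all of the data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    person: str  # who really visited (invisible to the metrics package)
    ip: str      # address seen in the server log for this session
    cookie: str  # identifier stored in the browser's cookie


# Hypothetical log: two real people, five sessions.
visits = [
    Visit("alice", "10.0.0.5", "c1"),
    Visit("alice", "10.0.0.9", "c1"),  # new dynamic IP, cookie kept
    Visit("alice", "10.0.0.7", "c2"),  # new IP and cookies cleared
    Visit("bob", "10.0.0.5", "c3"),    # bob is given alice's old IP
    Visit("bob", "10.0.0.8", "c3"),    # new dynamic IP, cookie kept
]


def count_uniques(log, key):
    """Count distinct identifiers and report them as 'unique users'."""
    return len({key(v) for v in log})


true_users = count_uniques(visits, lambda v: v.person)
by_ip = count_uniques(visits, lambda v: v.ip)
by_cookie = count_uniques(visits, lambda v: v.cookie)
print(true_users, by_ip, by_cookie)
```

Counting distinct IP addresses reports four 'uniques' and counting distinct cookies reports three, against two real people. IP reuse can also merge two people into one identifier, so the error does not run only one way, but in this toy log the churn in addresses and cookies outweighs it.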

Even worse is the lack of accountability over concepts such as page impressions and hits. I have run web reporting packages side by side and found that the same metric produces markedly different results. Invariably, a package called WebTrends reports the most inflated figures across the board. No surprise, then, that it is the metrics package of choice. After all, anyone running a web service would rather have more users than fewer, and damn the truth.
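One reason the same metric can diverge is that every package makes its own counting choices. As a minimal sketch, here is how 'hits' and 'page impressions' might be counted from the same requests. The log, the `PAGE_EXTENSIONS` rule and the reload handling are all assumptions for illustration, not how WebTrends or any other package actually works.

```python
# Hypothetical access-log paths: one visitor reads a story, then reloads it.
requests = [
    "/news/story.html",
    "/css/site.css",
    "/img/logo.gif",
    "/img/photo.jpg",
    "/js/menu.js",
    "/news/story.html",  # same page reloaded
]

# Assumption: which file extensions a package treats as a 'page'.
PAGE_EXTENSIONS = (".html", ".htm", ".php", ".asp")


def hits(log):
    """Every file request (page, image, stylesheet, script) counts as a hit."""
    return len(log)


def page_impressions(log, count_reloads=True):
    """Count page requests, optionally merging back-to-back reloads."""
    pages = [p for p in log if p.endswith(PAGE_EXTENSIONS)]
    if count_reloads:
        return len(pages)
    # Keep a page only if it differs from the page requested just before it.
    collapsed = [p for i, p in enumerate(pages) if i == 0 or p != pages[i - 1]]
    return len(collapsed)


print(hits(requests), page_impressions(requests),
      page_impressions(requests, count_reloads=False))
```

The same six requests yield six hits, and either two or one page impressions depending on a single policy choice about reloads.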

The TV industry has become a master at obscuring reality behind statistics. In New York, the biggest single TV advertising marketplace in the world, a few thousand people keeping paper diaries represent the TV viewing habits of millions.

Yes, let me run that past you again. A $15.5 billion market is measured, very approximately, by a tiny sample of people with pen and paper. Set against this the availability of several hundred channels, not to mention time-shifting and pay-per-view services, and the margin for error is, as a statistician friend of mine quipped, 'close to +/- 100%'.
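Why does channel fragmentation hurt so much? A rough back-of-the-envelope sketch uses the standard margin of error for a sample proportion. It assumes a panel of 3,000 diary keepers (the post only says 'a few thousand') and picks illustrative audience shares; the function name `relative_margin_of_error` is hypothetical. It ignores diary-keeping errors, time-shifting and everything else, so if anything it understates the real uncertainty.

```python
import math


def relative_margin_of_error(sample_size: int, share: float, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an estimated audience share,
    expressed as a fraction of the share itself (normal approximation
    to the binomial)."""
    standard_error = math.sqrt(share * (1 - share) / sample_size)
    return z * standard_error / share


# Assumed figures: 3,000 diary keepers; channels with 1%, 0.2% and 0.1% shares.
for share in (0.01, 0.002, 0.001):
    moe = relative_margin_of_error(3000, share)
    print(f"share {share:.1%}: +/- {moe:.0%}")
```

With hundreds of channels, many individual channels sit at a fraction of a percent, so only a handful of diarists watch any one of them and the relative error approaches the quip's 'close to +/- 100%'. At counts this small the normal approximation is itself crude.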

At Narrowstep we've worked very hard to develop a system that gives our customers detailed, real-time data on users. We can report in detail on a great many parameters, not least the number of viewer sessions, the length of viewing, and the number of ads viewed along with the percentage of each ad's length that was watched.

Through this work, it has become apparent to me how low traditional broadcast audiences must be. It is in the interest of the whole industry (advertisers, media buyers, media sellers and the channels themselves) for the reported figures to be artificially high.

But this can't last. Major advertisers are realising that much of their budget is wasted and are looking at more accountable media such as, yes, you guessed it, online ads.

