To answer this question, you first have to answer something more fundamental: what is 'Big Data'? It's a term that's loosely bandied about. After all, you could argue that the likes of CERN scientists and the marketeers at Dunnhumby have been dealing with big data for decades.
Why has it all of a sudden become a buzz phrase?
In part it's due to accessibility: the advent of scalable cloud computing means that you no longer have to buy your own supercomputer, but can lease it from the likes of Amazon, Microsoft or Rackspace.
Let me give a couple of examples of how we're handling big data at TV Everywhere.
First of all we have written a batch management processor for the Microsoft Azure cloud. This enables us to reverse engineer the cloud so that we can identify the optimal configuration for any cloud setup depending on factors such as time and cost. Initially we're using this to run our new encoding farm, Vidcoding. We can set the system to fastest and have a whole back library crunched in a few hours, or to cheapest, where it will take a few weeks to process but at a fraction of the cost.
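To make the fastest-versus-cheapest trade-off concrete, here is a minimal sketch in Python. It is not our actual batch processor: the configuration names, instance counts and hourly prices are hypothetical, and it assumes the work splits perfectly across instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    instances: int
    price_per_instance_hour: float  # hypothetical price, not a real Azure rate


def estimate(config, work_hours):
    """Return (wall_clock_hours, total_cost), assuming perfectly parallel work."""
    wall = work_hours / config.instances
    cost = wall * config.instances * config.price_per_instance_hour
    return wall, cost


def choose(configs, work_hours, mode, deadline_hours=None):
    """Pick the fastest or cheapest configuration, optionally within a deadline."""
    candidates = []
    for config in configs:
        wall, cost = estimate(config, work_hours)
        if deadline_hours is None or wall <= deadline_hours:
            candidates.append((config, wall, cost))
    if not candidates:
        raise ValueError("no configuration meets the deadline")
    if mode == "fastest":
        return min(candidates, key=lambda item: item[1])[0]
    if mode == "cheapest":
        return min(candidates, key=lambda item: item[2])[0]
    raise ValueError("mode must be 'fastest' or 'cheapest'")


# Hypothetical setups: a big on-demand fleet down to a few discounted machines.
CONFIGS = [
    Config("large-fleet", instances=500, price_per_instance_hour=0.40),
    Config("medium-fleet", instances=50, price_per_instance_hour=0.25),
    Config("small-fleet", instances=4, price_per_instance_hour=0.10),
]

WORK_HOURS = 2000  # assumed total encoding work, in single-instance hours

print(choose(CONFIGS, WORK_HOURS, "fastest").name)
print(choose(CONFIGS, WORK_HOURS, "cheapest").name)
print(choose(CONFIGS, WORK_HOURS, "cheapest", deadline_hours=48).name)
```

With these made-up numbers the large fleet finishes in about four hours, while the small one takes roughly three weeks at a quarter of the cost. The optional deadline shows how a middle setting could be picked between the two extremes.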
But video encoding, with its 500GB source files, is big data, not Big Data.
A closer example of how we're deploying the technologies we've developed is in our Rights Tracker subsidiary, where we're able to crunch data on real-time programme availabilities.
On the face of it, this seems simple. All you're doing is answering the question 'Can I sell/show this programme?'. But underneath lies a myriad of complexities.
First of all you need to look at what we call the rights dimensions. Broadly speaking these are the type of rights (e.g. SVOD or even t-shirt rights for a programme brand), territory, language and platform.
But then there are added levels of complexity: time and date based windows, exclusions and dependencies such as contributor payments.
It's all very well selling a view of a video on a mobile phone in Albania for 50p, but what if there's a residual payment of £5 due as a result? (Believe me, this is more common than you might think.)
So, not only do you have to work out the rights availability, but also the rights cost. And as the rates paid for all but the most valuable sports and drama rights are falling like a stone, this calculation becomes central to the very survival of the TV production industry (as well as other related media sectors such as publishing, music, film, sports, etc.). This truly is Big Data.
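The availability-plus-cost check described above can be sketched as follows. This is an illustration only, not the Rights Tracker data model: the class names, fields and sample values are all hypothetical, amounts are in pence, and where several grants match it simply assumes the one with the lowest residual applies.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    """One set of rights held for a programme (hypothetical structure)."""
    rights_type: str
    territories: frozenset
    languages: frozenset
    platforms: frozenset
    window_start: date
    window_end: date
    excluded_territories: frozenset = frozenset()
    residual_pence: int = 0  # contributor payment triggered by each sale


@dataclass(frozen=True)
class Request:
    rights_type: str
    territory: str
    language: str
    platform: str
    on: date


def covers(grant, req):
    """True if the grant covers every rights dimension of the request."""
    return (
        grant.rights_type == req.rights_type
        and req.territory in grant.territories
        and req.territory not in grant.excluded_territories
        and req.language in grant.languages
        and req.platform in grant.platforms
        and grant.window_start <= req.on <= grant.window_end
    )


def evaluate(grants, req, price_pence):
    """Return (available, net_pence); net_pence is None when unavailable."""
    matching = [g for g in grants if covers(g, req)]
    if not matching:
        return False, None
    # Assumption: use the matching grant with the lowest residual.
    residual = min(g.residual_pence for g in matching)
    return True, price_pence - residual


grants = [
    Grant(
        rights_type="VOD",
        territories=frozenset({"AL", "GB"}),
        languages=frozenset({"en", "sq"}),
        platforms=frozenset({"mobile", "web"}),
        window_start=date(2024, 1, 1),
        window_end=date(2024, 12, 31),
        residual_pence=500,  # the £5 residual from the example
    )
]

sale = Request("VOD", "AL", "sq", "mobile", date(2024, 6, 1))
print(evaluate(grants, sale, price_pence=50))
```

In this sketch the 50p Albanian mobile sale is available, but it loses £4.50 once the residual is paid, which is exactly why availability and cost have to be calculated together.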
So, to get back to the theme: where does this fit in with advertising? Well, our clients at Group M, part of the world's largest ad agency, are deploying a Rights Tracker system to manage their rights. In the future we hope to be able to relate this directly to the distribution of the properties they have invested in. As a result they will be able to forecast the return they will see on content investment, theoretically even before making an investment.
This draws a direct line, indeed a marketplace, between creators and advertising revenues. Our aim at TV Everywhere is to provide the missing link, in real time, using massive cloud-based processing which will demand billions of calculations a second. We've already built the framework for this (ironically it's the same processing logic that we use for video encoding at scale) and have already written the software. Our challenge now will be to get the parts of the industry that rarely come into contact with each other outside of YouTube, such as content producers and media buyers, to connect with each other.