In Search Of Big Data

The term 'Big Data' is another one of those ubiquitous monikers that seem to appear everywhere these days.

Of course, if you've dealt with video delivery over the internet, it has always been about 'big data'. Dealing with it using off-the-shelf tools has become easier as more technology has become available at lower price points, but scaling data storage is still one of the few things that doesn't follow the rule of diminishing pricing. Indeed, it gets more expensive as you scale.

'Big data' has a few other attributes that pose a challenge:
  • large file sizes
  • large numbers of entries
  • the need to process large numbers of records
  • record sets with countless dimensions

With the arrival of large-scale online sites that need to update and process vast amounts of data in real time, new techniques have appeared, not least the development of document, index and graph databases in place of traditional databases.

But these technologies work well only in certain contexts.

Comparing two of our companies, VidZapper and Rights Tracker, gives you an indication of the problems anyone tackling this area in the media industry will face. VidZapper deals with very large files and a huge amount of associated data. However, much of this data is flat and rarely changes, e.g. video metadata.

Rights Tracker, on the other hand, deals with massive queries: for example, working out the availability of a TV programme based on language, territories, dates, format, platform, network and the type of rights. This involves a huge amount of processing and demands a totally different technical and programmatic approach from presenting a large amount of flat metadata about the same programme.
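To make the contrast concrete, here is a minimal sketch in Python. It is purely illustrative and is not how VidZapper or Rights Tracker actually work: the names VIDEO_METADATA, RightsGrant and is_available, the field layout and the sample data are all assumptions, and only some of the dimensions mentioned above are modelled. Flat metadata is a single keyed lookup, while an availability question has to be tested against every grant on every dimension at once.

```python
from dataclasses import dataclass
from datetime import date

# Flat, rarely changing metadata: one keyed lookup answers the question.
# (Hypothetical sample data.)
VIDEO_METADATA = {
    "prog-001": {"title": "Example Programme", "duration_s": 1800, "format": "HD"},
}


@dataclass(frozen=True)
class RightsGrant:
    """One hypothetical rights grant covering a set of territories and platforms."""
    programme_id: str
    rights_type: str
    language: str
    territories: frozenset
    platforms: frozenset
    start: date
    end: date


def is_available(grants, programme_id, rights_type, language,
                 territory, platform, on_date):
    """Return True if any grant covers every requested dimension on on_date."""
    return any(
        g.programme_id == programme_id
        and g.rights_type == rights_type
        and g.language == language
        and territory in g.territories
        and platform in g.platforms
        and g.start <= on_date <= g.end
        for g in grants
    )


# Hypothetical grants for the same programme.
GRANTS = [
    RightsGrant("prog-001", "broadcast", "en", frozenset({"GB", "IE"}),
                frozenset({"tv"}), date(2024, 1, 1), date(2024, 12, 31)),
    RightsGrant("prog-001", "vod", "fr", frozenset({"FR"}),
                frozenset({"web", "mobile"}), date(2024, 6, 1), date(2025, 5, 31)),
]

# Flat lookup versus multi-dimensional query.
programme_format = VIDEO_METADATA["prog-001"]["format"]
vod_in_france = is_available(GRANTS, "prog-001", "vod", "fr", "FR", "web",
                             date(2024, 7, 1))
```

Even in this toy form, the availability check scans every grant and tests several dimensions together. At real volumes that is the part that needs careful indexing and design, whereas the metadata lookup stays trivial.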

So, 'big data' is a big problem that won't be solved by hardware or the cloud: it needs to be solved by design and technical architecture, and more than anything by selecting the right technological approach.