Data sushi or why everyone just wants the raw data

Tom Weiss, Wed 15 January 2020

When I first started doing business in Japan in the mid 1990s I fell in love with sushi. Sitting at the counter eating sushi and drinking sake with my friends felt like the most glamorous thing in the world.

Now, when I'm talking to clients and hear them asking about access to the "raw" data I wonder if they have the same infatuation I had with raw fish. It took me a good ten years to re-acclimatise myself to the pleasures of cooked fish and I think I see the same thing happening in the world of TV data.

When people talk about raw data, it usually comes from a desire not to be constrained in what they can do with the data. With raw data, it's assumed that anything is possible, and it can be processed exactly as required. That is certainly true, but if you're asking for the "raw" data you also need to be very clear on what you are really asking for.

It's really easy to make sushi, right?

In video, raw data is frequently little more than click-stream information from the remote control (set-top-box data), or a list of frames matched against an ingested stream (Smart TV data).

This data is messy and complex, and like a visitor to a restaurant who wants more than a raw fish dumped on his or her plate, most agencies or networks probably don’t want to be ingesting that kind of raw data.

Most people, when they request “raw” data mean that they want a granular, sanitised dataset that they can drill down to the level of an individual device. They want any inconsistencies in the data smoothed out, they want the population to have been made representative of the market they are measuring, and they want data they can trust, and that won’t make them look silly in front of clients.

In other words, they want the opposite of raw data. They want finely prepared fish. There are three key steps to making your data sushi appetising and palatable:

Step 1: Smoothing out the inconsistencies

Raw data is full of common inconsistencies: set-top boxes that occasionally report viewing from the 1st of January 1970; fast-forward and rewind session data that is often missing; Smart TV ACR algorithms that can flip-flop between two networks carrying the same content at the same time; OTT data with inconsistencies between iOS and Android. This is just a small sample of common issues, and any dataset needs to go through a rigorous cleansing process to remove these erroneous records.
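As a minimal sketch of what such a cleansing pass involves (the record layout and field names here are invented for illustration), you might drop epoch-zero timestamps and short channel flip-flops like this:

```python
from datetime import datetime, timezone

# Hypothetical viewing-session records; field names are illustrative only.
sessions = [
    {"device": "stb-001", "start": datetime(1970, 1, 1, tzinfo=timezone.utc),
     "channel": "A"},  # epoch-zero clock glitch
    {"device": "stb-002", "start": datetime(2020, 1, 10, 20, 0, tzinfo=timezone.utc),
     "channel": "A"},
    {"device": "stb-002", "start": datetime(2020, 1, 10, 20, 0, 5, tzinfo=timezone.utc),
     "channel": "B"},  # ACR flip-flop seconds after the previous record
]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def cleanse(records, flipflop_window_secs=30):
    """Drop epoch-zero timestamps and channel flip-flops within a short window."""
    out = []
    last_by_device = {}
    for r in sorted(records, key=lambda r: r["start"]):
        if r["start"] == EPOCH:
            continue  # set-top box reported a bogus 1970 timestamp
        prev = last_by_device.get(r["device"])
        if (prev and r["channel"] != prev["channel"]
                and (r["start"] - prev["start"]).total_seconds() < flipflop_window_secs):
            continue  # likely ACR flip-flop between two simulcast networks
        out.append(r)
        last_by_device[r["device"]] = r
    return out

clean = cleanse(sessions)
```

A real pipeline would have many more rules than these two, but the shape is the same: every known failure mode of the source becomes an explicit, testable filter.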

Step 2: Make it representative

Set-top-box data is typically sourced only from an MVPD's footprint. It needs to be modelled to represent the rest of the country, and any skews within that footprint need to be corrected. With more and more people using antennas to receive terrestrial broadcasts, STB data is already missing large chunks of the population - increasingly within its footprint, as well as outside it.

Smart TV is more representative as it covers cable, satellite and broadcast, but it’s only going to cover homes where people have bought a new TV in the last few years and where they’ve connected to the internet through wi-fi. That’s still going to skew the data towards more prosperous and younger people.

Even OTT data, which by its nature covers every device using that service, needs to be 'de-skewed' by eliminating test data and bot views.
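One common way to correct skews like these is post-stratification weighting: each panel home gets a weight so that the panel's demographic mix matches the market's. A toy sketch, with invented age bands and figures, looks like this:

```python
# Toy post-stratification: weight each panel group so the panel's age mix
# matches the market's. All figures here are invented for illustration.
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
panel_counts     = {"18-34": 500, "35-54": 300, "55+": 200}  # panel skews young

panel_total = sum(panel_counts.values())
weights = {
    group: population_share[group] / (panel_counts[group] / panel_total)
    for group in panel_counts
}
# Over-represented groups get weights below 1, under-represented above 1.
```

In practice the cells are far finer (age by region by platform, say), and the weighting has to be re-run as the panel churns, but the principle is the same.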

Step 3: Make sure you avoid food poisoning

Finally, you need to be sure that the data is reliable. Nielsen is often rapped for delivering its data late, but late is not the worst thing data can be: data providers frequently have outages, and you need resilience in your data strategy to cope with them.

No one wants food poisoning. When a power outage in Florida delayed the release of Nielsen ratings by several days back in 2017, people complained, but the data was eventually published. If you're taking data to your client claiming that nobody watched their competitor's commercial, you had better be sure there wasn't an outage at your data provider during that period. It's better to wait for your sushi and get it in perfect condition than to get it on time but far from fresh.
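A simple safeguard before trusting a zero is to check delivery completeness. A sketch, assuming a hypothetical daily feed of record counts, might flag missing or suspiciously light days like this:

```python
from datetime import date, timedelta

# Hypothetical daily record counts from a provider feed (invented figures).
daily_counts = {
    date(2020, 1, 10): 1_200_000,
    date(2020, 1, 11): 1_180_000,
    # Jan 12 missing entirely - an outage, not zero viewing
    date(2020, 1, 13): 90_000,   # partial delivery
    date(2020, 1, 14): 1_210_000,
}

def suspect_days(counts, start, end, min_fraction=0.5):
    """Flag days that are missing or far below typical delivery volume."""
    typical = sorted(counts.values())[len(counts) // 2]  # rough median baseline
    flagged = []
    d = start
    while d <= end:
        if counts.get(d, 0) < typical * min_fraction:
            flagged.append(d)  # don't report "nobody watched" for this day
        d += timedelta(days=1)
    return flagged

bad = suspect_days(daily_counts, date(2020, 1, 10), date(2020, 1, 14))
```

Any reported metric for a flagged day should be withheld or caveated rather than presented as a true zero.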

Leave raw data to the sushi masters

That sashimi on your plate may look delightfully simple and elegant, but a lot has gone into getting it there. It takes five years of hard work to train to be a sushi chef, and creating data sushi is no less complicated.

Unless you have an army of fully trained sushi chefs on staff, I'd recommend you ask your data providers for the processed data, and leave the raw data to the data masters.
