Historically, we've measured TV viewing at the person and household levels. Panel providers connect people meters to every set in the house, and cable operators have typically had set-top boxes (STBs) connected to each TV.
However, with the fragmentation of viewing, the number of TV-like screens in a typical household has ballooned. Any STB or automatic content recognition (ACR) dataset is now unlikely to capture views from every TV in the home.
On average, for every device used to watch TV in a home that our data science teams know about, there are another 1.5 that they don't see.
Simple Scaling Factors
If we can be confident that our data represents a uniform distribution of devices within the home, we can still project household-level ratings from the device-level data by applying a simple scaling factor.
The most common method of projecting the household reach of a TV campaign is to count the known TVs that were exposed and multiply by a factor that accounts for the missing ones. If we have one device per home and we know there should be 2.5, we scale by a factor of 2.5. For example, a campaign that reaches 100,000 measured TVs is assumed to have reached another 150,000 that weren't measured, making for a total of 250,000 exposed homes.
While this approach works well for small campaigns, for large, high-frequency campaigns the projected reach can exceed the size of our population. Consider a campaign that reached every home in the country. If the total number of households is then multiplied by 2.5, this would mean 300 million households exposed in a universe of 120 million households. We needed to improve the model.
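The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the naive method: the function name `naive_household_reach` is our own, and the 2.5 factor and 120 million household universe are the figures quoted in the text.

```python
def naive_household_reach(measured_tvs: float, scale: float = 2.5) -> float:
    """Project exposed homes by multiplying measured TVs by a flat factor."""
    return measured_tvs * scale


UNIVERSE_HOUSEHOLDS = 120_000_000  # universe size used in the example above

# Small campaign: the projection looks plausible.
small = naive_household_reach(100_000)

# Saturated campaign: every home reached, so measured TVs equals the universe.
saturated = naive_household_reach(UNIVERSE_HOUSEHOLDS)

print(f"small campaign:     {small:,.0f} homes")
print(f"saturated campaign: {saturated:,.0f} homes")
print(f"exceeds universe:   {saturated > UNIVERSE_HOUSEHOLDS}")
```

A flat multiplier has no upper bound, which is why the saturated case overshoots the universe.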
Creating a Better Model
Looking at simple examples, it's clear that household (HH) and device data do not scale linearly, as we can see in the example below, where 20% of devices (6 out of 30) and 50% of homes (5 out of 10) view a campaign in a universe of one million homes. The transformation was calibrated with between 1.5 and 3.5 devices per household.
In situations where reach is high, we evidently need a much lower scaling factor than when the campaign is smaller. To understand this better, we created a model that simulated running ad campaigns. We assumed a universe of 1 million homes in which each household had between one and four TV-watching devices, which roughly fits the idea the team wanted to investigate of multiplying the number of known TVs by 2.5. The goal was to work out both how many TVs were exposed and how many households were exposed, i.e., device and household reach.
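A minimal sketch of a simulation along these lines is shown below. It is not the shared code itself: it assumes each home draws its device count uniformly from one to four, that every device is exposed independently with the same probability, and it uses a smaller universe of 100,000 homes to keep the run short. The name `simulate_campaign` and its parameters are hypothetical.

```python
import random


def simulate_campaign(device_reach: float, n_homes: int = 100_000, seed: int = 0):
    """Return (device reach, household reach) for one simulated campaign.

    ASSUMPTIONS (ours, not the original model's): device counts are uniform
    on 1..4 and each device is exposed independently with probability
    `device_reach`.
    """
    rng = random.Random(seed)
    devices_total = devices_exposed = homes_exposed = 0
    for _ in range(n_homes):
        n_devices = rng.randint(1, 4)  # inclusive on both ends
        hits = sum(rng.random() < device_reach for _ in range(n_devices))
        devices_total += n_devices
        devices_exposed += hits
        homes_exposed += 1 if hits > 0 else 0  # a home counts once, however many hits
    return devices_exposed / devices_total, homes_exposed / n_homes


for target in (0.05, 0.2, 0.5, 0.9):
    d, h = simulate_campaign(target)
    print(f"device reach {d:.3f} -> household reach {h:.3f} "
          f"(implied scaling factor {h / d:.2f})")
```

Under these assumptions the implied scaling factor starts near the average number of devices per home for small campaigns and falls towards 1 as reach approaches 100%.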
We've shared the code online here.
The simulation results are fascinating. We see consistent results across all campaigns, with the only variability being in the number of devices per home.
If we plot the same data as a scaling factor, we see that the transformation follows an exponential decay. For a measured proportion of devices d, in a universe with an average of m devices per household, the scaling factor is m × e^(-d × ln(m)).
The corresponding proportion of homes h is the device proportion multiplied by this scaling factor: h = d × m × e^(-d × ln(m)). The factor equals m when d is close to zero and falls to 1 when every device is reached, so household reach can never exceed 100%.
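As a quick check, the scaling factor can be written as a small Python function. The function names are our own; the only formula used is the m × e^(-d × ln(m)) factor given above, with the household proportion taken as the device proportion multiplied by that factor.

```python
import math


def scaling_factor(d: float, m: float) -> float:
    """Device-to-household scaling factor: m * e^(-d * ln(m))."""
    return m * math.exp(-d * math.log(m))


def household_proportion(d: float, m: float) -> float:
    """Proportion of homes reached, given proportion of devices reached."""
    return d * scaling_factor(d, m)


# Factor across the range of device reach, for 2.5 devices per home.
for d in (0.01, 0.2, 0.5, 1.0):
    print(f"d={d:.2f}  factor={scaling_factor(d, 2.5):.3f}  "
          f"h={household_proportion(d, 2.5):.3f}")

# The earlier toy example: 6 of 30 devices (d = 0.2), 3 devices per home.
print(f"toy example: h={household_proportion(0.2, 3):.2f}")
```

For the earlier toy example (20% of devices, three devices per home) this gives a household proportion of roughly 0.48, close to the 5 out of 10 homes observed.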
Applying this to TV Ratings
We can apply this scaling factor to reach and rating calculations to transform data calculated at the device level into household-level ratings. For this, we need two additional data points: the number of devices per household in the dataset we are using, and the expected number of sets per home across the country. Working with Inscape data, we have 1.1 televisions per household, and the average across the country is approximately 2.5.
We assume that the distribution of televisions within a given household is the same for Inscape TVs as for the national make-up, and we can apply the following formulae.
As ratings are the average % of viewers, we need to apply the exponential decay to transform from device to household:
As reach is the total % of viewers, we need to apply the exponential decay curve:
We calculate the total reach by multiplying the % reach by the whole audience:
Impressions are not a % of total measure, so we only need to scale by the ratio of the expected TVs per home to the actual TVs per home:
Frequency is more complicated to scale. We calculate it by dividing the total scaled impressions by the scaled total reach.
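The steps above can be strung together in a short sketch. Several things here are our assumptions rather than the published formulae: we take m to be the ratio of expected to measured sets per home (2.5 / 1.1), we treat the impression scaling as that same ratio, and the function and parameter names are hypothetical.

```python
import math


def project_campaign(device_reach: float,
                     device_impressions: float,
                     universe_households: int,
                     measured_sets_per_home: float = 1.1,
                     expected_sets_per_home: float = 2.5) -> dict:
    """Project device-level campaign metrics to household level.

    ASSUMPTION: m is the ratio of expected to measured sets per home.
    The source does not confirm this choice of m.
    """
    m = expected_sets_per_home / measured_sets_per_home
    # Reach %: exponential-decay scaling, h = d * m * e^(-d * ln(m)).
    household_reach = device_reach * m * math.exp(-device_reach * math.log(m))
    # Total reach: % reach multiplied by the whole audience.
    total_reach = household_reach * universe_households
    # Impressions: a count, so scaled by the flat ratio only.
    impressions = device_impressions * m
    # Frequency: scaled impressions divided by scaled total reach.
    frequency = impressions / total_reach if total_reach else 0.0
    return {
        "household_reach": household_reach,
        "total_reach": total_reach,
        "impressions": impressions,
        "frequency": frequency,
    }


result = project_campaign(device_reach=0.2,
                          device_impressions=500_000,
                          universe_households=1_000_000)
for name, value in result.items():
    print(f"{name}: {value:,.3f}")
```

Because reach is damped by the decay curve while impressions scale by a flat ratio, projected frequency rises with reach instead of staying fixed at the device-level figure.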
The current approach assumes we have a uniform distribution of devices within the home, but we could weight devices according to their position in the home. This would reduce the bias that arises because connected devices tend to be the primary viewing set in the house, which means we might miss morning viewing from TVs in the kitchen, for example.
We are also not currently taking into account variations in the number of TV sets per home across the country. Integrating market research on the number of TVs in each market would add a further level of detail to the projections.