**How to cut fieldwork time and costs in half while increasing data quality through model-based post-stratification techniques**

*By Fred DeVeaux and Korbinian Oswald*

**Introduction**

Dalia’s online sampling technology taps into public opinion from around the world. Instead of using a panel with a fixed number of respondents, Dalia continuously sources new survey respondents from tens of thousands of online apps and mobile websites. This enables Dalia to reach hundreds of millions of people in more than 90 countries in real time and uncover public sentiment in countries where access to public opinion remains largely limited by traditional survey means (e.g. landline, face-to-face interviews) or online panels.

**The Problem**

The biggest challenge in the sampling industry is to collect information from people in a way that accurately represents the population, in order to be able to draw generalizable conclusions. In most cases, when the population of interest is the entire country, such samples are called “nationally representative”. Online sampling has become the dominant approach to reach the largest number of respondents in an efficient way, and has largely replaced non-online methods (e.g. face-to-face, landline telephone). Though online sampling is technically a “non-random” process because it can’t reach people without internet, it offers reasonable approximations to statistical representativeness with a commonly-used approach called ‘quota sampling’.

Quota sampling builds a representative sample by dividing up the population into groups (usually based on demographics such as age, gender, geography, income and education). Survey researchers need to reach enough respondents in each of these groups before they can stop fieldwork. These targets are called quotas, and they ensure that the completed sample distribution matches the general population’s official distribution (based on census statistics). But because some of the groups are much harder to reach than others (e.g. older, less educated respondents), most of the fieldwork time and costs are spent trying to reach these difficult respondents, even after the large majority of the sample is already complete. Quota sampling is therefore often time-consuming and costly, particularly in large multinational studies where certain target groups are more difficult to reach online.
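As a rough illustration of how quota targets follow from population shares, the sketch below converts shares per cell into integer targets for a sample of 500. The function name `quota_targets` and the shares are hypothetical and chosen for illustration only; they are not census figures or Dalia’s actual quotas.

```python
def quota_targets(population_shares, sample_size):
    """Turn population shares per cell into integer quota targets.

    Uses largest-remainder rounding so the targets add up to sample_size.
    """
    raw = {cell: share * sample_size for cell, share in population_shares.items()}
    targets = {cell: int(value) for cell, value in raw.items()}
    # Hand out the respondents lost to rounding down, largest fractional part first.
    leftover = sample_size - sum(targets.values())
    by_remainder = sorted(raw, key=lambda cell: raw[cell] - targets[cell], reverse=True)
    for cell in by_remainder[:leftover]:
        targets[cell] += 1
    return targets


# Hypothetical shares for four age-by-gender cells (illustrative only, not census data).
shares = {
    ("male", "18-39"): 0.27,
    ("male", "40-65"): 0.22,
    ("female", "18-39"): 0.28,
    ("female", "40-65"): 0.23,
}
targets = quota_targets(shares, sample_size=500)
for cell, target in targets.items():
    print(cell, target)
```

Fieldwork under quota sampling only ends once every one of these targets is met, which is why a single slow cell can hold up the whole study.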

**Results**

In this paper we explore a method to estimate responses from a sample called “Multilevel Regression and Poststratification” (MRP). This method reduces dependence on quotas, and therefore reduces the corresponding high costs and length of fieldwork time associated with quota sampling.

When applied to Dalia’s “Risk Pulse”, a monthly survey in 6 countries, MRP generates survey estimates comparable to those from quota sampling while cutting costs by 55% and fieldwork time by 98% (from 15 days to 7 hours). The results offer promising opportunities for the online sampling industry: not only does MRP reduce fieldwork time and costs, it also opens the door to more flexible sampling techniques and more sophisticated models that include variables more relevant than the typical age, gender and education demographics.

**Problem with Quota Sampling**

To illustrate what a typical quota sampling setup looks like, and why it is often problematic, consider Dalia’s project called the Risk Pulse. This is a monthly recurring survey that collects nationally representative samples of 500 respondents from 6 different emerging economies in order to monitor political developments and social unrest.

Below is the table of quotas required to fill a nationally representative sample of ~500 respondents from Venezuela based on standard age, gender and education groups. In other words, fieldwork is finished when the survey has reached the target quotas for each of the “cells” below.

*Example: it is necessary to collect responses from approximately 12 highly educated, 14-25 year-old male respondents in Venezuela.*

Quota sampling works well, particularly in countries where it is relatively easy to reach different types of people (North America, Europe), because each cell might fill up at around the same rate, and within a few days of fieldwork the sampling is done.

**Biased Sample**

Dalia’s Risk Pulse survey, however, is in emerging economies where access to the population through internet-connected devices is less balanced. It is no surprise that people with internet access tend to be younger, higher-educated, and wealthier than those without (Pew link). Therefore, the same is true for Dalia’s reach: respondents tend to be higher educated, younger and more likely to be male than the national population.

Here quota sampling faces the following problem: because some parts of the population are much harder to reach than others, the quota cells fill up at much different rates.

In the end, a majority of the fieldwork time and costs are spent on filling up the few problematic cells, which only represent a small fraction of the sample. Even though it only takes Dalia about two weeks to fill up the required quotas for the 6 different countries, almost all respondents (90%+) are reached within the first few days.

Furthermore, after about 5 days, we are able to achieve the required 3000 respondent sample size (500 for each country). But some of the cells are not sufficiently filled (while others are slightly overfilled), so the next week or two is spent almost entirely on trying to get the last few respondents who belong to specific demographic cells that are harder to fill. This means that the majority of the costs, fieldwork time and effort come from sampling only a small fraction of the respondents.

**Multilevel-regression and Post-stratification (MRP)**

MRP is an estimation method that attempts to overcome the difficulties of biased samples. Because MRP does not rely on quotas to make nationally representative survey estimates, it can drastically reduce fieldwork time and cost.

This method showed promising results in the paper “Forecasting elections with non-representative polls”, published in 2014 in the International Journal of Forecasting. In their paper, Wang et al. used MRP to adjust a heavily skewed sample drawn from the Xbox gaming platform (93% men, 65% aged 18-29) that was asked daily about voting intention for the 2012 US presidential election. The authors showed that it is possible to generate very accurate election forecasts from non-representative polls. By using MRP they produced election forecasts that matched those of leading poll analysts, but at a fraction of the time and cost.

**How MRP works**

To demonstrate how MRP works, let’s illustrate the approach with a simple comparison to the quota sampling technique. Imagine that there are four different demographic groups in a country and that they are all of equal size. In order to get a nationally representative sample of 100 respondents using quota sampling, we set the quotas for 25 respondents for each of the four groups. After a few days of fieldwork, we are able to get enough respondents from each group:

But imagine one cell is particularly hard to reach (females 40-65), and after a few days of fieldwork we are only able to get 5 respondents in this cell.

In order to find out how a certain demographic group answers a survey question, we take the respondents from that group’s cell and average their responses. With so few respondents in this problematic cell, the answers of those 5 people risk carrying too much weight, making it necessary to continue fieldwork and fill the cell with more respondents. If the respondents are very difficult to reach, then this could take quite a lot of time and money. This is a problem that Dalia and the broader sampling industry encounter on a daily basis.
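To see why five respondents can end up carrying too much weight, consider the usual weighting shortcut: each respondent is weighted by the cell’s population share divided by its sample share. The sketch below is illustrative only. The function name `cell_weights` and the sample counts are hypothetical (four equally sized groups, one underfilled cell and one overfilled cell so that the sample still totals 100).

```python
def cell_weights(population_shares, sample_counts):
    """Weight per respondent in each cell: population share divided by sample share."""
    total = sum(sample_counts.values())
    return {
        cell: population_shares[cell] / (sample_counts[cell] / total)
        for cell in sample_counts
    }


# Four equally sized population groups; hypothetical sample with one underfilled cell.
population_shares = {
    "male 18-39": 0.25,
    "male 40-65": 0.25,
    "female 18-39": 0.25,
    "female 40-65": 0.25,
}
sample_counts = {
    "male 18-39": 25,
    "male 40-65": 25,
    "female 18-39": 45,
    "female 40-65": 5,
}

weights = cell_weights(population_shares, sample_counts)
for cell, weight in weights.items():
    print(f"{cell}: {weight:.2f}")
```

In this toy sample each of the five females aged 40-65 counts about five times as much as a respondent in a correctly filled cell, so one unusual answer can visibly move the overall estimate.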

With the MRP method, however, we can avoid this problem because it is **not strictly necessary to fill up all problematic cells**. MRP can still generate accurate survey estimates with underfilled cells because it uses a regression model based on the entire sample.

In order to estimate what percent of 40-65 year-old females respond “Yes” to a question using MRP, we take all the respondents from the sample and run a regression. The final regression is a model that predicts whether a respondent says “Yes” to the question based on his / her age and gender. In this way, it uses all the respondents from all the cells to calculate the average effect of age and gender on the response.

This is called “borrowing strength from cell neighbours”. In our example, we would be able to estimate the response from females aged 40-65 by finding the average effect of being female (based on all 50 females) and the average effect of being 40-65 (based on all 40-65 year-old respondents). The strength of the estimate is now based on 45 + 5 + 25 respondents, instead of just the 5 females in the 40-65 age cell.
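The two steps (regression, then post-stratification) can be sketched in a few lines. This is a minimal toy version under stated assumptions, not the model used in the test: it fits a single-level logistic regression with main effects for gender and age by plain gradient ascent, the cell sizes follow the 45 + 5 + 25 example, and the “Yes” counts are invented. A true multilevel model would additionally shrink the effects towards each other.

```python
import math

# Hypothetical toy sample. Cell sizes follow the example in the text;
# the "Yes" counts are invented for illustration.
# Each row: (female, aged_40_65, respondents, yes_answers)
cells = [
    (0, 0, 25, 10),  # males 18-39
    (0, 1, 25, 15),  # males 40-65
    (1, 0, 45, 15),  # females 18-39
    (1, 1, 5, 4),    # females 40-65 (the underfilled cell)
]


def predict(coefs, female, older):
    """Probability of "Yes" under a main-effects logistic model."""
    intercept, b_female, b_older = coefs
    score = intercept + b_female * female + b_older * older
    return 1.0 / (1.0 + math.exp(-score))


def fit(cells, steps=5000, learning_rate=0.01):
    """Fit the model by plain gradient ascent on the log-likelihood."""
    coefs = [0.0, 0.0, 0.0]
    for _ in range(steps):
        gradient = [0.0, 0.0, 0.0]
        for female, older, n, yes in cells:
            residual = yes - n * predict(coefs, female, older)
            for j, x in enumerate((1, female, older)):
                gradient[j] += residual * x
        coefs = [c + learning_rate * g for c, g in zip(coefs, gradient)]
    return coefs


coefs = fit(cells)

# Estimate for the underfilled cell: raw cell mean vs. model-based prediction.
raw_estimate = 4 / 5
model_estimate = predict(coefs, female=1, older=1)

# Post-stratification: weight each cell's prediction by its population share
# (four equally sized groups, as in the example above).
population_share = 0.25
national_estimate = sum(population_share * predict(coefs, f, o) for f, o, _, _ in cells)

print(f"raw cell mean: {raw_estimate:.2f}, model estimate: {model_estimate:.2f}")
print(f"post-stratified national estimate: {national_estimate:.2f}")
```

With these invented counts the model-based estimate for females aged 40-65 is pulled away from the noisy raw cell mean, because it is informed by all 100 respondents rather than by five.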

**Testing MRP**

Dalia’s Risk Pulse encounters the quota sampling problem outlined above. Namely, most of the cost and time used for sampling is required to fill up some problematic cells.

So we tested the MRP method on Dalia’s Risk Pulse to try to avoid this problem. As a first baseline model, we used a single-level logistic regression. We set out to answer the following question: is it possible to generate accurate survey estimates using MRP on a sample where the problematic cells are not filled? To answer this question, we compared the results generated by the MRP on a reduced sample to the results generated by quota sampling on a complete – and more expensive – sample.

One of the questions in Dalia’s Risk Pulse survey is: *“In the last 12 months, have you or someone close to you (friends, family) personally experienced or witnessed an act of corruption (e.g. a bribe, fraud)?”* Using the full quota sample with all quotas sufficiently filled, the overall answer to this question is that 42% of people from these 6 countries have experienced corruption. The fieldwork required to fill all the quota cells sufficiently and to get the required 3214 respondents took approximately 15 days.

To test whether MRP works, we use this 42% estimate as the benchmark. The research question is: If we used MRP, how much earlier could we have stopped our fieldwork, while still ensuring that our estimate for corruption is *not too far off* the 42% benchmark?

First, we set our threshold for error at 1 percentage point, meaning that we wanted to find out how many respondents were required for MRP to generate estimates that differ by less than 1 percentage point from the quota sampling benchmark (42%).

The result is that only the first 1443 respondents are required in order to generate an estimate that is no more than 1 percentage point away from the benchmark. In other words, we could have stopped fieldwork at 1443 respondents instead of the 3214 required for quota sampling and still have gotten very close results for the corruption question.
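The search for the earliest possible stopping point can be sketched as follows. This is a simplified illustration rather than the procedure used in the test: the function names are hypothetical, the answers are invented, a plain share of “Yes” answers stands in for the MRP estimate, and the rule “first prefix within the threshold” is an assumption (a stricter rule could require all later prefixes to stay within the threshold as well).

```python
def earliest_stopping_point(responses, estimate, benchmark, threshold, min_size=1):
    """Return the smallest prefix length whose estimate is within threshold of benchmark.

    responses must be in arrival order; estimate maps a list of responses to a share.
    Returns None if no prefix qualifies.
    """
    for size in range(min_size, len(responses) + 1):
        if abs(estimate(responses[:size]) - benchmark) <= threshold:
            return size
    return None


def share_yes(responses):
    """Stand-in estimator: plain share of "Yes" (1) answers. An MRP estimate would go here."""
    return sum(responses) / len(responses)


# Invented answers in arrival order (1 = "Yes", 0 = "No"); early respondents skew towards "Yes".
answers = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
benchmark = share_yes(answers)  # estimate from the full toy sample
stop_at = earliest_stopping_point(answers, share_yes, benchmark, threshold=0.10, min_size=3)
print(stop_at)  # prints 7
```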

With regard to time and costs, this translates to a 55% reduction in costs and a 98% reduction in fieldwork time (from 15 days to 7 hours).
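These percentages can be reproduced from the figures quoted above, assuming that costs scale with the number of completed interviews and that 15 days means 15 × 24 hours of fieldwork:

```python
full_sample, reduced_sample = 3214, 1443
full_hours, reduced_hours = 15 * 24, 7

# Assumes cost is proportional to the number of completed interviews.
cost_reduction = 1 - reduced_sample / full_sample
time_reduction = 1 - reduced_hours / full_hours

print(f"cost reduction: {cost_reduction:.0%}")  # prints "cost reduction: 55%"
print(f"time reduction: {time_reduction:.0%}")  # prints "time reduction: 98%"
```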

To see how far we could reduce the sample size, we increased our threshold for error to deviations of 2 and 5 percentage points from the benchmark. The resulting acceptable sample sizes were 1212 respondents and 1032 respondents, respectively.

With regard to fieldwork time and costs, the additional benefits are now only marginal. This suggests that there is a limit to how far the sample can usefully be reduced, and that the majority of the benefits come from the initial reduction.

In the end, the striking results show that with as few as 1443 respondents, we could generate survey estimates within 1 percentage point of the quota sampling results. The differences in time and costs are drastic: fieldwork time required for the first 1443 respondents was 7 hours instead of more than 15 days for the 3214 respondents using quota sampling. Concerning costs per interview, this would have cut our costs by more than 50%.

**Limitations**

It is important to notice that the MRP method also comes with limitations.

First, though MRP doesn’t require filling up specific demographic cells, the data still needs to have respondents from each major variable used in the model. For example, MRP doesn’t require filling up a demographic cell of 40-65 year-old females, but to build a model based on gender and age it does require a certain number of females and a certain number of 40-65 year-olds in the sample.
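A simple pre-check along these lines is to count respondents per category of each model variable, rather than per cell. The sketch below is illustrative: the function name, the toy sample and the minimum of 40 respondents are hypothetical, since the text does not state what “a certain number” is.

```python
from collections import Counter


def underfilled_categories(respondents, categories, minimum):
    """Return (variable, category, count) for every category with fewer than minimum respondents."""
    shortfalls = []
    for variable, expected in categories.items():
        counts = Counter(person[variable] for person in respondents)
        for category in expected:
            if counts[category] < minimum:
                shortfalls.append((variable, category, counts[category]))
    return shortfalls


# Hypothetical toy sample with one underfilled cell (females aged 40-65).
respondents = (
    [{"gender": "male", "age": "18-39"}] * 25
    + [{"gender": "male", "age": "40-65"}] * 25
    + [{"gender": "female", "age": "18-39"}] * 45
    + [{"gender": "female", "age": "40-65"}] * 5
)
categories = {"gender": ["male", "female"], "age": ["18-39", "40-65"]}

# The minimum of 40 is an arbitrary illustrative threshold.
shortfalls = underfilled_categories(respondents, categories, minimum=40)
print(shortfalls)  # prints [('age', '40-65', 30)]
```

Here both genders pass the check even though one gender-by-age cell holds only 5 respondents, while the 40-65 age category as a whole falls short of the chosen minimum.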

Another limitation of MRP is that it changes the way the data can be analyzed. Normally with quota sampling, the typical delivery is a dataset, or an analysis based on the raw data, which includes a weight calculated for each respondent. This type of dataset encourages exploratory analysis, because it is possible to look at estimates for different segmentations or sub-groups (e.g. how does car owners’ opinion of Tesla differ from that of non-car owners?). This is useful because flexible segmentation is a very important tool for generating insights from survey data.

With MRP, however, the survey estimates for each question are based on a model that comes from the relevant sample. That means it’s not possible to use the raw dataset on its own to explore the survey results for all the different questions or for all the different population segmentations. Instead, the final delivery shows only the final results of the MRP calculations, for each desired question and for each desired segment. Therefore, MRP is most effective when the key insights are defined at the outset, and minimal exploratory analysis is required.

**Conclusion: Implications of MRP**

The results of our initial test are quite promising, and open the doors to many new approaches to further improve MRP’s performance.

**Improvements**

**Build better models:**

MRP estimates are only as good as the model they use. In this test we used a simple logistic regression. For future improvements we will explore other, more advanced machine learning models such as neural networks and multi-level models to find the optimal option.

**Interaction terms:**

Many of these machine learning models use interaction terms between variables. Such interaction terms would help estimate the combined effect of two variables, instead of simply considering them as separate. For example, in addition to measuring the individual effects that being aged 40-65 and being female have on an answer option, an interaction term between age and gender would also measure the effect of being both aged 40-65 and female. This could significantly improve the model’s ability to capture real-life combinations of factors that influence people’s opinions.
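In regression terms, an interaction is simply an extra input column that is the product of two existing ones. The sketch below shows this for the age and gender example; the function name `design_row` and the 0/1 coding are hypothetical choices made for illustration.

```python
def design_row(female, older, with_interaction=False):
    """Build one row of regression inputs for a respondent (1 = yes, 0 = no)."""
    row = [1, female, older]  # intercept, gender effect, age effect
    if with_interaction:
        # Equals 1 only for respondents who are both female and aged 40-65.
        row.append(female * older)
    return row


rows = [design_row(f, o, with_interaction=True) for f in (0, 1) for o in (0, 1)]
for row in rows:
    print(row)
```

One caveat: with only two binary variables, adding the full interaction to an unregularised regression gives every cell its own parameter, so the model reproduces the raw cell averages and no longer borrows strength. This is one reason why multilevel models, which shrink such effects, are attractive here.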

**New variables:**

We expect these models to be much better when we introduce new variables. Instead of relying only on basic demographics such as age, gender, and education to predict all kinds of survey results, MRP models can include topic-specific variables because they are no longer restricted by the need to fill all the cross quotas. Therefore, when predicting purchase considerations for brands, for example, the addition of variables such as Early Tech Adoption or Brand Awareness will help ensure that the estimates are representative of the population segments that have the highest chance of influencing the survey responses.

**Applications:**

*Once we’ve explored ways to improve the MRP models, we can start applying MRP to a whole variety of important use cases:*

**Tracking studies:**

MRP makes it much easier to keep the **sample composition stable over time**. This is a key feature for studies that measure trends, and is currently a significant pain point for quota sampling because it requires that all cells are sufficiently filled and that they are filled similarly across all waves in order to keep the weights stable. MRP only needs enough people in each major variable category, so it is much easier to guarantee a stable sample composition.

**Niche Research:**

MRP’s key value is that it can deal with very skewed samples, making it a particularly promising tool for conducting survey research in traditionally difficult-to-reach markets or among niche population segments. This opens many doors for commercial applications, such as surveys among specific target groups (high-earning business owners, frequent travellers, etc.).

**More accurate Market Measurement:**

Lastly, because MRP allows for the inclusion of more variables into the post-stratification process, survey estimates are more likely to be representative for key variables. It remains to be seen, but the adoption of new and more relevant variables might improve the accuracy of MRP estimates. If so, then MRP could be a useful tool to improve the existing market measurement data based on quota sampling.

*Conclusion*

We’re excited about the prospects of using MRP for more accurate and more efficient sampling in the near future. With our CEO’s presentation of this MRP paper at the Samplecon conference, we hope to further contribute to the sampling industry methodology and connect with other people interested in MRP and its applications. For any questions or comments related to Dalia’s MRP research, please feel free to contact us!