Data Quality 101: De-Duplication




Now more than ever, using multiple sample sources is a best practice in online quantitative research. In the past, researchers often assumed that a study had to stay with a single source to maintain consistency. That assumption has since been debunked.

In today’s online sample industry, blending sample from multiple sources is the preferred method. However, whether you’re blending, aggregating, or stacking sample, each approach comes with challenges.

One of these challenges is duplicates: respondents who attempt to access the same survey multiple times. Many people are members of several panels, so the same person could be answering your survey more than once through different sources, whether they know it or not! Left unchecked, these duplicate responses skew survey results.

For this reason, it’s important to implement a rigorous de-duplication process. We consider blocking duplicates to be the base level of security on our platform and one of the most straightforward data quality measures.

A de-duplication process should cover every layer involved in aggregating multiple sample sources: it should catch duplicates across the individual partners within one project as well as across groups of related projects. At EMI, we regularly review duplicate data by device type, country, supplier type, target group, and other dimensions to identify trends and overlap across specific audiences.
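To make the two layers above concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular platform works: the `DedupRegistry` class, the respondent keys, and the project and supplier names are all hypothetical, and it assumes each respondent already arrives with a stable key.

```python
from collections import defaultdict


class DedupRegistry:
    """Hypothetical registry of respondent keys seen per project and per project group."""

    def __init__(self, project_groups=None):
        # Maps project id -> group id. A project with no group is only
        # de-duplicated against itself.
        self.project_groups = dict(project_groups or {})
        self.seen_by_project = defaultdict(set)
        self.seen_by_group = defaultdict(set)
        # Count of blocked entries per supplier, for later trend review.
        self.blocked_by_supplier = defaultdict(int)

    def admit(self, respondent_key, project_id, supplier):
        """Return True for a first entry and False for a duplicate."""
        group_id = self.project_groups.get(project_id)
        duplicate = respondent_key in self.seen_by_project[project_id] or (
            group_id is not None
            and respondent_key in self.seen_by_group[group_id]
        )
        if duplicate:
            self.blocked_by_supplier[supplier] += 1
            return False
        self.seen_by_project[project_id].add(respondent_key)
        if group_id is not None:
            self.seen_by_group[group_id].add(respondent_key)
        return True
```

For example, with two waves of a tracker grouped together, a respondent who enters wave 1 through one panel is blocked when they return through a second panel, and is also blocked from wave 2, while an unrelated project still admits them. The per-supplier counts give a simple starting point for the kind of overlap review described above.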

We use SWIFT, our proprietary, cloud-based sample management platform, to ensure the highest quality while blending sample. SWIFT’s capabilities include industry-leading digital fingerprinting software that removes duplicates from your data by immediately sending repeat respondents to an end page indicating that they have already participated in the study.
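The general idea behind fingerprint-based blocking can be sketched in a few lines of Python. This is a simplified illustration and not SWIFT’s implementation: the attribute names, the `END_PAGE` path, and the `fingerprint` and `route` functions are assumptions made for the example, and real fingerprinting software relies on far more signals than a handful of device attributes.

```python
import hashlib

# Hypothetical path for the "already participated" end page.
END_PAGE = "/already-participated"


def fingerprint(attributes):
    """Hash device attributes into a stable hex digest.

    Keys are sorted so the same attributes always produce the same
    digest, regardless of the order in which they were collected.
    """
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def route(attributes, seen, survey_url):
    """Send a first-time device to the survey and a repeat device to the end page."""
    digest = fingerprint(attributes)
    if digest in seen:
        return END_PAGE
    seen.add(digest)
    return survey_url
```

Here `seen` is a plain set of digests for one study. A first visit returns the survey URL and records the digest, and any later visit with the same attributes is routed to the end page, even if the respondent arrives from a different sample source.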

Data quality is a process with many factors, and it’s constantly evolving, much like the sample industry itself. That’s why we’re always building our knowledge of best practices to ensure the highest-quality data.

Check out the previous blogs in our Data Quality 101 Series:

Data Quality 101: Bots and Fraud