Data Quality 101: Bots and Fraud

A favorite topic among the Research Management team at EMI is data quality. We could spend hours debating duplicate statistics, fraud blocks, and post-survey removals. There are many facets to the issue, and those of us who classify as ‘data nerds’ take continuing education around fraud very seriously. It’s easy to get buried in information, but it boils down to a few key topics:

What are duplicates?

In short, duplicates are respondents who attempt to access the same survey multiple times. A respondent may not even know they have already participated via another access point! Our technology tracks duplicates using proprietary fingerprinting software as well as a licensed solution, and immediately sends them to an end page indicating they have already participated. At EMI, we regularly review this type of data across device type, country, supplier type, target group, etc. Blocking duplicates is considered the base level of security within our platform, and arguably the most straightforward quality measure.
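To make the idea concrete, here is a minimal sketch of fingerprint-based duplicate blocking. It is an illustration only: EMI's fingerprinting software is proprietary and not described in this post, so the attributes being hashed, the `fingerprint` function, and the in-memory `DuplicateChecker` class are all assumptions made for the example.

```python
import hashlib


def fingerprint(attrs: dict) -> str:
    """Hash device attributes into a stable ID (the attribute set is hypothetical)."""
    # Sort the keys so the same attributes always produce the same hash.
    canonical = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DuplicateChecker:
    """Remembers which fingerprints have already entered each survey."""

    def __init__(self):
        self._seen = set()  # holds (survey_id, fingerprint) pairs

    def admit(self, survey_id: str, attrs: dict) -> bool:
        """Return True on a first visit, False if this device was already seen."""
        key = (survey_id, fingerprint(attrs))
        if key in self._seen:
            return False  # duplicate: route to the "already participated" page
        self._seen.add(key)
        return True
```

Because the check is keyed on the device rather than on the entry link, a respondent arriving from a second access point would still be recognized. A production system would need persistent storage and far more robust signals than this sketch uses.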

What are bots?

Generally speaking, a bot is a software application that runs an automated task over the Internet. Typically, bots perform tasks that are both simple and structurally repetitive at a much higher rate than would be possible for a human alone. Some bots are good: they can improve customer service (for example, chatbots), increase a company’s web profile (SEO), and provide relevant search results for Internet users worldwide. However, most bots are malicious and unregulated (for example, click bots fraudulently clicking on ads, or spambots on social media sites).

Why do bots have an impact in our world?

Bots have an impact across industries (financial services, ticketing, retail, and education); market research is not alone. A batch of fraudulent data can very quickly impact business decisions. The good news is that the perceived threat is greater than the reality, and fraud detection software continues to develop. For example, over the past month, EMI has blocked approximately 5% of our total activity as fraud. While bots are certainly in the mix, we can confidently say we are not seeing malicious activity on a large scale.

Why do we experience bad data?

We are quick to blame bots for bad data, but bots are not the only cause of poor quality. Survey design, survey length, survey fatigue and unengaged respondents, and overall sample selection play a much larger role than bots in post-survey quality removals. EMI combines several layers of protection to mitigate risk and improve quality.

EMI’s Approach to Data Quality

The first piece of this puzzle is technology. Our proprietary project management platform combines Geo-IP tracking, digital fingerprinting, and fraud detection, offering a multi-layer approach to pre-survey blocks. Once a job is in field, each panel source within it is monitored individually. This allows sub-quotas by partner and lets us quickly identify a potential issue with activity, as well as the source it may be isolated to. If a problem is identified, we investigate right away, make real-time adjustments to the screener or questionnaire, and re-field for replacement data as needed.

Our team also works with clients to set up data scrubbing options throughout fielding, ensuring no surprises at project close. It is important to understand removal reasons (inconsistent answers, duplicate open ends, patterns across data points); the more detail, the better. We share quality information with our sources, keeping data quality discussions open while we review respondent-level account data. Operationally, data quality removals are uploaded and stored in our project management platform for ongoing reporting and tracking.
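As a rough illustration of how removal reasons and per-source monitoring fit together, the sketch below flags two of the patterns mentioned above (duplicate open ends and a repeated answer pattern across a grid) and then reports a removal rate for each panel source. The record layout (`source`, `grid`, `open_end`), the function names, and the five-item threshold are assumptions chosen for the example, not a description of EMI's platform.

```python
from collections import Counter, defaultdict


def removal_reasons(resp: dict, open_end_counts: Counter) -> list:
    """List the reasons a response would be flagged (thresholds are illustrative)."""
    reasons = []
    grid = resp["grid"]
    # Same answer on every item of a grid with five or more items.
    if len(grid) >= 5 and len(set(grid)) == 1:
        reasons.append("straightlining")
    # Identical open-end text shared with at least one other respondent.
    text = resp["open_end"].strip().lower()
    if text and open_end_counts[text] > 1:
        reasons.append("duplicate open end")
    return reasons


def removal_rate_by_source(responses: list) -> dict:
    """Return the share of flagged responses for each panel source."""
    open_end_counts = Counter(r["open_end"].strip().lower() for r in responses)
    totals = defaultdict(int)
    removed = defaultdict(int)
    for r in responses:
        totals[r["source"]] += 1
        if removal_reasons(r, open_end_counts):
            removed[r["source"]] += 1
    return {source: removed[source] / totals[source] for source in totals}
```

Keeping the reason alongside each flag, rather than a bare yes/no, is what makes it possible to tell whether a spike in removals from one source points to a sample problem or to a questionnaire problem.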

Overall, there is not a part of our day that isn’t focused on providing the highest quality data possible for every study we field. It is a process that is continuously evolving and improving. With a better blend of technology and human touch, we promise to keep our eyes on all aspects of quality sample.