Will Synthetic Data Take Over Market Research?

Is Synthetic Data Replacing Consumer Research?

Interest in synthetic data has been gaining steam lately, judging by the conversations, posts, and discussions around it. Easier access to advanced modeling tools, improved efficiency and effectiveness, and the opportunity for better privacy governance are seen as the driving forces behind its surge in popularity. Some are even marketing synthetic research not just as a solution but as a replacement for traditional research methodologies, which are slower and often expensive, presenting it as the faster, more cost-effective, and modern approach to consumer research. But is synthetic data really the future of market research?

Image: Darlene Alderson

What Is Synthetic Data?

Put simply, synthetic data is information that wasn't directly collected from real-world consumers or respondents. Instead, it is artificially generated by mathematical models or algorithms designed to mimic natural, real-world data.

It can be "fully synthetic," meaning it was created primarily by algorithms with little direct connection to real respondents, or "partially synthetic," where AI fills in gaps in real data. "Augmented" data, perhaps the most popular form of synthetic data, is simulated information built or extrapolated from a foundation of real-world information.

Aside from the benefits of speed, efficiency, and cost-effectiveness, synthetic data helps with various aspects of experimentation, such as preliminary testing, hypothesis checking, stress testing, iteration, and data fusion, even before any real data is collected. It can also help when samples are small or limited, whether because real data is difficult to acquire or because the target population is niche. And with rising expectations around privacy governance, synthetic data is seen as a way to share and analyze sensitive information with less risk of identifying respondents.

Image: Sherin Sam

What’s Keeping Researchers From Embracing Synthetic Data?

While researchers acknowledge the benefits of synthetic data and are interested enough to explore the new realms it unlocks, there's no general rush to embrace the new hot tech. Instead, the push to adopt synthetic data seems to come more from research agencies and their marketing arms than from researchers themselves or even their corporate clients.

So why aren't more researchers jumping at the prospect of using synthetic data in their studies? Proponents tout synthetic data's 80% match rate with real data. Researchers, however, recognize that the remaining 20% divergence could make or break a study, because that is where the more nuanced opinions, emotion-driven responses, and meaningful differences are likely to be found.

There's also a stigma around the term "synthetic," which many associate with "fake." There is a distinction, however: synthetic data is generated, whereas fake data is invented. Synthetic data draws from real data, so its outputs can be validated, tested, and compared. Fake data isn't afforded the same respect or accountability.

Understandably, there are concerns about the quality of the data the AI models are trained on. Poor-quality data can lead to oversimplification, exaggeration, and reinforced bias. Perhaps most importantly, researchers worry about losing the human element. Synthetic data is disconnected from the genuine behavior that emerges when you observe how people naturally, and often spontaneously, express themselves. These human truths are deeply tied to cultural, economic, and psychological factors. They ground insights in real-world behavior and lift them above mere statistical guesswork.

Beyond the risk of AI hallucinations, synthetic data that is left to iterate on itself, with each round generated from earlier synthetic output rather than fresh real-world input, can eventually degrade into nonsensical results. AI models have also been observed to be overly eager to please. That tendency can crowd out the contrarian responses, unexpected perspectives, and pain points that real participants often surface, which are exactly the inputs that can lead to groundbreaking insights and discoveries.

Image: Michelangelo Buonarroti

The Future of Market Research With Synthetic Data

Synthetic data may be far from the game-changer vendors are hyping it up to be, but researchers appreciate having it as another tool at their disposal. Synthetic research can work when you need to confirm or validate ideas quickly and on a budget. However, it shouldn't be expected to produce breakthroughs or reach deeper levels of understanding the way natural data does. It can improve studies by filling in gaps, but the filled-in data requires validation, and stakeholders should be told about the nature of the data behind the results.

Rather than serving as a direct replacement, synthetic research could better meet study objectives by complementing, supporting, or augmenting consumer research. Synthetic data alone would give everybody the same information. Adding human input and oversight could make the difference in uncovering resonant insights with enough confidence to truly drive decisions and actions.

Synthetic data is also not a one-and-done solution. Human behavior and attitudes aren't fixed; they change over time, so synthetic data shouldn't remain the same and stagnate either. To build credibility and maintain confidence, synthetic data requires consistent updating and stringent control, and it must stay verifiable and reflective of the real world.

Yes, synthetic data can be powerful, but on its own it falters without that all-important layer of humanity. Market research was, after all, founded on listening to real people, so synthetic data must be anchored in human truths to produce meaningful and relevant insights. Some may laud AI-driven market research as the way of the future, but on its own it won't inspire anywhere near the confidence that synthetic data grounded in human truth does.

Additional Reading:

Can Synthetic Respondents Take Over Surveys?

Trade Talk: Synthetic data: Intriguing, but is anyone actually sold?

Why We Don’t Talk About ‘Synthetic Data’—And Why You Shouldn’t Either

Synthetic data can benefit medical research — but risks must be recognized

When and How "Synthetic Research" – Qualitative Research Among AI-Generated Profiles – Might Be Useful, and its Limitations

Synthetic Data in Market Research: An Expert View on Why Natural Data First Still Wins

Featured and Top Images: cottonbro studio