What different types of synthetic data are there?

About

In this episode of Talking AI, Ray and Will Poynter, co-founders of ResearchWiseAI, discuss the practical uses, challenges, and future of synthetic data in market research. Synthetic data is defined as data created rather than collected, with Ray clarifying three main categories currently relevant to the market research industry:

  1. Augmented Synthetic Data: The most common approach in market research. Synthetic cases are added to existing survey data to fill gaps, as an improvement on traditional weighting methods. It reduces cost and time, particularly for the last 10-20% of data collection, which typically incurs the greatest expense.
  2. Personas: Qualitative or quantitative synthetic entities representing customer segments or groups, such as loyalists or trialists. These are interactive personas that help brand managers generate insights, ideas, and strategies through simulated dialogue and creative brainstorming.
  3. Fully Synthetic Data: Entire datasets created synthetically, bypassing traditional data collection entirely. Though not yet widespread due to concerns about efficacy and trust, this method offers significant potential for privacy protection and rapid analysis.
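
To make the contrast in the first category concrete, here is a minimal Python sketch of weighting versus augmentation. It is an illustration under stated assumptions, not the method Ray and Will describe or any vendor's implementation: the function names, the `group` and `score` fields, and the "resample a real case and add small jitter" generator are all hypothetical stand-ins for a real generative model. It also assumes every target group has at least one real respondent.

```python
import math
import random
from collections import Counter


def poststratification_weights(sample, targets):
    """Traditional weighting: weight = target share / observed share, per group."""
    counts = Counter(r["group"] for r in sample)
    n = len(sample)
    return {g: targets[g] / (counts[g] / n) for g in counts}


def augment(sample, targets, rng):
    """Top up under-represented groups with synthetic cases.

    ASSUMPTION: a synthetic case is a resampled real case with Gaussian
    jitter on its score. A production system would use a fitted model.
    """
    counts = Counter(r["group"] for r in sample)
    # Size the final dataset so the most over-represented group needs no additions.
    total = max(math.ceil(counts[g] / targets[g]) for g in counts)
    augmented = list(sample)
    for g, share in targets.items():
        donors = [r for r in sample if r["group"] == g]
        needed = round(total * share) - counts[g]
        for _ in range(max(needed, 0)):
            donor = rng.choice(donors)
            augmented.append({
                "group": g,
                "score": donor["score"] + rng.gauss(0, 0.5),
                "synthetic": True,  # keep synthetic cases identifiable
            })
    return augmented
```

With a sample of 80 older and 20 younger respondents and 50/50 targets, weighting leaves 20 real younger cases each counted 2.5 times, while this sketch adds 60 flagged synthetic younger cases instead. Flagging synthetic rows keeps later validation possible.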

Ray and Will highlight practical advantages of synthetic data, such as faster turnaround times, reduced costs, and enhanced data security and privacy. Synthetic data originated partly as a response to privacy concerns (adding noise to census data is an early example) and continues to offer strong security benefits by replacing sensitive personal information.
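
As a rough illustration of the noise-adding idea mentioned above, the Python sketch below perturbs a table of counts before release. The episode does not say which noise mechanism is meant; Laplace noise is assumed here as one common choice, and the function names, the `scale` parameter, and the example table are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution centred on zero.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_counts(counts, scale, rng):
    """Return a copy of `counts` with Laplace noise added to each cell.

    ASSUMPTION: results are rounded and clamped at zero so they still look
    like counts; this post-processing introduces a small upward bias.
    """
    return {
        key: max(0, round(value + laplace_noise(scale, rng)))
        for key, value in counts.items()
    }


if __name__ == "__main__":
    rng = random.Random(42)
    table = {"18-34": 120, "35-54": 95, "55+": 3}
    print(noisy_counts(table, scale=2.0, rng=rng))
```

A larger `scale` hides small cells more thoroughly but distorts the totals more, which is the same accuracy-versus-privacy trade-off the validation concerns below are about.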

However, concerns about synthetic data persist. Ray emphasizes that the industry's primary worries are accuracy, validation methods, and reliability across different contexts. The lack of standardized techniques for assessing the accuracy of synthetic data remains a critical hurdle. Ray advises that validation should ultimately focus on whether synthetic data supports effective business decisions.

Discussing future trends, Ray and Will predict significant growth for augmented synthetic data and interactive personas, driven by increased industry acceptance and regulatory clarity. They foresee augmented data increasingly replacing traditional weighting, while personas evolve into dynamic tools that let brands simulate interactions with target audiences in real time.

While fully synthetic data may face limitations, especially if overused without fresh data collection, Ray suggests it could eventually eliminate traditional surveys by directly leveraging AI’s deep understanding of consumer behavior and business questions. However, this approach might become obsolete if AI systems reach a point where they directly generate insights without needing traditional survey structures at all.

To conclude, Ray and Will encourage careful, validated adoption of synthetic data approaches, underscoring their potential to transform market research by speeding processes, enhancing privacy, and generating richer, more actionable insights.

Tune in next week for another episode of Talking AI.

Presenters

Ray Poynter

Founder

Will Poynter

Founder