A growing number of brands are turning to synthetic data, AI-generated data meant to mimic the real world, to learn about their customers.
Interest in the technology continues to rise. A Qualtrics report from May found that 41% of market researchers are currently using synthetic data to supplement or replace human respondents, and 62% of respondents say they intend to use it.
“There's been a lot of interest and enthusiasm on the part of my clients, and trying to understand how synthetic data could benefit the work that they do,” said Lizzy Foo Kune, a distinguished VP analyst at Gartner, who co-leads the Gartner Futures Lab. “And then on the other hand, there's a lot of skepticism around the validity of the output.”
Before launching headfirst into synthetic data analysis, CX teams need to understand what synthetic data is, where it is useful and where it falls short.
Synthetic customer data comes in two main buckets, according to Andy Pierce, a member of Bain & Company’s customer strategy and marketing practice and the firm's global lead for value proposition innovation and design. The first is digital twins; the second is cohort analysis.
Pierce gave the example of a brand with three individuals' information in its database. The brand can create a digital copy, or digital twin, of each of them that it can use to better understand and predict each customer’s behavior.
These dynamic, virtual representations mimic and predict customers’ behaviors. They draw on first-party data and other consumer data sources to replicate each person’s experiences in a digital environment. Digital twins allow companies to get granular and provide one-to-one personalization.
Brands that want to look at the marketplace more broadly can do a synthetic cohort analysis. This is best used for exploring how new products or services would be received. In this case, the brand can take research data, transaction data, first-party proprietary data, social listening and social scraping, product reviews and more to create different customer segments. Those three individuals used in the digital twin example would each fall within a different cohort.
Synthetic data and survey data largely mirror each other, and accuracy is improving, Pierce said.
“The accuracy just in the digital twin world is up to 90% or even higher,” Pierce said. “It's lower in the cohort world because you just don't know everything about the customer, but we've seen results up as high as 80%, 85%, so they're getting quite accurate.”
Still, synthetic data is not foolproof. Accuracy aside, a brand might choose it over more traditional surveys for other reasons, including the cost and time saved as well as the ability to learn more about specific customer groups, according to Foo Kune.
“The reason you might do this is because you're trying to pursue rapid time to market, so in some cases, data that could take months to collect through more traditional survey methods could be attained in days or in hours if you use the synthetic sources or a synthetic focus group,” she said.
Synthetic data can also stand in for hard-to-reach audiences.
“If you're a B2B organization, and you need to put together a focus group of Fortune 100 CIOs, traditionally it would be pretty time and resource-intensive to do that,” Foo Kune said. “You could use the synthetic personas, the synthetic focus groups to substitute for that.”