What Is Synthetic User Testing?

Synthetic user testing runs AI personas through your research questions or your live interface to surface friction fast. Here is what it does well, where it breaks, and how to use it without fooling yourself.

Bretton Badenoch · AI researcher, University of Michigan · Founder, CanaryUsers · 4 min read

Synthetic user testing uses AI personas, built from real behavioral data, to stand in for human participants. You point them at a research question or a live interface, and they respond the way a sample of users might. It is fast and cheap. It also reflects training data rather than real people, so it earns its keep in early exploration, not final decisions.

What is synthetic user testing, exactly?

A synthetic user is an AI-generated profile that mimics a user group and produces research findings without studying real people, as Nielsen Norman Group defines it. Synthetic user testing is the practice of putting those profiles to work.

It shows up in two forms. The first is research-style: AI personas answer interview or survey questions so you can pressure-test a concept before recruiting anyone. The second is interface-style: AI agents actually navigate your product, attempt a task, and report where they hesitate or fail. Both aim at the same thing, which is feedback before you spend weeks recruiting and scheduling humans.

How accurate are synthetic users?

More accurate than skeptics expect, with a sharp asterisk. In a 2024 study, researchers from Stanford, Northwestern, the University of Washington, and Google DeepMind built generative agents from two-hour interviews with 1,052 real people. The agents replicated those individuals' answers on the General Social Survey about 85% as accurately as the people replicated their own answers two weeks later.

The asterisk is the input. That accuracy came from rich, first-person interviews. When agents are built from thin demographic descriptions alone, such as age, gender, and location, accuracy falls and bias across groups rises. The model is only as good as what it knows about the person it imitates.
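To make "85% as accurately" concrete: the figure is a ratio, not a raw hit rate. The agent's agreement with a person's original answers is divided by that person's own agreement with themselves on a retest. The sketch below shows the arithmetic only. The function name and the two input rates are illustrative assumptions, not figures taken from the study.

```python
def normalized_accuracy(agent_match_rate: float, self_match_rate: float) -> float:
    """Agent accuracy relative to a person's own test-retest consistency.

    agent_match_rate: share of answers where the agent matched the person.
    self_match_rate:  share of answers the person repeated on a later retest.
    """
    if not 0 <= agent_match_rate <= 1:
        raise ValueError("agent_match_rate must be between 0 and 1")
    if not 0 < self_match_rate <= 1:
        raise ValueError("self_match_rate must be greater than 0 and at most 1")
    return agent_match_rate / self_match_rate


# Hypothetical inputs, chosen only to illustrate the ratio.
agent_rate = 0.68   # agent matched 68% of the person's answers
retest_rate = 0.80  # person matched 80% of their own answers later

print(f"{normalized_accuracy(agent_rate, retest_rate):.2f}")
```

The point of dividing by the retest rate is that people are not perfectly consistent with themselves, so a raw match rate would understate how close the agent gets to the realistic ceiling.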

Where does synthetic user testing fall short?

It tends to be too agreeable, and too capable. AI personas often give optimistic, shallow feedback that confirms what you hoped to hear, and they rarely surprise you with the odd, off-script behavior that real testing exists to catch.

A Userbrain experiment makes the gap concrete. Asked to find the cheapest plan for 20 users a month, all five synthetic users found the right answer, while only three of five real participants did. The synthetic users pushed through friction that stopped real people. That is backward from what you want a test to reveal, because the friction is the finding.

When should you use synthetic user testing?

Use it to prepare, not to decide. Nielsen Norman Group endorses one solid use case, desk research: synthesizing what is already known about a group to generate hypotheses you then test with humans. Good fits include piloting an interview guide, drafting proto-personas, and learning a new audience before your first real session.

NN/g is blunt about the limits. Do not use synthetic research to replace real-user research, validate a concept, or make a final call, and be careful with niche populations where training data is thin. The bigger organizational risk is comfort: teams that start synthetic often never move on to real users.

What makes a synthetic test trustworthy?

Three things separate a useful synthetic run from a misleading one. First, the persona is built from real behavioral or interview data, not invented traits. Second, the task is concrete and logic-driven, since AI handles a pricing comparison far better than an open-ended emotional journey. Third, the output gets checked against at least a few real users before anyone acts on it. Miss any of the three and you are reading fiction in a confident voice.

How does it compare to traditional usability testing?

Traditional testing is slower but grounded in real behavior. Jakob Nielsen's well-known finding is that just five users uncover about 85% of an interface's usability problems, which makes small human studies remarkably efficient. Synthetic testing trades that grounding for speed and volume.

| Dimension | Synthetic user testing | Traditional usability testing |
|---|---|---|
| Speed | Minutes to hours | Days to weeks |
| Cost per run | Very low | Recruiting plus incentives |
| Grounded in real behavior | No, reflects training data | Yes |
| Catches surprises | Rarely | Often, the main point |
| Best use | Hypotheses, prep, early screens | Validation, real decisions |
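The five-user figure comes from a simple model usually attributed to Nielsen and Landauer: if each tester independently finds a fixed share of the problems, the share found by n testers is 1 - (1 - L)^n. The sketch below assumes L = 0.31, the per-user rate Nielsen commonly cites. The function name is hypothetical, and real projects can have a very different L.

```python
def share_of_problems_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Expected share of usability problems found by n independent testers.

    Uses the 1 - (1 - L)^n model. per_user_rate (L) defaults to 0.31,
    an assumed average that will not hold for every product.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0 <= per_user_rate <= 1:
        raise ValueError("per_user_rate must be between 0 and 1")
    return 1 - (1 - per_user_rate) ** n_users


# Show how quickly returns diminish as testers are added.
for n in (1, 3, 5, 10):
    print(n, f"{share_of_problems_found(n):.2f}")
```

With the assumed rate, five users come out near 0.84, which is where the "about 85%" shorthand comes from. The curve flattens quickly after that, which is the argument for several small human studies over one large one.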

How do you run synthetic testing without fooling yourself?

Treat it as the first draft of research, then check it. Periodically run the same task with a handful of real users and compare. Use synthetic runs to sharpen your questions and narrow what is worth a human's time, and never ship a decision that rests on synthetic feedback alone.
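One way to make that comparison routine is to record task completion for both groups and flag any task where synthetic users clearly outperform real ones. The sketch below is a minimal illustration, not a CanaryUsers feature. The class, the function names, and the 0.2 threshold are all assumptions, and the sample counts mirror the Userbrain experiment described earlier.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Completion counts for one task and one group of testers."""
    label: str
    completed: int
    attempted: int

    def __post_init__(self) -> None:
        if self.attempted <= 0:
            raise ValueError("attempted must be positive")
        if not 0 <= self.completed <= self.attempted:
            raise ValueError("completed must be between 0 and attempted")

    @property
    def rate(self) -> float:
        return self.completed / self.attempted


def completion_gap(synthetic: TaskResult, real: TaskResult) -> float:
    """Positive when synthetic users succeed more often than real users."""
    return synthetic.rate - real.rate


def needs_human_review(synthetic: TaskResult, real: TaskResult,
                       threshold: float = 0.2) -> bool:
    """Flag tasks where synthetic runs may be hiding real friction.

    The threshold is an arbitrary starting point, not an established cutoff.
    """
    return completion_gap(synthetic, real) >= threshold


synthetic = TaskResult("synthetic", completed=5, attempted=5)
real = TaskResult("real", completed=3, attempted=5)

print(f"gap: {completion_gap(synthetic, real):.0%}")
print("review with humans:", needs_human_review(synthetic, real))
```

With samples this small the gap is a prompt to look closer, not a statistical result. A flagged task is a candidate for the next round of real sessions.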

The interface-style version is where this gets practical for shipping teams. CanaryUsers sends a flock of lifelike AI users through your deployed app and reports where they stall, get confused, or abandon a flow, each finding paired with a concrete fix. It is most useful when you have no traffic to learn from yet and no budget to recruit. You still validate the big calls with real people, but you catch the obvious breakage first.

Frequently asked questions

Can synthetic users replace real user research?

No. Nielsen Norman Group advises against using synthetic research to replace real-user research, validate a concept, or make a final decision. It works as preparation for human testing, not a substitute for it.

What data makes synthetic users accurate?

Rich, first-person input. Stanford's agents reached about 85% relative accuracy when built from two-hour interviews, but agents built from demographics alone perform far worse and carry more bias.

Is synthetic user testing the same as A/B testing?

No. A/B testing measures how real visitors behave on live variants of a page. Synthetic testing simulates user behavior before you have that traffic, which is why the two are complementary rather than interchangeable.

How many synthetic users should you run?

Enough to spot patterns, though more runs do not buy real-world validity. Five real users already surface roughly 85% of usability problems, so use synthetic volume for breadth and a few humans for ground truth.


Written by

Bretton Badenoch is an AI researcher at the University of Michigan and the founder of CanaryUsers. His research is in machine learning and aging; he has also built and run several startups as "chief-everything-officer," shipping products and obsessing over why users drop off, the problem CanaryUsers now automates.