Every pharma company is evaluating synthetic data right now. Most are asking the same question: how do we use synthetic data to make research cheaper and faster? According to a new position paper from ZoomRx, that is the wrong question and the organizations asking it are likely to be disappointed by what they get back.
The paper, Synthetic Data Is Not Enough, is written by Ty Harkness, Ph.D., Head of Sagan Agents at ZoomRx, and draws on fifteen years of deploying both primary and synthetic methods for biopharma clients. It is not a survey of the field. It is a direct challenge to the premise driving most pharma AI investment right now.
The appeal of synthetic data in pharma applications is straightforward: replace human respondents with AI-generated personas, cut sample costs, and compress timelines. The cost savings are real. They are also modest.
In a typical pharma market research study, sample costs represent 15–30% of the total project budget. The remainder is professional services: study design, analysis, synthesis, and deliverables. Replace the respondents and you save the sample portion. Everything else is unchanged. That is not transformation; it is a line-item efficiency.
The speed case is more compelling but, according to the paper, still overstated. A traditional market research cycle takes six to twelve weeks. Synthetic respondents answer questions immediately. But synthetic data is not the only path to speed. AI moderation can now conduct deep qualitative interviews with real human respondents at the scale and pace of traditional quantitative research, without sacrificing the behavioral authenticity that synthetic personas cannot provide.
If synthetic data alone is neither the cost play nor the unique speed play it is positioned as, then pharma insights leaders need a different frame entirely. The paper reframes it this way: what kind of intelligence system does an insights function need to make better decisions faster, and where do synthetic methodologies fit within that system?
That reframe changes everything downstream. The paper builds its answer in five sections, working through a framework for when synthetic methods belong and when primary research with real respondents is irreplaceable, borrowing a model from a discipline pharma knows well. It then examines what actually determines the quality of any synthetic output, and why that answer creates a structural asymmetry that most vendor evaluations are missing entirely.
One section of the paper identifies a two-question test that separates credible synthetic platforms from what it calls “AI theater.” Both questions are about data: not the algorithm, not the interface, not the speed of output. The first asks where the model’s behavioral foundation came from. The second asks what happens to that foundation as the market moves. Most pharma organizations evaluating synthetic vendors are not asking either. The paper argues that vendors who cannot answer both with specificity are selling a decaying asset: one that looks accurate at signing and drifts further from reality with every label update, competitive entry, and guideline change that follows.
The paper makes the case that the most valuable application of synthetic methodology is not faster surveys. It is something further upstream in the decision chain, and it requires collapsing what have traditionally been two separate disciplines into a single output.
That argument, and the closed-loop intelligence architecture that makes it operational, is the core of what ZoomRx has built. The paper names it. The framework it describes is not theoretical: it is the design logic behind Sagan Agents, ZoomRx’s AI platform for pharma commercial teams, currently in active deployment with clients across the top 20 global pharma companies.
Synthetic Data Is Not Enough is available now. If your organization is evaluating synthetic data vendors, has already deployed synthetic methods and is questioning the returns, or is trying to build a case internally for a different AI investment thesis, the framework in this paper provides both the diagnostic and the direction.
Download the whitepaper to get the full argument, the vendor evaluation criteria, and the framework for building an intelligence system that compounds rather than one that resets with every project.