The FDA's Bayesian Guidance: Learning in Theory, Pre-Specification in Practice
The FDA just gave us 25 pages on Bayesian inference in pivotal trials.
The philosophical message is clear: Bayesian methods are welcome. Informative priors, borrowing from external data, direct interpretation of posterior probabilities—all explicitly endorsed.
The operational message is different: pre-specify everything.
Bayesian methods promise learning. Regulators demand commitment. This guidance tries to reconcile the two, and mostly resolves the tension by prioritizing commitment.
This isn't just about dose-finding anymore
Previous FDA guidance was narrow—devices only (2010) or Bayesian mechanics in service of frequentist inference (2019). This guidance is different:
"The primary focus of this guidance is on the use of Bayesian methods to support primary inference in clinical trials intended to support the effectiveness and safety of drugs."
Primary inference. Pivotal trials. NDAs and BLAs.
The worked examples reinforce this. REBYOTA, the fecal microbiota product approved in 2022, used a Bayesian primary analysis that formally incorporated Phase 2 data into Phase 3. That's not Bayesian-flavored frequentism. That's actual Bayesian inference supporting an approval.
The philosophical split no one's talking about
Section IV describes two frameworks for "success criteria." They sound similar. They're not.
Framework 1: Calibration to Type I error
Choose your posterior probability threshold c such that the familywise Type I error rate (FWER) stays ≤ 0.025. Use simulations to find the right c. Report power and Type I error like any frequentist trial.
This is Bayesian machinery driving frequentist conclusions. The posterior probability is a computational convenience, not an epistemic claim.
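What does that calibration exercise look like in practice? Here's a minimal sketch, assuming a two-arm trial with a normal endpoint and known SD, a conjugate normal prior on the treatment effect, and made-up design numbers (none of these values come from the guidance):

```python
# Sketch of Framework 1: calibrate the posterior-probability threshold c by
# simulation so that the trial's Type I error stays at or below 0.025.
# Illustrative numbers only: a two-arm trial with a normal endpoint and known SD,
# and a conjugate normal prior on the treatment effect delta.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2025)  # fixed seed, since the guidance asks for reproducible sims

n_per_arm, sigma = 200, 10.0                 # assumed design values
se = sigma * np.sqrt(2.0 / n_per_arm)        # SE of the observed difference in means
prior_mean, prior_sd = 2.0, 3.0              # an (assumed) optimistic informative prior
n_sims = 200_000

def posterior_prob_positive(delta_hat, prior_mean, prior_sd, se):
    """Pr(delta > 0 | data) under a conjugate normal-normal model."""
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * (prior_mean / prior_sd**2 + delta_hat / se**2)
    return 1.0 - norm.cdf(0.0, loc=post_mean, scale=np.sqrt(post_var))

# Simulate under the null (true delta = 0) and find the smallest threshold c
# whose rejection rate is <= 0.025.
delta_hat_null = rng.normal(0.0, se, size=n_sims)
post_probs = posterior_prob_positive(delta_hat_null, prior_mean, prior_sd, se)

for c in np.arange(0.95, 0.9999, 0.0005):
    type_i = np.mean(post_probs > c)
    if type_i <= 0.025:
        print(f"calibrated threshold c = {c:.4f}, simulated Type I error = {type_i:.4f}")
        break

# With a noninformative prior (prior_sd -> infinity), the calibrated c is 0.975 exactly;
# the optimistic prior above forces c higher to hold the error rate at 0.025.
```

The punchline: with a flat prior the calibrated threshold is just the one-sided p-value criterion wearing different clothes, and the more optimistic the prior, the higher the threshold has to climb to keep the frequentist guarantee.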
Framework 2: Direct interpretation
Interpret the posterior probability at face value. If Pr(δ > 0) = 0.98, then—conditional on your prior and data—there's a 2% chance the treatment is ineffective.
This is actual Bayesian inference. Your conclusion depends on your prior being right.
The guidance is explicit that you can't have it both ways:
"With a prior chosen in this way, if the posterior probability Pr(d > a) = c then the probability that the treatment effect is less than a is less than 1 – c."
If you want to claim "98% probability of effectiveness," you're committed to the prior that generated that probability. You can't also claim frequentist error guarantees. The probability calculus doesn't permit both claims simultaneously.
Most sponsors will take Framework 1. It's safer, more familiar, and doesn't require defending your prior as an accurate representation of pre-trial knowledge.
But Framework 2 is where the action is—and where the guidance gets interesting.
Why Type I error inflation is the wrong borrowing metric
Here's the position in the document most likely to surprise people.
Sponsors routinely quantify informative prior influence by simulating Type I error inflation. The logic: if borrowing inflates your Type I error from 2.5% to 4%, that tells you something about how much the prior is doing.
The FDA pushes back. Their language: "not recommended."
Two reasons:
- Philosophical inconsistency. If your prior assumes a non-zero treatment effect, evaluating it under the null hypothesis is incoherent. You're measuring performance in a scenario your prior says is unlikely.
- Dynamic methods mitigate it anyway. Adaptive borrowing approaches reduce influence when prior-data conflict is high, which happens near the null. So Type I error inflation understates borrowing in favorable scenarios and overstates it in unfavorable ones.
The recommended metric: Effective Sample Size (ESS). The guidance defines ESS as "a measure of the information in a probability distribution in terms of the equivalent number of patients in the target population." Interpretable. Auditable. Something a clinical team can actually reason about.
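The guidance doesn't pin ESS to a single formula (moment-matching and expected-information approaches both exist), but the simplest case shows why the metric is so legible: for a binary endpoint with a Beta prior, ESS is just the sum of the two shape parameters. A toy sketch, with numbers I made up:

```python
# Sketch of ESS for the simplest case: a binary endpoint with a Beta prior.
# A Beta(a, b) prior carries the same information as a + b previously observed
# patients (a responders, b non-responders), so ESS = a + b.
# Illustrative numbers only, not from the guidance.
from dataclasses import dataclass

@dataclass
class BetaPrior:
    a: float  # prior "responders"
    b: float  # prior "non-responders"

    @property
    def ess(self) -> float:
        return self.a + self.b

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

# Suppose historical data showed 30 responders out of 100 patients, and we
# discount them with a power-prior weight of 0.5 before using them in the new trial.
historical = BetaPrior(a=30, b=70)
discounted = BetaPrior(a=0.5 * 30, b=0.5 * 70)

print(f"undiscounted prior: mean {historical.mean:.2f}, ESS {historical.ess:.0f} patients")
print(f"discounted prior:   mean {discounted.mean:.2f}, ESS {discounted.ess:.0f} patients")
# If the planned trial enrolls 80 patients, the discounted prior contributes the
# equivalent of 50 more -- a ratio a clinical team can actually argue about.
```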
The discounting landscape
Section V walks through the methods for controlling how much external information enters your analysis. The taxonomy:
Static methods borrow a fixed amount regardless of observed data:
- Power priors: raise the external-data likelihood to a fixed power a₀ between 0 and 1
Dynamic methods borrow less when prior and data conflict:
- Commensurate priors / supervised power priors
- Mixture priors (informative + noninformative components)
- Bayesian hierarchical models
- Elastic priors
The guidance doesn't endorse any single method but notes that dynamic approaches often have "more advantageous operating characteristics" because they automatically discount when borrowing would hurt you.
The challenge: dynamic methods have more parameters, and every parameter needs justification. You're trading robustness for documentation burden.
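To see the static/dynamic contrast in numbers, here's a toy comparison I put together (not from the guidance): a power prior with a fixed weight of 0.5 versus a 50/50 robust mixture prior, both built from the same hypothetical historical dataset of 30 responders in 100 patients, evaluated as the new trial's results drift away from that 30% rate.

```python
# Toy contrast of static vs. dynamic borrowing on a binary endpoint.
# Historical data: 30/100 responders. Current trial: 50 patients.
# Static:  power prior with fixed weight a0 = 0.5 (always borrows ~50 patients' worth).
# Dynamic: robust mixture prior, 0.5 * Beta(30, 70) + 0.5 * Beta(1, 1),
#          where the informative component's weight is re-estimated from the data.
# All numbers are illustrative, not from the guidance.
import numpy as np
from scipy.special import betaln

hist_r, hist_n = 30, 100
a0 = 0.5  # fixed power-prior discount

def power_prior_posterior_mean(r, n):
    """Posterior mean of the response rate under the static power prior."""
    a = 1 + a0 * hist_r + r
    b = 1 + a0 * (hist_n - hist_r) + (n - r)
    return a / (a + b)

def mixture_posterior(r, n, w=0.5):
    """Posterior mean and updated informative-component weight under the mixture prior."""
    comps = [(w, 30.0, 70.0), (1 - w, 1.0, 1.0)]  # (weight, a, b)
    posts = []
    for wk, a, b in comps:
        # marginal likelihood of the current data under this component (beta-binomial)
        log_ml = betaln(a + r, b + n - r) - betaln(a, b)
        posts.append((wk * np.exp(log_ml), a + r, b + n - r))
    total = sum(p[0] for p in posts)
    mean = sum(p[0] / total * p[1] / (p[1] + p[2]) for p in posts)
    return mean, posts[0][0] / total

for r in (15, 25, 35):  # current-trial responders: 30%, 50%, 70%
    mix_mean, w_inf = mixture_posterior(r, 50)
    print(f"observed {r}/50: power-prior mean {power_prior_posterior_mean(r, 50):.3f}, "
          f"mixture mean {mix_mean:.3f} (weight on historical component {w_inf:.2f})")
# The power prior keeps pulling the estimate toward 0.30 no matter what the new trial shows;
# the mixture's weight on the historical component collapses as conflict grows.
```

The exact numbers aren't the point; the shape of the behavior is. Fixed borrowing keeps pulling toward history regardless of what the new trial shows, while the mixture backs off on its own, and that backing-off is exactly what the extra parameters in your protocol have to describe.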
Where this guidance will disappoint you
No clear thresholds. How much borrowing is too much? What ESS ratio is acceptable? The guidance is silent. Every submission becomes a negotiation.
Limited applicability outside pediatrics, rare disease, and platform trials. The worked examples (REBYOTA, empagliflozin, GBM AGILE) all have obvious external data sources. If you're running a standard Phase 3 without clean historical data, the guidance points you toward noninformative priors and calibration to Type I error. Frequentist inference with extra steps.
More documentation, not less. Section VIII runs three pages: prior parameterization with induced distributions, ESS quantification, simulation reports with seed numbers, MCMC convergence diagnostics, sensitivity analyses, prior-data conflict assessment. Heavier than a standard frequentist submission, not lighter.
The pre-specification paradox
I keep coming back to the tension between learning and pre-specification.
Bayesian methods formalize learning. You start with a prior, observe data, update. The posterior is the learning.
But the guidance demands:
"Sponsors should pre-specify and justify the full details of the proposed prior distribution in the protocol."
"The amount of borrowing should be prespecified."
The more thoroughly you pre-specify your learning rules, the less your design can actually learn.
Here's where this gets concrete. Imagine you're using a dynamic borrowing prior that downweights historical data when conflict emerges—but only down to a pre-specified floor. Your adult data suggest a 15% response rate. You set the rule to discount borrowing if the observed pediatric rate falls outside 10–20%. Then your trial observes 21%. The method applies the maximum discount the protocol allows, but the floor means the estimate is still pulled by a prior that now looks wrong. You followed the rules. The rules didn't save you.
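A toy version of that scenario, with an entirely hypothetical rule and numbers: the protocol borrows adult data at a power-prior weight of 0.5 while the observed pediatric rate sits inside 10–20%, and drops to a pre-specified floor of 0.25 (never zero) outside it. At an observed 21%, the floor still amounts to roughly 50 adult patients' worth of prior information pulling the estimate down.

```python
# Toy version of the pre-specification paradox above. Entirely hypothetical
# rule and numbers: adult data show 30/200 responders (15%); the protocol borrows
# at power-prior weight 0.5 if the observed pediatric rate is inside 10-20%,
# and falls back to a pre-specified floor of 0.25 otherwise.
adult_r, adult_n = 30, 200          # 15% adult response rate
ped_r, ped_n = 21, 100              # pediatric trial observes 21%

def prespecified_weight(observed_rate: float) -> float:
    """The borrowing rule exactly as written into the (hypothetical) protocol."""
    return 0.5 if 0.10 <= observed_rate <= 0.20 else 0.25   # floor, not zero

w = prespecified_weight(ped_r / ped_n)                       # -> 0.25: discounting maxed out
a = 1 + w * adult_r + ped_r
b = 1 + w * (adult_n - adult_r) + (ped_n - ped_r)

print(f"borrowing weight: {w}  (prior ESS ~ {w * adult_n:.0f} adult patients)")
print(f"observed pediatric rate: {ped_r / ped_n:.2f}")
print(f"posterior mean: {a / (a + b):.3f}")   # pulled below 0.21 by a prior the data contradict
```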
The guidance doesn't solve this. It can't. Pre-specification is a one-way door, and Bayesian learning is what happens after you walk through it.
I wrote last month about what made BATTLE work: the adaptation operated within pre-specified biomarker strata, outcomes were fast enough to matter, and the infrastructure supported real-time updates. BATTLE learned within constraints, not despite them.
The FDA guidance allows for this kind of design—GBM AGILE and Precision Promise are cited approvingly. But the documentation requirements are substantial. I wonder whether operational burden will push sponsors toward simpler designs that call themselves Bayesian but don't meaningfully update on accumulating evidence.
The guidance is philosophically open to learning. Whether that openness survives contact with regulatory timelines is an empirical question.
The takeaway
This guidance is a green light with guardrails.
The FDA is comfortable with Bayesian primary inference—including informative priors, external borrowing, and direct posterior interpretation—when the work is rigorous. But "rigorous" means:
- Pre-specified priors with documented rationale
- ESS quantification (not Type I error inflation)
- Simulations covering prior-data conflict scenarios
- Sensitivity analyses across alternative priors
- Reproducible code with convergence diagnostics
Rare disease, pediatrics, platform trials—these are the use cases with natural external data sources, and this guidance gives them a clear framework. Start early. Engage FDA through the Complex Innovative Trial Design program. Budget time for alignment on the prior.
Standard Phase 3s in therapeutic areas without clean historical data? The path of least resistance remains group sequential designs with frequentist inference—which, as I argued in my last post, is where the real efficiency gains live anyway.
The comment period closes March 13, 2026. Thoughts on ESS vs. Type I error, documentation feasibility, or the pre-specification paradox? This is the time to submit them.
📄 Full guidance: fda.gov/media/190505/download
Docket: FDA-2025-D-3217 via regulations.gov
This post follows In Defense of 50:50 Randomization and What BATTLE Got Right That Most Adaptive Trials Get Wrong.
📬 Want more insights? Subscribe to the newsletter or explore the full archive of Evidence in the Wild for more deep dives into experimental design.