Calibrated Bayes: The Framework You’re Already Using
There’s a question that surfaces in almost every Bayesian trial design meeting, usually from someone in regulatory affairs:
“But what about Type I error?”
The Bayesian purist has an answer. Under the likelihood principle, the stopping rule is irrelevant. Data are data. The posterior doesn’t care when you looked. Multiplicity adjustments are a frequentist hangover we should have abandoned decades ago.
The frequentist has a different answer. Alpha control is the contract between sponsor and regulator. Without it, you’re just telling stories with probability distributions.
Most working biostatisticians give neither answer.
Instead, they design trials with good frequentist operating characteristics, including controlled Type I error and adequate power, and interpret results using Bayesian inference. They simulate under the null to ensure posterior probability thresholds don’t inflate false positives. They justify priors using historical data and demonstrate robustness to reasonable misspecification.
This is calibrated Bayes.
And if you’ve ever submitted a Bayesian design to FDA, you’ve almost certainly practiced it, even if you never used the term.
The Middle Path
Roderick Little named this framework in 2006, describing it as a roadmap for applied statisticians caught between philosophical camps. The insight is simple: you can demand that your procedures perform well under repeated sampling and interpret any given result using posterior probabilities.
These goals aren’t contradictory. They answer different questions.
Frequentist calibration asks: If I use this decision rule across many trials, how often will I be wrong?
This is the regulator’s question. It’s about error control at the program level.
Bayesian inference asks: Given these data, what should I believe now?
This is the clinician’s question, and increasingly, the patient’s. Direct probability statements about treatment effects are more intuitive than confidence intervals, which is one reason FDA has warmed to them.
Calibrated Bayes says: design for the first question, interpret with the second.
What This Looks Like in Practice
Consider a single-arm oncology trial with a Bayesian stopping rule. You declare efficacy if the posterior probability that the response rate exceeds a threshold is greater than 0.95.
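For concreteness, here is a minimal sketch of that rule under a conjugate Beta-Binomial model. The Beta(1, 1) prior, the 20% null response rate, and the 30-patient sample size are illustrative assumptions, not numbers from any particular trial or guidance.

```python
# Minimal sketch of the posterior-probability efficacy rule (assumed setup:
# Beta(1, 1) prior, null response rate 0.20, 30 patients -- all illustrative).
from scipy.stats import beta

def posterior_prob_exceeds(responses, n, p0, a=1.0, b=1.0):
    """P(response rate > p0 | data) under a Beta(a, b) prior."""
    # Beta prior + binomial likelihood -> Beta(a + responses, b + non-responses)
    return 1.0 - beta.cdf(p0, a + responses, b + (n - responses))

# Example: 11 responses in 30 patients, null response rate 0.20
prob = posterior_prob_exceeds(responses=11, n=30, p0=0.20)
print(f"P(rate > 0.20 | data) = {prob:.3f}, declare efficacy: {prob > 0.95}")
```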
A pure Bayesian might stop there.
A calibrated Bayesian asks something else: under the null hypothesis, how often does this rule produce a false positive? If the answer is 15%, you have a problem, not because the posterior is wrong, but because the operating characteristics are unacceptable.
So you simulate. You adjust the threshold. Maybe 0.95 becomes 0.98. You identify the decision boundary that holds Type I error at the target, say one-sided 2.5%, while keeping a coherent Bayesian interpretation.
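A rough sketch of that calibration step, under the same assumed Beta-Binomial setup: simulate many trials at the null response rate and see how often each candidate threshold declares efficacy. The null rate, sample size, and candidate thresholds are placeholders, not recommendations.

```python
# Simulate trials under the null response rate and estimate the false-positive
# rate of each candidate posterior-probability threshold (assumed setup:
# Beta(1, 1) prior, null rate 0.20, 30 patients -- all illustrative).
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(42)
p_null, n_patients, n_sims = 0.20, 30, 100_000

# Response counts in simulated trials where the drug truly does nothing
responses = rng.binomial(n_patients, p_null, size=n_sims)

# Posterior probability that the rate exceeds the null, per simulated trial
post_prob = 1.0 - beta.cdf(p_null, 1 + responses, 1 + n_patients - responses)

for threshold in (0.95, 0.975, 0.98):
    type1 = np.mean(post_prob > threshold)
    print(f"threshold {threshold}: simulated Type I error {type1:.3f}")
```

In practice the same loop would also cover power under plausible alternatives and any interim looks, but the calibration logic is unchanged.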
You haven’t abandoned Bayesian inference. You’ve constrained it so that it also behaves well in frequentist terms.
This is exactly what FDA expects. Recent Bayesian guidance doesn’t say “ignore Type I error.” It says justify your approach, pre-specify your analysis, and demonstrate acceptable operating characteristics.
That’s calibrated Bayes, even if the guidance never uses the phrase.
Why This Matters Now
The Bayesian wars are largely over. Nobody serious argues that posterior probabilities are illegitimate. Nobody serious argues that frequentist error control is irrelevant.
The real question is how to get both.
Calibrated Bayes is the answer most of us have quietly converged on. Naming it explicitly does two things.
First, it clarifies what we’re actually doing in regulatory Bayesian designs. We’re not abandoning frequentist principles. We’re using them to discipline Bayesian decision rules.
Second, it defuses unnecessary conflict. You don’t have to choose a tribe. You can design for operating characteristics your regulator will accept and report the posterior probability your clinician wants to see.
The next time someone asks, “But what about Type I error?” you don’t need to be defensive.
You’re already controlling it.
That’s what the simulations were for.
📬 For more essays on experimental design, regulatory evidence, and statistical decision-making across domains, subscribe to the Evidence in the Wild newsletter or browse the archive.