Science Is Not Neutral — and That’s the Point
Clinical research has a recurring failure mode that statistics alone can’t explain.
A trial meets its prespecified criteria. The analysis is technically sound. The uncertainty is quantified. And yet the regulator says no, or asks for more data, or limits the indication, or quietly signals discomfort that never quite resolves.
From the outside, these decisions are often framed as conservatism or inconsistency. From the inside, they are usually something else: a disagreement about which mistakes matter more, and who should bear their consequences.
That disagreement is not a bug in the system. It is the system.
The discomfort we don’t like to name
In clinical research, this discomfort shows up everywhere.
Trials can be statistically correct and still wrong.
Endpoints can be validated and still fail patients.
Confirmatory studies can miss while everyone insists the signal was “real.”
Regulators are often accused of being conservative, slow, or hostile to innovation. But if you look closely, many regulatory decisions aren’t about misunderstanding the statistics. They’re about mistrusting the story being told on top of them.
That mistrust doesn’t come from nowhere.
It comes from knowing that once the data are unblinded, humans are extraordinarily good at convincing themselves that whatever happened was inevitable and meaningful.
The myth of value‑free science
We talk about bias as if it enters science only when someone cheats or cuts corners.
In reality, values enter much earlier.
We choose:
- which questions are worth asking
- which endpoints count as success
- which errors are tolerable
- how much uncertainty patients should bear
None of those decisions are neutral. They are judgments about harm, benefit, urgency, and trust.
Calling a method “objective” does not absolve it of those judgments. It only hides them.
The problem is not that science contains values.
The problem is pretending that it doesn’t.
Why rigor exists at all
If humans were perfectly rational, we wouldn’t need pre-specification.
If belief did not precede data, we wouldn’t worry about p-hacking, outcome switching, or post-hoc narratives.
If incentives aligned cleanly with truth, we wouldn’t need regulators at all.
Rigor exists because humans are fallible.
Pre-specified decision rules are not bureaucratic hurdles. They are moral commitments made in advance, when optimism has not yet been rewarded and disappointment has not yet been rationalized.
This is where design choices quietly become ethical ones.
Bayesian frameworks matter here not because they are philosophically elegant, but because they force sponsors to state what they believe before the evidence arrives, and to live with the consequences when reality disagrees.
Tools that enforce this discipline are not replacing judgment. They are constraining it by making assumptions explicit and refusing to renegotiate them after the fact.
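As a minimal sketch of what that discipline can look like in practice, the toy Beta-Binomial rule below is written down before any patient is enrolled and then applied, unmodified, to whatever the trial produces. The prior, the target rate, and the go threshold are illustrative assumptions, not a real analysis plan.

```python
from scipy.stats import beta

# Commitments made before unblinding (illustrative values, not a real analysis plan)
PRIOR_ALPHA, PRIOR_BETA = 2, 8      # sceptical Beta(2, 8) prior on the response rate
TARGET_RATE = 0.30                  # response rate the therapy must plausibly exceed
GO_THRESHOLD = 0.90                 # posterior probability required to declare "go"

def go_no_go(responders: int, enrolled: int) -> bool:
    """Apply the pre-specified rule to observed data; nothing here is renegotiable."""
    post_a = PRIOR_ALPHA + responders
    post_b = PRIOR_BETA + (enrolled - responders)
    # Posterior probability that the true response rate exceeds the target
    prob_above_target = 1 - beta.cdf(TARGET_RATE, post_a, post_b)
    return prob_above_target >= GO_THRESHOLD

# Hypothetical read-out: 14 responders out of 40 patients
print(go_no_go(responders=14, enrolled=40))
```

The particular numbers matter less than the ordering: the prior and the threshold are committed to first, and the data are only allowed to move the posterior, never the rule.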
Science as a moral activity
Once you see this, a lot of debates look different.
The question is no longer:
Which method is superior?
It becomes:
What kinds of mistakes are we willing to make, and who pays for them?
Approving an ineffective therapy and rejecting an effective one are both errors. They are not symmetrical. Their consequences fall on different people, at different times, in different ways.
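A toy expected-loss calculation makes the asymmetry concrete. The weights below are invented purely for illustration; the point is that the evidence bar for approval is not a fixed constant but a function of how the two harms are valued.

```python
# Toy expected-loss comparison; every weight is invented for illustration.
LOSS_APPROVE_INEFFECTIVE = 10.0   # harm of approving a therapy that does not work
LOSS_REJECT_EFFECTIVE = 4.0       # harm of rejecting a therapy that does work

def approve(p_effective: float) -> bool:
    """Approve only when approving is the smaller expected mistake."""
    loss_if_approve = (1 - p_effective) * LOSS_APPROVE_INEFFECTIVE
    loss_if_reject = p_effective * LOSS_REJECT_EFFECTIVE
    return loss_if_approve < loss_if_reject

# The break-even evidence bar follows directly from how the two harms are weighted
threshold = LOSS_APPROVE_INEFFECTIVE / (LOSS_APPROVE_INEFFECTIVE + LOSS_REJECT_EFFECTIVE)
print(f"approve only when P(effective) exceeds {threshold:.2f}")  # 0.71 with these weights
```

Change who bears the harm, or how heavily it is weighted, and the supposedly neutral threshold moves with it.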
Scientific frameworks do not eliminate these tradeoffs.
They encode them.
And pretending otherwise doesn’t make science more objective. It just makes the values harder to interrogate.
Where values collide
Once you see science this way, the real tension comes into focus.
Patients may value access over certainty.
Regulators may value reliability over speed.
Sponsors may value approval over interpretability.
All three positions are internally coherent. They cannot all prevail at once.
This is the fault line where most modern trial controversies live. Not in the math itself, but in how different actors weight false positives versus false negatives, who is harmed by delay, and who is harmed by error.
Scientific frameworks do not resolve these conflicts.
They encode them.
Understanding where a design sits on that fault line is more informative than knowing which method was used.
Making those tradeoffs visible
Evidence in the Wild exists to surface these tensions before they disappear into technical language.
Not to explain statistics for their own sake, but to explain decisions:
Why a technically valid trial fails regulatory review.
Why a surrogate endpoint collapses under real-world use.
Why incentives distort evidence long before analysis begins.
Why some uncertainties are tolerated, and others are not.
This same philosophy shapes the tools I build.
Design calculators should not recommend actions, persuade sponsors, or launder optimism into a go decision. They should make commitments explicit, show what follows from them, and refuse to negotiate after the fact.
If the assumptions are optimistic, the outputs should be unforgiving.
If a design cannot survive regulatory scrutiny, the math should not pretend otherwise.
Tools should constrain self-deception, not automate it.
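Continuing the illustrative Beta-Binomial rule from earlier, a calculator built in that spirit might look roughly like the sketch below: the commitments are frozen in one place, and the only outputs are the operating characteristics that follow from them, flattering or not. Every number here is an assumption for the sake of the example, not a description of any real tool.

```python
import numpy as np
from dataclasses import dataclass
from scipy.stats import beta

@dataclass(frozen=True)           # frozen: the commitments cannot be edited after the fact
class DesignSpec:
    n_patients: int = 40          # planned enrolment (illustrative)
    target_rate: float = 0.30     # response rate the therapy must plausibly exceed
    go_threshold: float = 0.90    # posterior probability required to declare "go"
    prior_alpha: float = 2.0      # sceptical Beta(2, 8) prior on the response rate
    prior_beta: float = 8.0

def probability_of_go(spec: DesignSpec, true_rate: float, n_sims: int = 20_000) -> float:
    """Simulate trials at an assumed true rate and report how often the frozen rule says 'go'."""
    rng = np.random.default_rng(0)
    responders = rng.binomial(spec.n_patients, true_rate, size=n_sims)
    post_a = spec.prior_alpha + responders
    post_b = spec.prior_beta + (spec.n_patients - responders)
    prob_above_target = 1 - beta.cdf(spec.target_rate, post_a, post_b)
    return float(np.mean(prob_above_target >= spec.go_threshold))

spec = DesignSpec()
print("P(go) if the therapy barely works:", probability_of_go(spec, true_rate=0.30))
print("P(go) if the therapy works as hoped:", probability_of_go(spec, true_rate=0.50))
```

If the honest answer is that a design only succeeds when the optimistic scenario is true, the calculator's job is to say so plainly.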
The balance that makes science work
Science works best when two forces are held in tension:
- Internal moral seriousness — an honest reckoning with what we owe patients, participants, and the public
- External constraint by reality — evidence that resists our preferences and punishes our overconfidence
Lean too far in either direction and the system breaks.
Pure moral conviction without rigor becomes advocacy.
Pure rigor without moral seriousness becomes procedure.
Good science lives in between.
Not neutral.
Disciplined.
That is the balance worth defending.
Most debates about trial design never reach this fault line.
That’s where the interesting work begins.
📬 For readers interested in the fault lines between trial design, regulatory evidence, and statistical decision-making, additional essays are available via the Evidence in the Wild newsletter or the archive.