
Qalsody: The Probability of Harm

What FDA Actually Decided When the Trial Failed
Last updated: January 28, 2026 - Added section on safety profile

Last week I wrote about Calibrated Bayes, the framework where you simulate frequentist operating characteristics under Bayesian assumptions, then present the posterior to decision-makers.

Andrew Nguyen, responding to the Calibrated Bayes post, nailed the intuition: frequentist calibration asks how often you'd be wrong across many trials. A Bayesian version asks how often you'd make a mistake on patients.

That's the probability of harm.

Qalsody (tofersen) is the clearest example I know of FDA making this exact calculation, even if they never called it that.

The trial failed its primary endpoint. p = 0.97. FDA approved it anyway.

From a strict frequentist lens, this looks indefensible. From a calibrated Bayesian lens, it's defensible, and arguably correct. FDA wasn't ignoring the evidence. They were answering a different question.


The Disease

SOD1-ALS is rare and fatal. About 330 people in the US have it. Roughly 120 are diagnosed each year. Median survival varies dramatically by mutation, as short as 1-2 years for the most aggressive variants.

Before tofersen, no treatment targeted the genetic cause. The SOD1 mutation produces a toxic misfolded protein that destroys motor neurons. Patients progressively lose the ability to move, breathe, and eventually live.

Tofersen is an antisense oligonucleotide designed to reduce SOD1 protein production. The causal pathway, from SOD1 suppression to reduced motor neuron toxicity, is well-characterized. The question was whether reducing the protein would actually slow the disease.


The Trial

VALOR randomized 108 patients 2:1 to tofersen or placebo. The primary endpoint was change in the ALS Functional Rating Scale (ALSFRS-R) over 28 weeks.

The result: no significant difference. The between-group difference on the ALSFRS-R was 1.2 points favoring tofersen; the p-value was 0.97.

By any conventional standard, this trial failed.


What Tofersen Did Show

The drug hit its mechanistic targets:

Biomarker                            Reduction
CSF SOD1 protein                     33%
Plasma neurofilament light (NfL)     51%

NfL matters. It's released when neurons are damaged. Elevated NfL predicts faster ALS progression and earlier death. Reducing it suggests less ongoing neurodegeneration.

The open-label extension showed something else. Patients who started tofersen earlier had lower risk of death (HR 0.27) and permanent ventilation (HR 0.36) compared to those who started later.

These aren't dispositive. Open-label comparisons have obvious limitations. But combined with the biomarker data, a coherent story emerges: the drug is doing what it's supposed to do biologically. The 28-week functional endpoint may just have been too short to capture clinical benefit in a slowly progressing disease.


The Advisory Committee Split

This is where it gets interesting.

The FDA convened its Peripheral and Central Nervous System Drugs Advisory Committee in March 2023. They asked two questions.

Question 1: Is reduction in plasma NfL reasonably likely to predict clinical benefit?

Vote: 9-0 Yes.

Question 2: Does the data provide convincing evidence of effectiveness?

Vote: 3 yes, 5 no, 1 abstention. No.

The committee unanimously agreed that NfL reduction probably predicts benefit. They split on whether the evidence was "convincing."

Here's the key insight: "convincing" isn't a statistical threshold. It's a judgment about acceptable uncertainty. The five who voted No weren't saying tofersen doesn't work. They were saying the uncertainty was too high for that word.

FDA approved the drug anyway, under accelerated approval. That distinction matters legally, but it doesn't change the decision logic that drove the approval.


The Decision FDA Actually Made

The frequentist question was settled. If you ask "would we see this result by chance if tofersen were ineffective?" the answer is yes, very often. p = 0.97. Trial failed.

But that's not the question FDA answered.

The question FDA answered was: given everything we know, what’s the probability we harm patients by approving versus withholding?

Think about the decision matrix:

Action      Harm if wrong
Approve     Patients get ineffective drug, side effects, false hope, cost
Withhold    Dying patients denied potentially effective therapy

These errors are not symmetric. One is reversible; the other is not. An ineffective drug can be withdrawn; lost time in a fatal disease cannot be recovered.

The 9-0 vote tells us the committee believed P(tofersen works | NfL↓) is meaningfully greater than zero.

FDA's approval tells us: given the stakes, the expected harm of withholding exceeds the expected harm of approving, even under substantial uncertainty.

This is benefit-risk analysis. But it's also calibrated Bayes. The decision threshold wasn't α = 0.05. It was calibrated to the clinical context.
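To make the asymmetry concrete, here is a minimal sketch of the expected-harm comparison. The harm magnitudes and the 40% probability are illustrative assumptions, not FDA figures; only the structure of the calculation mirrors the decision matrix above.

```python
# Expected-harm comparison for approve vs. withhold.
# All utility magnitudes here are illustrative assumptions, not FDA figures.

def expected_harm(p_works, harm_approve_wrong, harm_withhold_wrong):
    """Expected harm of each action, given P(drug works) = p_works."""
    return {
        # Approving only backfires if the drug does NOT work (reversible: withdraw it).
        "approve": (1 - p_works) * harm_approve_wrong,
        # Withholding only backfires if the drug DOES work (irreversible: time lost).
        "withhold": p_works * harm_withhold_wrong,
    }

# Bounded cost of an ineffective-but-withdrawable drug (1 unit) vs. the
# irreversible cost of denying an effective therapy in a fatal disease (5 units).
harms = expected_harm(p_works=0.40, harm_approve_wrong=1.0, harm_withhold_wrong=5.0)
print(harms)  # {'approve': 0.6, 'withhold': 2.0} -> approving minimizes expected harm
```

Even at a modest P(drug works), the reversibility asymmetry makes withholding the higher-expected-harm action.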


Backing Out the Implicit Threshold

We can roughly reconstruct what FDA accepted.

The committee agreed NfL reduction is "reasonably likely" to predict benefit. Let θ be the probability that tofersen works; that vote puts a rough floor at θ > 30-50%.

The committee split on "convincing." Call that θ < 70-80%.

So FDA approved with θ somewhere in the 40-60% range—substantial uncertainty, but better than a coin flip.

For a fatal disease with no alternatives, that math works. Even if θ = 40%:

0.40 × (benefit of slowing fatal disease) > 0.60 × (cost of ineffective therapy)

The "cost of ineffective therapy" is bounded. Patients still receive standard ALS care. They don't lose anything except money and the opportunity cost of trying something else (which doesn't exist for SOD1-ALS).

The upside of an effective therapy, slowing a uniformly fatal disease, is qualitatively larger than the downside of being wrong.
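Rearranging the inequality gives a break-even threshold: approve whenever θ × benefit > (1 − θ) × cost, i.e. θ > cost / (benefit + cost). A quick sketch, with the 5:1 benefit-to-cost ratio as an illustrative assumption:

```python
# Break-even P(drug works): approve when theta * benefit > (1 - theta) * cost,
# which rearranges to theta > cost / (benefit + cost).
# The benefit:cost ratio below is an illustrative assumption.

def break_even_theta(benefit, cost):
    """Smallest theta at which approval has lower expected harm than withholding."""
    return cost / (benefit + cost)

# If slowing a uniformly fatal disease is worth ~5x the bounded cost of an
# ineffective therapy, any theta above ~0.17 favors approval -- far below
# the 40-60% range backed out from the committee votes.
print(round(break_even_theta(benefit=5.0, cost=1.0), 3))  # 0.167
```

The point isn't the exact ratio; it's that for any plausible ratio in a fatal disease with no alternatives, the break-even θ sits well below the committee's implied range.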

FDA didn't lower the bar. They calibrated the threshold to the decision.


Update: What About Safety?

Added January 28, 2026, in response to reader questions

Akanksha Rai raised an important question I didn’t explicitly address: what was Qalsody’s safety profile, and how does that factor into the harm calculation? This matters because the decision framework changes if a drug causes significant, irreversible harm: the trade-off shifts from weighing uncertain benefit against no alternatives to weighing certain harm against no alternatives.

What the Safety Data Show

In the 28-week placebo-controlled phase of VALOR, serious adverse events were reported in 18.1% of tofersen-treated patients versus 13.9% on placebo, a modest imbalance consistent with background morbidity in rapidly progressive ALS rather than clear drug-related toxicity.

Treatment discontinuations due to adverse events were 5.6% in the tofersen group versus 0% on placebo, indicating some tolerability issues but not a high overall dropout signal.

Across the clinical program, approximately 7% of tofersen-treated patients experienced serious neurologic adverse events (myelitis, radiculitis, aseptic meningitis, and elevated intracranial pressure or papilledema), which were drug-related. Crucially, these events were reversible and manageable with standard care; none were fatal or permanently disabling; and only a small fraction of patients discontinued treatment because of them.

Most other adverse events (procedural pain, fatigue, arthralgia, headache) were mild to moderate and consistent with either ALS progression or known effects of lumbar puncture.

Taken together, this profile (clear biomarker activity paired with limited, manageable, and reversible drug-specific risks) is part of what made the approval decision defensible despite the missed functional endpoint.

Why This Matters for the Harm Calculation

The safety profile is central to the FDA’s harm-asymmetry argument. The relevant trade-off wasn’t “certain harm versus uncertain benefit.” It was:

  • Approve: expose patients to an approximately 7% risk of reversible, manageable serious neurologic events, plus known procedural side effects, in the presence of strong biomarker evidence suggesting biological activity.
  • Withhold: virtually guarantee progressive, fatal disease in a setting with no targeted alternatives.

For decisions under uncertainty, the character of harms matters. When drug-specific risks are reversible, non-cumulative, and manageable, rather than disabling or life-threatening, their expected cost is far lower than the inevitable progression of a uniformly fatal disease like SOD1-ALS (median survival varies dramatically by mutation, as short as 1-2 years for the most aggressive variants).

This is fundamentally different from scenarios in which a therapy carries a substantial risk of irreversible harm (organ damage, permanent disability, or excess mortality). In those settings, much stronger efficacy evidence is required to justify approval. Here, limited incremental safety risk, in the context of profound unmet need and plausible mechanistic benefit, shifts the harm balance toward regulatory flexibility.

Does This Safety Profile Change the Conclusion?

Not for me. If anything, it strengthens the case that the FDA’s harm calculation was defensible under uncertainty. The biomarker evidence was compelling, the functional endpoint failed, and the drug-specific risks were manageable and reversible. In a fatal disease with no alternatives, that is the kind of trade-off where regulatory flexibility is justified — not guaranteed to be right, but coherent with the decision logic the FDA articulated.


What the Bayesian Guidance Would Have Changed

The January 2026 guidance formalizes exactly this kind of thinking:

"The draft guidance formalizes and contextualizes the use of informative priors to borrow external evidence... as a strategy to improve efficiency in settings where fully powered randomized trials are infeasible."

Qalsody fits perfectly. Ultra-rare disease. Mechanistic clarity. Validated biomarker. Clinical endpoint too slow to capture benefit in a feasible trial.

The guidance emphasizes:

  • Explicit success criteria. What posterior probability is "enough"?
  • Operating characteristics under the prior. What's the probability of approving an ineffective drug? Of rejecting an effective one?
  • Pre-specification and transparency. Lay out the decision framework before unblinding.
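The second bullet can be sketched as a small Monte Carlo: draw the true effect from an informative prior, apply a posterior-probability approval rule to each simulated trial, and count how often the rule approves an ineffective drug or rejects an effective one. The prior, the noise level, and the 60% threshold below are all illustrative assumptions, not values from the guidance.

```python
import numpy as np
from math import erf, sqrt

# Operating characteristics under the prior: Monte Carlo over simulated trials.
# Prior, standard error, and approval threshold are illustrative assumptions.

rng = np.random.default_rng(0)
n_sims = 50_000
prior_mean, prior_sd = 0.5, 1.0   # informative prior on the treatment effect
se = 1.0                          # standard error of the trial estimate
threshold = 0.60                  # approve if P(effect > 0 | data) > 60%

true_effect = rng.normal(prior_mean, prior_sd, n_sims)
estimate = rng.normal(true_effect, se)   # each trial observes a noisy estimate

# Conjugate normal posterior for each simulated trial.
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
post_mean = post_var * (prior_mean / prior_sd**2 + estimate / se**2)
post_sd = sqrt(post_var)

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# P(effect > 0 | data) = Phi(post_mean / post_sd)
p_positive = np.array([norm_cdf(m / post_sd) for m in post_mean])
approve = p_positive > threshold

ineffective = true_effect <= 0
print("P(approve | ineffective):", round(approve[ineffective].mean(), 3))
print("P(reject  | effective):  ", round((~approve)[~ineffective].mean(), 3))
```

These two frequencies are exactly the quantities a sponsor would pre-specify and negotiate with FDA before unblinding: what error rates the design tolerates, given the prior.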

If Qalsody had been designed under this framework, the approval wouldn't have been different. But it would have been explicit. The 40-60% probability threshold would have been stated upfront, justified by the clinical stakes, and agreed upon with FDA before the trial read out.

That's the value of naming the framework. Calibrated Bayes gives sponsors and regulators a shared vocabulary for decisions they're already making.


Implications

Qalsody teaches a few things.

Surrogate endpoints are probability statements. "Reasonably likely to predict benefit" is a prior on the causal link between biomarker and outcome. Treat it like one.

Decision thresholds should be calibrated to stakes. 95% confidence makes sense for a common disease with alternatives. It may be wrong for a fatal orphan disease with nothing else.

The Bayesian guidance operationalizes what FDA already does. Qalsody shows the framework in action, before the guidance existed. The guidance just names it.

Naming the framework matters. Without a shared vocabulary, these decisions look arbitrary. With it, they look principled.


Building This Into Your Design

Qalsody wasn't an exception to statistical rigor. It was an example of rigor applied to the right question.

If you're designing a rare disease trial, the question isn't just "what's my Type I error rate?" It's "what's my probability of making the wrong decision for patients?"

Those aren't the same question. Calibrated Bayes lets you answer both.


This post is a companion to Calibrated Bayes: The Framework You're Already Using and Inside the First FDA-Approved Bayesian Analysis.

📬 For more essays on experimental design, regulatory evidence, and statistical decision-making across domains, subscribe to the Evidence in the Wild newsletter or browse the archive.