When the p-value Was Enough: Rethinking Trial Design Through ISIS-2

A simple trial. A massive effect. A p-value so small it didn't need confidence intervals. In 1988, ISIS-2 didn't just change cardiology: it showed that 160 mg of aspirin a day could prevent roughly one death for every 40 patients treated. And it reminded statisticians what trial design is supposed to do: answer questions that matter.

Act I: The Chaos Before the Trial

In the 1980s, cardiology was teetering on the edge of transformation. Acute myocardial infarction (MI) killed 540,000 Americans annually—yet the use of basic interventions like aspirin remained contentious. Some clinicians gave it. Others didn't. There were observational studies, yes—but randomized data? Nothing definitive.

The uncertainty wasn't academic. Every day of debate meant thousands of preventable deaths.

Into this chaos walked the Second International Study of Infarct Survival, better known as ISIS-2. It asked a bold question: Could a pragmatic, large-scale, well-powered trial resolve the debate—and in doing so, transform emergency cardiovascular care overnight?

Act II: Trial Design in Its Purest Form

ISIS-2 tested two interventions: aspirin and streptokinase. Using a 2x2 factorial design, more than 17,000 patients across 16 countries were randomized into four groups:

                    Aspirin    No Aspirin
Streptokinase       Group 1    Group 2
No Streptokinase    Group 3    Group 4

Why a 2x2 factorial design? It allowed the investigators to test both interventions simultaneously: every patient contributes to the aspirin comparison and to the streptokinase comparison, so one cohort answers two questions while maintaining 90% power to detect a 15% mortality reduction. The design also enabled interaction testing (p=0.08 for interaction, suggesting additive effects) without the typical 4x sample size penalty. In an era before adaptive designs, this was efficiency at its finest.
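To make the sample-size logic concrete, here is a minimal sketch using the textbook normal-approximation formula for comparing two proportions. The 12% control-arm mortality is an assumption borrowed from the control rates reported later in this piece, not a figure from the ISIS-2 protocol, and `n_per_arm` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, relative_reduction, alpha=0.05, power=0.90):
    """Approximate patients per arm for a two-sided two-proportion z-test."""
    p_treated = p_control * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    delta = p_control - p_treated
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Assumed inputs: 12% control mortality, 15% relative reduction, 90% power.
n = n_per_arm(0.12, 0.15)
total_one_question = 2 * n
print(n, total_one_question)
```

Under these assumptions the formula asks for roughly 6,400 patients per arm. In a factorial layout the same patients are split one way for aspirin and another way for streptokinase, so the second question rides along at little extra cost, provided the two treatments do not interact strongly.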

The trial was clean. The endpoints were hard: vascular mortality at 5 weeks. There were no surrogate endpoints, no subgroup slicing, no biomarker stratification. Randomization occurred within 24 hours of symptom onset (median: 5 hours). No time for selection bias. No room for interpretive fog.

Just the kind of simplicity that makes statisticians smile and regulators nod.

Then came the results.

Act III: 0.00001

The numbers were staggering:

  • Aspirin: 5-week vascular mortality fell from 11.8% to 9.4% (23% reduction in the odds of death, 95% CI: 15-30%)
  • Streptokinase: 12.0% to 9.2% (25% reduction, 95% CI: 18-32%)
  • Both together, versus neither: 13.2% to 8.0% (42% reduction, 95% CI: 34-50%)

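The number needed to treat (NNT)? About 40 for aspirin alone: a 2.4-percentage-point absolute reduction works out to roughly one death prevented for every 40 patients treated. In emergency cardiology terms, that's transformative.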
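The arithmetic behind those figures is short enough to check by hand. The sketch below recomputes the absolute risk reduction, the NNT, and the relative and odds reductions from the rounded aspirin percentages quoted above; `risk_summary` is a hypothetical helper, and small differences from the published figures are to be expected because the inputs are rounded.

```python
def risk_summary(control_rate, treated_rate):
    """Summarise a treatment effect from two event rates (as proportions)."""
    arr = control_rate - treated_rate                 # absolute risk reduction
    rrr = arr / control_rate                          # relative risk reduction
    odds_control = control_rate / (1 - control_rate)
    odds_treated = treated_rate / (1 - treated_rate)
    odds_reduction = 1 - odds_treated / odds_control  # 1 minus the odds ratio
    nnt = 1 / arr                                     # number needed to treat
    return {"arr": arr, "rrr": rrr, "odds_reduction": odds_reduction, "nnt": nnt}


# Rounded aspirin-arm figures from the article: 11.8% versus 9.4%.
aspirin = risk_summary(0.118, 0.094)
for name, value in aspirin.items():
    print(f"{name}: {value:.3f}")
```

From these rounded inputs the NNT comes out near 42 and the odds reduction near 22%, close to the quoted 23%.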

But what many remember most is the p-value: below 0.00001. Not 0.049. Not borderline. Not "trending toward significance." This was decisive in a way that silenced skeptics.
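For a sense of how a p-value that small arises, the sketch below runs a pooled two-proportion z-test on the rounded aspirin mortality rates. The arm size of 8,600 is an assumption (roughly half of the 17,000 patients), not the trial's exact count, and the trial's own analysis may well have used a different test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(p1, p2, n1, n2):
    """Pooled z-test for a difference in proportions; returns (z, two-sided p)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # both tails of the standard normal
    return z, p_value


# Assumed inputs: 11.8% vs 9.4% mortality, about 8,600 patients per arm.
z, p_value = two_proportion_z(0.118, 0.094, 8600, 8600)
print(f"z = {z:.2f}, two-sided p = {p_value:.1e}")
```

Even with these approximate inputs, the test statistic lands more than five standard errors from zero, which is what puts the p-value below 0.00001.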

"I've never seen a trial where the data was so convincing that even the skeptics fell quiet," one steering committee member reportedly said.

And here's what ISIS-2 didn't need: multiplicity adjustments, Bayesian priors, or sensitivity analyses to convince anyone. The effect was so large, so consistent across countries and subgroups, that the truth was unmistakable.

Well, except for one subgroup...

Act IV: The Astrological Interlude

In what became one of statistics' most famous teaching moments, ISIS-2's investigators showed that aspirin appeared slightly harmful for patients born under Gemini or Libra, while remaining strikingly beneficial for patients born under every other star sign.

This wasn't buried in supplementary materials—they put it in The Lancet publication. The message? Even with 0.00001 overall, if you torture the data enough, it will confess to anything. It's a reminder that remains painfully relevant in our era of subgroup mining.
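The size of the problem is easy to quantify. Assuming 12 independent subgroups (one per star sign) and no true subgroup differences, the sketch below computes the chance that at least one subgroup crosses p < 0.05 by luck, then checks it with a small simulation. The independence assumption and the function names are illustrative.

```python
import random


def prob_any_false_positive(n_subgroups, alpha=0.05):
    """Chance that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_subgroups


def simulate(n_subgroups, alpha=0.05, n_trials=20_000, seed=1988):
    """Monte Carlo check: fraction of simulated trials with any p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Under the null hypothesis each subgroup p-value is uniform on [0, 1).
        if any(rng.random() < alpha for _ in range(n_subgroups)):
            hits += 1
    return hits / n_trials


analytic = prob_any_false_positive(12)
simulated = simulate(12)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

With 12 looks at the data, the chance of at least one spurious "significant" subgroup is roughly 46%, which is why the Gemini and Libra result deserves a smile rather than a label change.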

Act V: What Made ISIS-2 Work

Four design principles powered ISIS-2's success:

  • Size: 17,000 patients gave real power—not theoretical power calculations assuming unrealistic effect sizes
  • Speed: Enrollment and randomization were immediate, before selection could creep in
  • Simplicity: No over-stratification, no adjustment gymnastics, no alpha-spending hierarchies
  • Significance: Vascular mortality at 5 weeks, an endpoint that mattered to patients, families, and payers alike

The trial cost approximately $3 million. Within months, it changed global practice. The FDA approved aspirin for acute MI within the year. That's return on investment.

Act VI: The Contrast with Today

Look at modern oncology trials, even the breakthroughs. KEYNOTE-189 and IMpower150 transformed lung cancer treatment, with reported hazard ratios ranging from about 0.49 to 0.78 and p-values that, in some analyses, sat around 0.02. Important? Absolutely. Decisive in the ISIS-2 sense? That's harder to claim.

Today's average Phase III trial features:

  • N = 300-500
  • ≥3 stratification factors
  • Multiple overlapping primary endpoints
  • Biomarker-defined subgroups with pre-specified hierarchical testing
  • Alpha-splitting, gatekeeping procedures, and fallback strategies

And after all that complexity? Often a treatment effect that requires careful interpretation, sensitivity analyses, and forest plots to understand.

This isn't purely a critique of complexity—precision medicine has its place. But we've become so afraid of being wrong that we sometimes design trials too cautious to be definitively right.
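For readers who have not met the machinery in that list, here is a minimal sketch of two of the simplest ideas: a Bonferroni-style alpha split and a fixed-sequence (hierarchical) test. The function names and the example p-values are hypothetical, and real protocols often use more elaborate gatekeeping schemes than this.

```python
def bonferroni_split(p_values, alpha=0.05):
    """Test every endpoint at alpha divided equally among them."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def fixed_sequence(p_values, alpha=0.05):
    """Test endpoints in their pre-specified order at full alpha,
    stopping at the first one that fails."""
    results = []
    still_open = True
    for p in p_values:
        still_open = still_open and p <= alpha
        results.append(still_open)
    return results


# Hypothetical p-values for four endpoints, listed in pre-specified order.
example = [0.001, 0.030, 0.200, 0.010]
split = bonferroni_split(example)
sequence = fixed_sequence(example)
print(split, sequence)
```

Both approaches keep the overall false-positive rate at or below alpha, but they can disagree about which endpoints count as wins, which is part of why modern results need such careful reading.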

Act VII: When Simplicity Still Makes Sense

ISIS-2's approach isn't universally applicable. But consider where elegant simplicity still wins:

  • Broad public health interventions where population-wide benefit is expected
  • First-line treatments with strong biological rationale for large effects
  • Resource-constrained settings where complexity isn't feasible
  • Emergency medicine where time to treatment matters more than patient selection

The key insight: Match your design complexity to your question's complexity, not to the available technology. Just because we can measure 500 biomarkers doesn't mean we should stratify by them.

Act VIII: What Would ISIS-2 Look Like Today?

Imagine proposing ISIS-2 to a 2024 protocol review committee:

"No genomic stratification?"
"You're not adjusting for 47 baseline covariates?"
"Where's your adaptive enrichment strategy?"
"What about the machine learning risk score?"

It might not survive peer review. But perhaps it should. Because sometimes, the clearest answer is the most powerful one. And in a world where the median Phase III trial takes 7 years from protocol to publication, maybe there's wisdom in remembering what 17,000 patients and one question accomplished in 18 months.

Final Act: Design Principles for the Bold

For statisticians and trial designers today, ISIS-2 offers a decision framework:

  1. Can you articulate your primary question in one sentence? If not, simplify.
  2. Is your expected effect size large enough to matter clinically? Power for that, not just for statistical significance.
  3. Are you stratifying because it's necessary, or because you can? Every stratification factor is a complexity tax.
  4. Would a practicing clinician understand your endpoint without a glossary? If not, you might be measuring the wrong thing.

Sometimes the bravest design choice is restraint. In an era of adaptive platforms and basket trials—all valuable tools—ISIS-2 reminds us that occasionally, the question isn't "How sophisticated can we make this?" but rather "How simple can we keep this while still getting the answer?"

As we chase smaller effect sizes in narrower populations with more precise tools, it's worth remembering what 0.00001 felt like. Not every trial can be ISIS-2. But every trial should know why it isn't.


Designing a trial where simplicity could be powerful? Let's discuss how ISIS-2's principles might apply to your next study. And if you want more stories about trials that changed medicine—and the statistical thinking behind them—subscribe to Evidence in the Wild.

Jamie Larson