When the p-value Was Enough: Rethinking Trial Design Through ISIS-2

A simple trial. A massive effect. A p-value so small it barely needed confidence intervals.

In 1988, ISIS‑2 didn’t just change cardiology. It showed that 162 mg of aspirin a day could prevent roughly one death for every forty patients treated after acute myocardial infarction. More importantly, it reminded statisticians what clinical trial design is supposed to do: answer questions that matter, clearly and decisively.


The Chaos Before the Trial

In the early 1980s, cardiology stood on the edge of transformation. Acute myocardial infarction (MI) killed more than half a million Americans each year, yet the use of aspirin—cheap, ubiquitous, biologically plausible—remained controversial. Some clinicians gave it routinely. Others avoided it. Observational studies existed, but randomized evidence did not.

The uncertainty wasn’t academic. Every month of debate translated into thousands of preventable deaths.

Into this chaos stepped the Second International Study of Infarct Survival—ISIS‑2. Its question was disarmingly direct: could a large, pragmatic, well‑powered randomized trial resolve the debate once and for all, and in doing so, change emergency cardiovascular care overnight?


Trial Design in Its Purest Form

ISIS‑2 tested two interventions: aspirin and streptokinase. Using a 2×2 factorial design, more than 17,000 patients across 16 countries were randomized into four groups: aspirin alone, streptokinase alone, both, or neither, with placebo control.

Why a factorial design? It allowed investigators to test both treatments simultaneously. Because every patient contributes to both the aspirin comparison and the streptokinase comparison, one trial did the work of two while maintaining roughly 90% power to detect a 15% reduction in mortality. It also enabled an interaction test (p = 0.08, suggesting additive effects) without paying the usual four‑fold sample size penalty. In an era before adaptive designs, this was efficiency at its finest.
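A rough sketch of that efficiency argument, using a normal-approximation power calculation for two proportions. The 10% control mortality is a hypothetical planning value chosen for illustration, not a figure from the ISIS‑2 protocol, and the function name is invented here. The sketch also assumes no interaction between the two treatments.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control: float, rel_reduction: float,
                          n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two proportions."""
    p_treated = p_control * (1 - rel_reduction)
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = (p_control - p_treated) / se
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(z_effect - z_crit)


N_TOTAL = 17_000   # "more than 17,000 patients" in the text
P_CONTROL = 0.10   # hypothetical planning value, NOT taken from the protocol

# Factorial: each main effect compares half of all patients with the other half.
factorial = power_two_proportions(P_CONTROL, 0.15, N_TOTAL // 2)
# Same total split into two separate two-arm trials: a quarter per arm.
separate = power_two_proportions(P_CONTROL, 0.15, N_TOTAL // 4)

print(f"factorial main effect: {factorial:.2f}")
print(f"two separate trials:   {separate:.2f}")
```

Under these assumed inputs, the factorial comparison lands near the 90% quoted above, while splitting the same patients into two separate trials leaves each one noticeably underpowered.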

The rest of the design was equally spare. I’ve reviewed modern protocols with more stratification factors than ISIS‑2 had study sites. The endpoint was hard—vascular mortality at five weeks. There were no surrogate markers, no biomarker stratification, no forest‑plot archaeology. Randomization occurred within 24 hours of symptom onset (median: five hours), leaving little room for selection bias or interpretive fog.

Just clarity.

Then came the results.


0.00001

The numbers were unmistakable.

Aspirin reduced five‑week vascular mortality from 11.8% to 9.4%, a 23% reduction in the odds of death (95% CI: 15–30%).

Streptokinase reduced mortality from 12.0% to 9.2%, a 25% odds reduction (95% CI: 18–32%).

Together, they reduced mortality from 13.2% to 8.0%, a 42% odds reduction (95% CI: 34–50%).

Aspirin alone cut mortality by 2.4 percentage points in absolute terms, a number needed to treat of roughly 40. In emergency cardiology, that is not incremental. It is transformative.
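For readers who want to check the arithmetic, here is a minimal sketch that derives the absolute risk reduction, relative risk reduction, odds reduction, and number needed to treat from the rounded mortality rates quoted above. The function name is illustrative. Because the inputs are rounded percentages rather than patient-level counts, the odds reductions will be close to, but not identical to, the published ones.

```python
def effect_measures(control_rate: float, treated_rate: float) -> dict:
    """Summarise a two-arm comparison from event rates given as proportions."""
    arr = control_rate - treated_rate                  # absolute risk reduction
    rrr = arr / control_rate                           # relative risk reduction
    odds_control = control_rate / (1 - control_rate)
    odds_treated = treated_rate / (1 - treated_rate)
    odds_reduction = 1 - odds_treated / odds_control   # 1 minus the odds ratio
    nnt = 1 / arr                                      # patients treated per death avoided
    return {"arr": arr, "rrr": rrr, "odds_reduction": odds_reduction, "nnt": nnt}


# Rounded five-week vascular mortality figures quoted in the text
# (control rate, treated rate).
comparisons = {
    "aspirin": (0.118, 0.094),
    "streptokinase": (0.120, 0.092),
    "both": (0.132, 0.080),
}

for name, (control, treated) in comparisons.items():
    m = effect_measures(control, treated)
    print(f"{name:14s} ARR={m['arr']:.1%}  RRR={m['rrr']:.1%}  "
          f"odds reduction={m['odds_reduction']:.1%}  NNT={m['nnt']:.0f}")
```

With these inputs, aspirin works out to about 42 patients treated per death avoided, which is where the "roughly 40" comes from. The relative risk reduction is a few points smaller than the odds reduction, so it is worth stating which measure a headline percentage refers to.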

And then there was the p‑value: below 0.00001. Not borderline. Not “trending.” Decisive.

One steering committee member reportedly remarked that they had never seen a trial so convincing that even the skeptics fell silent.

What ISIS‑2 didn’t require is just as instructive. No multiplicity corrections were needed to defend the conclusion. No Bayesian priors to steady belief. No sensitivity analyses to explain away doubt. The effect was large, consistent across countries and subgroups, and clinically undeniable.

Well—almost.


The Astrological Interlude

ISIS‑2 is also remembered for one of the most famous jokes ever published in a medical journal. When investigators examined outcomes by astrological sign, aspirin appeared harmful for patients born under Gemini or Libra (p = 0.015).

They didn’t hide this result. They published it.

The lesson was blunt: even with an overall p‑value below 0.00001, if you interrogate the data aggressively enough, you will almost always find something “significant.” It remains one of the clearest demonstrations of why subgroup analyses demand humility.
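The arithmetic behind that warning is short. As a simplified sketch, assume each subgroup is tested separately at the 0.05 level, the tests are independent, and there is no true subgroup difference anywhere. These assumptions are for illustration and do not describe how the ISIS‑2 investigators ran their analysis.

```python
def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


# 12 corresponds to one test per zodiac sign; 50 is an arbitrary larger number.
for k in (1, 12, 50):
    print(f"{k:3d} tests: {chance_of_false_positive(k):.1%} chance of a spurious hit")
```

With twelve subgroups, one per star sign, the chance of at least one spurious "significant" result is already close to a coin flip.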


What Made ISIS‑2 Work

Four design principles powered ISIS‑2’s success:

Size. Seventeen thousand patients provided real power, not theoretical power based on optimistic assumptions.

Speed. Rapid enrollment and immediate randomization minimized bias and maximized relevance.

Simplicity. No over‑stratification, no adjustment gymnastics, no alpha‑spending hierarchies.

Significance. Vascular mortality—an endpoint that mattered to patients, clinicians, and health systems alike.

The entire trial cost roughly $3 million. Within months, it changed global practice. Aspirin was approved for acute MI within a year. That is return on investment.


The Contrast with Today

Now compare this with modern oncology trials—even the successful ones. Studies like KEYNOTE‑189 or IMpower150 transformed lung cancer treatment, but with hazard ratios in the range of 0.5 to 0.8 and p‑values hovering near 0.02. Clinically important? Absolutely. Decisive in the ISIS‑2 sense? Less so.

Today’s typical Phase III trial often includes:

• 300–500 patients
• Multiple stratification factors
• Several overlapping primary or co‑primary endpoints
• Biomarker‑defined subgroups with hierarchical testing
• Alpha‑splitting, gatekeeping, and fallback procedures

After all that machinery, the effect size still often requires sensitivity analyses and dense forest plots to interpret.

This is not an argument against precision medicine. It is an argument against reflexive complexity. In our fear of being wrong, we sometimes design trials that struggle to be unequivocally right.


When Simplicity Still Makes Sense

ISIS‑2 is not a template for every trial. But there are settings where its philosophy still applies:

• Broad public‑health interventions with population‑wide impact
• First‑line therapies with strong biological rationale for large effects
• Resource‑constrained environments
• Emergency medicine, where speed outweighs patient selection

The insight is simple: match design complexity to the complexity of the question, not to the sophistication of available tools. Measuring 500 biomarkers does not obligate you to stratify by them.


What Would ISIS‑2 Look Like Today?

It is not hard to imagine how ISIS‑2 would be received by a modern protocol review committee. Too few covariates. No enrichment strategy. No machine‑learning risk score. I’ve heard versions of these objections in real reviews. And yet seventeen thousand patients, one primary question, and eighteen months were enough to change medicine.


Design Principles for the Bold

ISIS‑2 offers a quiet decision framework for trial designers today:

• Can you state your primary question in one sentence? If not, simplify.
• Is the expected effect size large enough to matter clinically? Power for that.
• Are you stratifying because it is necessary, or because it is possible?
• Would a practicing clinician understand your endpoint without a glossary?

Sometimes the bravest design choice is restraint. In an era of adaptive platforms and basket trials—powerful tools when used well—ISIS‑2 reminds us that sophistication is not the same as clarity.

Not every trial can be ISIS‑2. But every trial should know why it isn’t.


Designing a trial where simplicity could be powerful? Let's discuss how ISIS-2's principles might apply to your next study. And if you want more stories about trials that changed medicine—and the statistical thinking behind them—subscribe to Evidence in the Wild.