
When Every Company Becomes a Trialist

In 2014, during a major snowstorm, Uber ran a randomized test by turning off surge pricing for a subset of riders. Some waited longer. Others never got a ride at all. It wasn’t just a pricing tweak. It was a trial of behavior, supply, and equity. Trials, once reserved for drug approvals, are now everywhere—and they carry real stakes.

The Great Migration

In the early 2000s, A/B testing was a tool for web marketers: Does a red button get more clicks than a blue one? But soon, companies like Google, Facebook, and Amazon started hiring statisticians—not just data scientists—to run increasingly complex experiments. These weren’t just UX tests. They were full-scale randomized trials, designed to evaluate causal impact with the same rigor once found only in pharma.

Netflix's evolution tells the story. In the early 2010s, they ran simple A/B tests: thumbnail A versus thumbnail B, which gets more clicks? But within a few years, they'd moved to multi-armed bandits: algorithms that continuously shift traffic toward the better-performing of dozens of thumbnail variants as data accumulates. Then contextual bandits, which learn which thumbnail to show which user. Suddenly, they weren't just running tests. They were running adaptive, sequential experiments with Bayesian updating and exploration-exploitation trade-offs. That's when they started hiring people with 'randomized trial' on their CVs, not just 'machine learning.' They needed statisticians who understood how to learn from experiments while the experiments were still running.
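The exploration-exploitation idea is easier to see in code than in prose. Here is a minimal Thompson sampling sketch, not Netflix's actual system: the variant names, click rates, and traffic volume are all invented for illustration. Each arm keeps a Beta posterior over its click rate; on every impression we sample from each posterior and show the arm with the highest draw.

```python
import random

def thompson_pick(stats):
    """Pick the arm whose Beta-sampled click rate is highest.
    stats maps arm name -> [clicks, misses]."""
    best_arm, best_draw = None, -1.0
    for arm, (clicks, misses) in stats.items():
        # Beta(1, 1) prior; posterior is Beta(clicks + 1, misses + 1)
        draw = random.betavariate(clicks + 1, misses + 1)
        if draw > best_draw:
            best_arm, best_draw = arm, draw
    return best_arm

def update(stats, arm, clicked):
    stats[arm][0 if clicked else 1] += 1

# Hypothetical simulation: variant "B" truly converts at 12%, "A" at 5%.
random.seed(0)
true_rate = {"A": 0.05, "B": 0.12}
stats = {"A": [0, 0], "B": [0, 0]}
for _ in range(5000):
    arm = thompson_pick(stats)
    update(stats, arm, random.random() < true_rate[arm])

shown = {arm: sum(counts) for arm, counts in stats.items()}
```

After a few thousand impressions the algorithm has quietly routed most traffic to the better variant, which is exactly the "learning while the experiment is still running" property that a fixed 50/50 A/B split lacks.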

It’s not just tech anymore.


Beyond Silicon Valley

Randomized trials have quietly taken hold in fields far from both the clinic and the click stream, starting with education.

Charter school lotteries weren't designed as experiments—they were designed for fairness. But economists like Roland Fryer Jr. realized something remarkable: random admission created the perfect natural experiment. When oversubscribed schools held lotteries, they accidentally randomized access to treatment (the charter school) versus control (traditional public schools). The Perry Preschool Project took this further, deliberately randomizing 3- and 4-year-olds into high-quality preschool or standard care in the 1960s, then following them for decades. By the time those children reached middle age, researchers had measured everything from earnings to arrest rates—outcomes no pharmaceutical trial would ever attempt.

International development followed suit. GiveDirectly and organizations like Innovations for Poverty Action started treating cash transfers and job training programs like investigational therapies: randomize, measure, iterate. These weren't controlled hospital settings with carefully selected patients. These were villages in Kenya, job centers in Chicago, schools in India—messier, harder to standardize, and absolutely requiring rigorous experimental design.


What Changes—and What Doesn’t

Randomization still matters. Intention-to-treat holds. But everything else? A different game entirely.

Pharma trials measure time-to-event with Kaplan-Meier curves and Cox proportional hazards models. Tech trials measure conversion rates with binomial regression and Bayesian updating. Both are rigorous. Neither is more real. But when pharma stops a trial early, it's because people are dying. When Netflix stops a test early, it's because people are churning. The statistical mechanics are similar—sequential testing, alpha spending, stopping boundaries—but the ethical weight is entirely different.
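To make "conversion rates with Bayesian updating" concrete, here is a small Beta-Binomial sketch. The counts are made up, and this is one common approach rather than any particular company's method: conjugate updating gives each arm a Beta posterior, and Monte Carlo draws estimate the probability that one arm beats the other.

```python
import random

def posterior(clicks, trials, a0=1, b0=1):
    """Conjugate Beta-Binomial update: Beta(a0 + clicks, b0 + misses)."""
    return a0 + clicks, b0 + trials - clicks

def prob_b_beats_a(post_a, post_b, draws=20000, seed=1):
    """Monte Carlo estimate of P(rate_B > rate_A) under the posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a)
        for _ in range(draws)
    )
    return wins / draws

# Hypothetical data: arm A converts 120/1000, arm B converts 150/1000.
post_a = posterior(clicks=120, trials=1000)
post_b = posterior(clicks=150, trials=1000)
p = prob_b_beats_a(post_a, post_b)
```

A statement like "there is roughly a 98% chance B is better" is what ships the feature; the sequential-testing machinery the paragraph mentions exists to keep repeatedly peeking at numbers like this from inflating false positives.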

Then there's the surrogate endpoint problem. In oncology, we debate whether progression-free survival is an acceptable proxy for overall survival. In tech, the debate is whether time-on-site measures satisfaction or addiction. Both require the same question: Are we measuring what matters? But pharma has the FDA forcing that conversation. In tech, there's no FDA—just shipping deadlines.


The Statistical Tension

Here's where it gets statistically interesting: when you run one trial, you control your type I error at 0.05. When you run 1,000 trials simultaneously—which is a quiet Tuesday at Facebook—you're practically guaranteed false positives. Tech dealt with this through Benjamini-Hochberg corrections and false discovery rate control. Pharma dealt with this through not running 1,000 trials simultaneously.
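The Benjamini-Hochberg procedure itself is only a few lines. This is a textbook implementation with invented p-values, not Facebook's pipeline: sort the m p-values, find the largest rank k with p_(k) <= (k/m) * q, and reject the k smallest.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: compare the rank-th smallest p-value to (rank/m) * q
        if pvals[i] <= rank / m * q:
            k = rank
    return sorted(order[:k])

# Ten hypothetical experiments; a few real effects among the noise.
pvals = [0.001, 0.004, 0.012, 0.041, 0.09, 0.20, 0.35, 0.50, 0.62, 0.81]
rejected = benjamini_hochberg(pvals, q=0.05)
```

On these numbers BH rejects three hypotheses, while a Bonferroni cutoff of 0.05/10 = 0.005 would reject only two. That extra power is the whole point: controlling the *rate* of false discoveries, rather than the chance of any single one, is what makes running a thousand simultaneous experiments tolerable.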

But now the two worlds are colliding. Platform trials in oncology are looking more like tech's multi-armed bandit problems. Adaptive designs are borrowing from tech's continuous monitoring. And suddenly, the statistician's toolkit needs to accommodate both 'we can't afford a single false positive because people might die' and 'we need to learn fast because the product cycle is 6 weeks.'

Where This Gets Interesting

The real innovation isn't that trials left pharma. It's what happened when the methods started cross-pollinating.

Consider platform trials—multi-arm, multi-stage designs developed for rare diseases that exploded during COVID-19. The RECOVERY trial in the UK tested multiple treatments simultaneously, adding and dropping arms based on accumulating evidence. It looked remarkably like tech's continuous experimentation approach, but with mortality as the endpoint. They found that dexamethasone reduced deaths, but only after testing it alongside hydroxychloroquine, azithromycin, convalescent plasma, and tocilizumab—some of which failed, some of which worked, all sharing the same control group.

This is exactly how Spotify tests features. Except nobody dies if the playlist algorithm gets it wrong.


Trials Without Walls

That Uber surge pricing experiment? It settled an internal debate about pricing elasticity and consumer behavior in six weeks—a question that would have taken a pharmaceutical company six years and a hundred-page statistical analysis plan.

But here's what Uber didn't have to answer: What if surge pricing disproportionately harms low-income users? What if the algorithm creates disparate impact? What if the experiment itself causes harm?

Pharma's regulatory infrastructure forces those questions. Tech's doesn't. Education and policy sit in the ethical gray zone—lacking pharma's regulatory oversight but facing stakes higher than click-through rates.

Trials have escaped pharma. The question now is whether our ethics, our methods, and our statistical standards can keep pace with where they're going.


📬 Want more insights on experimental design across domains? Subscribe to the newsletter or explore the full archive of Evidence in the Wild.