
When Every Company Becomes a Trialist

In 2014, during a major snowstorm, Uber ran a randomized test by turning off surge pricing for a subset of riders. Some waited longer. Others never got a ride at all. It wasn’t just a pricing tweak. It was a trial of behavior, supply, and equity. Trials, once reserved for drug approvals, are now everywhere—and they carry real stakes.

The Great Migration

In the early 2000s, A/B testing was a tool for web marketers: Does a red button get more clicks than a blue one? But soon, companies like Google, Facebook, and Amazon started hiring statisticians—not just data scientists—to run increasingly complex experiments. These weren’t just UX tests. They were full-scale randomized trials, designed to evaluate causal impact with the same rigor once found only in pharma.

Netflix's evolution tells the story. In the early 2010s, they ran simple A/B tests: thumbnail A versus thumbnail B, which gets more clicks? But within a few years, they'd moved to multi-armed bandits: algorithms that continuously shift traffic toward the better-performing of dozens of thumbnail variants as data accumulates. Then contextual bandits, which learn which thumbnail to show which user. Suddenly, they weren't just running tests. They were running adaptive, sequential experiments with Bayesian updating and exploration-exploitation trade-offs. That's when they started hiring people with 'randomized trial' on their CVs, not just 'machine learning.' They needed statisticians who understood how to learn from experiments while the experiments were still running.
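The exploration-exploitation idea is easier to see in code than in prose. Here is a minimal Thompson sampling sketch, not Netflix's actual system: the variant names, click rates, and traffic volume are all invented for illustration. Each arm keeps a Beta posterior over its click rate; on every impression we sample from each posterior and show the arm with the highest draw.

```python
import random

def thompson_pick(stats):
    """Pick the arm whose Beta-sampled click rate is highest.
    stats maps arm name -> [clicks, misses]."""
    best_arm, best_draw = None, -1.0
    for arm, (clicks, misses) in stats.items():
        # Beta(1, 1) prior; posterior is Beta(clicks + 1, misses + 1)
        draw = random.betavariate(clicks + 1, misses + 1)
        if draw > best_draw:
            best_arm, best_draw = arm, draw
    return best_arm

def update(stats, arm, clicked):
    stats[arm][0 if clicked else 1] += 1

# Hypothetical simulation: variant "B" truly converts at 12%, "A" at 5%.
random.seed(0)
true_rate = {"A": 0.05, "B": 0.12}
stats = {"A": [0, 0], "B": [0, 0]}
for _ in range(5000):
    arm = thompson_pick(stats)
    update(stats, arm, random.random() < true_rate[arm])

shown = {arm: sum(counts) for arm, counts in stats.items()}
```

After a few thousand impressions the algorithm has quietly routed most traffic to the better variant, which is exactly the "learning while the experiment is still running" property that a fixed 50/50 A/B split lacks.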

It’s not just tech anymore.


Beyond Silicon Valley

Randomized trials have quietly taken hold in fields far from both the clinic and the click stream, starting with education.

Charter school lotteries weren't designed as experiments—they were designed for fairness. But economists like Roland Fryer Jr. realized something remarkable: random admission created the perfect natural experiment. When oversubscribed schools held lotteries, they accidentally randomized access to treatment (the charter school) versus control (traditional public schools). The Perry Preschool Project took this further, deliberately randomizing 3- and 4-year-olds into high-quality preschool or standard care in the 1960s, then following them for decades. By the time those children reached middle age, researchers had measured everything from earnings to arrest rates—outcomes no pharmaceutical trial would ever attempt.

International development followed suit. GiveDirectly and organizations like Innovations for Poverty Action started treating cash transfers and job training programs like investigational therapies: randomize, measure, iterate. These weren't controlled hospital settings with carefully selected patients. These were villages in Kenya, job centers in Chicago, schools in India—messier, harder to standardize, and absolutely requiring rigorous experimental design.


What Changes—and What Doesn’t

Randomization still matters. Intention-to-treat holds. But everything else? A different game entirely.

Pharma trials measure time-to-event with Kaplan-Meier curves and Cox proportional hazards models. Tech trials measure conversion rates with binomial regression and Bayesian updating. Both are rigorous. Neither is more real. But when pharma stops a trial early, it's because people are dying. When Netflix stops a test early, it's because people are churning. The statistical mechanics are similar—sequential testing, alpha spending, stopping boundaries—but the ethical weight is entirely different.
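To make "conversion rates with Bayesian updating" concrete, here is a small Beta-Binomial sketch. The counts are made up, and this is one common approach rather than any particular company's method: conjugate updating gives each arm a Beta posterior, and Monte Carlo draws estimate the probability that one arm beats the other.

```python
import random

def posterior(clicks, trials, a0=1, b0=1):
    """Conjugate Beta-Binomial update: Beta(a0 + clicks, b0 + misses)."""
    return a0 + clicks, b0 + trials - clicks

def prob_b_beats_a(post_a, post_b, draws=20000, seed=1):
    """Monte Carlo estimate of P(rate_B > rate_A) under the posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a)
        for _ in range(draws)
    )
    return wins / draws

# Hypothetical data: arm A converts 120/1000, arm B converts 150/1000.
post_a = posterior(clicks=120, trials=1000)
post_b = posterior(clicks=150, trials=1000)
p = prob_b_beats_a(post_a, post_b)
```

A statement like "there is roughly a 98% chance B is better" is what ships the feature; the sequential-testing machinery the paragraph mentions exists to keep repeatedly peeking at numbers like this from inflating false positives.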

Then there's the surrogate endpoint problem. In oncology, we debate whether progression-free survival is an acceptable proxy for overall survival. In tech, the debate is whether time-on-site measures satisfaction or addiction. Both require the same question: Are we measuring what matters? But pharma has the FDA forcing that conversation. In tech, there's no FDA—just shipping deadlines.


The Statistical Tension

Here's where it gets statistically interesting: when you run one trial, you control your type I error at 0.05. When you run 1,000 trials simultaneously—which is a quiet Tuesday at Facebook—you're practically guaranteed false positives. Tech dealt with this through Benjamini-Hochberg corrections and false discovery rate control. Pharma dealt with this through not running 1,000 trials simultaneously.
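The Benjamini-Hochberg procedure itself is only a few lines. This is a textbook implementation with invented p-values, not Facebook's pipeline: sort the m p-values, find the largest rank k with p_(k) <= (k/m) * q, and reject the k smallest.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: compare the rank-th smallest p-value to (rank/m) * q
        if pvals[i] <= rank / m * q:
            k = rank
    return sorted(order[:k])

# Ten hypothetical experiments; a few real effects among the noise.
pvals = [0.001, 0.004, 0.012, 0.041, 0.09, 0.20, 0.35, 0.50, 0.62, 0.81]
rejected = benjamini_hochberg(pvals, q=0.05)
```

On these numbers BH rejects three hypotheses, while a Bonferroni cutoff of 0.05/10 = 0.005 would reject only two. That extra power is the whole point: controlling the *rate* of false discoveries, rather than the chance of any single one, is what makes running a thousand simultaneous experiments tolerable.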

But now the two worlds are colliding. Platform trials in oncology are looking more like tech's multi-armed bandit problems. Adaptive designs are borrowing from tech's continuous monitoring. And suddenly, the statistician's toolkit needs to accommodate both 'we can't afford a single false positive because people might die' and 'we need to learn fast because the product cycle is 6 weeks.'

Where This Gets Interesting

The real innovation isn't that trials left pharma. It's what happened when the methods started cross-pollinating.

Consider platform trials—multi-arm, multi-stage designs developed for rare diseases that exploded during COVID-19. The RECOVERY trial in the UK tested multiple treatments simultaneously, adding and dropping arms based on accumulating evidence. It looked remarkably like tech's continuous experimentation approach, but with mortality as the endpoint. They found that dexamethasone reduced deaths, but only after testing it alongside hydroxychloroquine, azithromycin, convalescent plasma, and tocilizumab—some of which failed, some of which worked, all sharing the same control group.

This is exactly how Spotify tests features. Except nobody dies if the playlist algorithm gets it wrong.


Trials Without Walls

That Uber surge pricing experiment? It settled an internal debate about pricing elasticity and consumer behavior in six weeks—a question that would have taken a pharmaceutical company six years and a hundred-page statistical analysis plan.

But here's what Uber didn't have to answer: What if surge pricing disproportionately harms low-income users? What if the algorithm creates disparate impact? What if the experiment itself causes harm?

Pharma's regulatory infrastructure forces those questions. Tech's doesn't. Education and policy sit in the ethical gray zone—lacking pharma's regulatory oversight but facing stakes higher than click-through rates.

Trials have escaped pharma. The question now is whether our ethics, our methods, and our statistical standards can keep pace with where they're going.


📬 Want more insights on experimental design across domains? Subscribe to the newsletter or explore the full archive of Evidence in the Wild.