Nobody Decides to Cause a Disaster


Nonetheless, other choices were made.

The Challenger engineers who argued against the launch were right. They had the data. They made the case. They were overruled. Seventy-three seconds after launch, Challenger broke apart; the O-rings had failed exactly as they predicted.
This is the part of the story most people know. What most people don’t know is that the engineers were not overruled by idiots.
They were overruled by competent, experienced professionals who were operating inside a system that had, slowly and invisibly, reclassified danger as routine.
That distinction matters. Because if incompetence caused Challenger, the lesson is simple: hire better people. If something else caused it, the lesson is harder.

The warning that stopped being a warning

Years before the explosion, engineers at NASA and contractor Morton Thiokol had observed erosion on the O-ring seals that were critical to the rocket’s integrity.
They flagged it. It was reviewed. And because the shuttle returned safely each time, the anomaly was officially reclassified as within acceptable parameters.

Sociologist Diane Vaughan, who spent years inside NASA’s internal records after the disaster, gave this process a name: the normalization of deviance. A warning sign appears.
Nothing catastrophic follows. The warning sign appears again. Still nothing. Over enough repetitions, the warning stops registering as a warning.
It becomes background noise, an expected feature of the system rather than a signal that the system is degrading.

By January 1986, the language in NASA’s official documents had drifted so far that what engineers once called alarming had become, in writing, acceptable risk.

The night before launch, temperatures at Kennedy Space Center were forecast to drop to 18 degrees Fahrenheit.
Thiokol engineers argued the O-rings had never been tested in anything close to those conditions. NASA managers pushed back.
The burden of proof, one manager told them, was on the engineers to prove the shuttle was unsafe. Not on NASA to prove it was safe.

Thiokol management signed the launch recommendation over the explicit objections of its own engineers.
One manager was told to take off his engineering hat and put on his management hat. He did.

Seventeen years later, NASA did it again

The Columbia disaster in 2003 followed the same logic almost exactly. Foam had been shedding from the shuttle’s external tank on multiple previous flights.
Each time, it was reviewed. Each time, the shuttle returned safely. Each time, the anomaly was reclassified as normal.
When foam struck Columbia’s left wing during launch and breached the thermal protection system, the shuttle disintegrated on reentry. Seven crew members died.
The Columbia Accident Investigation Board was direct about what it found.
NASA had not learned from Challenger because it had not changed the structural and cultural conditions that made normalization possible in the first place. New faces, same system.

This isn’t a NASA problem

The Deepwater Horizon blowout in 2010 killed eleven people and released nearly five million barrels of oil into the Gulf of Mexico.
Investigations found a cement contractor whose own internal tests showed an unstable slurry design, a rig crew that misread a critical pressure test because they were expecting it to succeed, and a blowout preventer with a dead battery.

None of these failures were dramatic. Each was, in isolation, minor. A cost-cutting decision on centralizers. A convenient explanation accepted for an anomalous reading. A maintenance item deferred. Their interaction was catastrophic.

At Chernobyl, operators running a safety test on the night of April 25–26, 1986, were under bureaucratic pressure to complete the procedure before the reactor’s scheduled maintenance shutdown.
The test had already been delayed 36 hours. They were exhausted. The reactor was in an unstable state. They ran the test anyway. The design flaw they didn’t know about did the rest.

The architecture of failure is not built in a single moment. It is assembled, quietly, from reasonable choices made by capable people inside systems that no longer know how to be afraid.

What makes these disasters genuinely disturbing is not that warning signs were ignored. It is that the organizations involved had, over time, built structures that made ignoring warning signs the rational thing to do.
The pressure to maintain a launch schedule, to complete a test before a deadline, to keep a drilling operation on budget — these pressures did not force anyone to make a catastrophic decision. They just made the catastrophic decision feel like a reasonable one.

What changes anything

Organizations that have built genuinely better safety records don’t do it by hiring more careful people. They do it by changing the conditions that shape what careful looks like.
They treat near-misses as information. They build cultures where bad news travels up the hierarchy without being softened or buried.
When a crisis arrives, they move decision-making authority to the person with the most relevant expertise, not the highest rank. They resist the temptation to explain anomalies away.
None of that is complicated to describe. All of it is difficult to sustain when schedules are tight, budgets are shrinking, and the last fifty operations went fine.
That is, of course, exactly when it matters most.

Disasters from Small Decisions is a short nonfiction book examining the three mechanisms behind industrial catastrophe.
Available on Gumroad.
