Facebook blames outage on error during routine maintenance
Facebook blamed an error during routine maintenance for causing a massive global outage that took down its services for hours
The global outage that knocked Facebook and its other platforms offline for hours was caused by an error during routine maintenance, the company said.
Santosh Janardhan, Facebook’s vice president of infrastructure, said in a blog post that Facebook, Instagram and WhatsApp going dark was “caused not by malicious activity, but an error of our own making.”
The problem occurred as engineers were carrying out day-to-day work on Facebook's global backbone network: the computers, routers and software in its data centers around the world, along with the fiber-optic cables connecting them.
“During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally,” Janardhan said Tuesday.
Facebook's systems are designed to catch such mistakes, but in this case a bug in the audit tool prevented it from properly stopping the command, Janardhan said.
That change also triggered a second problem that made things worse by making it impossible to reach Facebook's servers even though they were operational.
Engineers scrambled to fix the problem on site, but this took time because of the extra layers of security, Janardhan said. The data centers are “hard to get into, and once you’re inside, the hardware and routers are designed to be difficult to modify even when you have physical access to them.”
Once connectivity was restored, services were brought back gradually to avoid traffic surges that could cause more crashes.