Internet was taken down by one small typo at Amazon's web services, company says

Restarting the entire service 'took longer than expected'

Andrew Griffin
Friday 03 March 2017 06:03 EST
A worker retrieves goods from shelves at Amazon's warehouse on December 5, 2014 in Hemel Hempstead, England (Peter Macdiarmid/Getty Images)

Amazon accidentally took down large swathes of the internet with just one typo.

This week, many of the world's biggest websites and services stopped working because of a problem with Amazon Web Services, the platform that the retailer provides to power people's websites. One of its data centres went offline because of the issue, and since thousands of websites rely on it, including many of the world's biggest, they immediately went completely offline or stopped working properly.

Websites like Quora, Trello and some of the world's biggest news sites went offline or stopped working properly when the issue happened. Even people's homes were affected: internet-enabled ovens, lights and front gates stopped working as a result of the outage.

Now it has emerged that all of those problems were the result of just one small typo on a set of servers.

The team at Amazon's Simple Storage Service was working to remove a small set of servers from its system, according to a technical report into the incident. As it did so, someone entered the wrong command and "a larger set of servers was removed than intended".

One of the two subsystems that were affected manages the "metadata and location information of all S3 objects" in the Virginia data centre, according to the post. That meant that many of the core processes broke down and the data centre could no longer be used.

To fix that problem, Amazon had to fully restart all of the affected systems. But that was a huge operation. Amazon said it hadn't done one for "many years", and it took even longer than expected, meaning the problem couldn't be fixed quickly.

The company says it has now added a range of fixes to stop the issue happening again. It has modified the tools that deal with such problems to cope more efficiently, and is auditing the rest of the systems, it said.
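To illustrate the kind of safeguard such a tool might include, here is a minimal sketch in Python. It is not Amazon's tooling: the function `plan_removal`, the `CapacityError` exception and the `min_capacity` and `max_batch` limits are all hypothetical names chosen for this example. The idea is simply that a mistyped request is rejected if it would remove too many servers at once or leave too few running.

```python
class CapacityError(Exception):
    """Raised when a removal request would take too much capacity offline."""


def plan_removal(active, requested, min_capacity, max_batch):
    """Return the servers that are safe to remove, or raise CapacityError.

    All names here are hypothetical. `active` is the set of servers in
    service and `requested` is whatever the operator typed.
    """
    active = set(active)
    requested = set(requested)

    # Refuse names that are not in service: a likely sign of a typo.
    unknown = requested - active
    if unknown:
        raise CapacityError(f"unknown servers: {sorted(unknown)}")

    # Refuse requests that are larger than one batch is allowed to be.
    if len(requested) > max_batch:
        raise CapacityError(
            f"{len(requested)} servers requested, limit is {max_batch}"
        )

    # Refuse requests that would leave fewer servers than the minimum.
    remaining = len(active) - len(requested)
    if remaining < min_capacity:
        raise CapacityError(
            f"only {remaining} servers would remain, minimum is {min_capacity}"
        )

    return sorted(requested)
```

With ten servers in service, a request to remove two would pass, while a request to remove eight would be refused before anything was switched off.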

It also announced that it would make changes to the page that shows whether the service is actually online. Even as Amazon Web Services and the websites that rely on it were falling over, its status page said that everything was fine, because that same page relied on S3 and so was affected by its own outage. It will now rely on different data centres so that it won't buckle if just one goes down, it said.
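The principle behind that change can be sketched in a few lines of Python. This is an illustration only, not a description of how Amazon's status page works: `read_status` and the idea of passing in a list of named sources are assumptions made for the example. The page asks each independent source in turn and reports the first answer it gets, so one failed data centre cannot silence it.

```python
def read_status(sources):
    """Try each independent source in turn and return the first answer.

    `sources` is a list of (name, fetch) pairs. Each `fetch` is a
    hypothetical zero-argument function that returns a status string,
    or raises an exception if that source cannot be reached.
    """
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(f"{name}: {exc}")

    # Every source failed, so say so rather than claiming all is well.
    return None, "status unknown (" + "; ".join(errors) + ")"
```

The important design choice is the last line: if nothing can be reached, the page reports that the status is unknown instead of showing a stale "everything is fine".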
