
The IT outage proves we can’t simply turn the internet off and on again

Friday’s global computer shutdown – which closed banks, grounded flights and left hospitals unable to carry out operations – has been billed as the most serious the world has ever seen. We should view it as a timely warning, and rethink our relationship with the web, says Chris Blackhurst

Friday 19 July 2024 15:56 EDT
Passengers at Madrid’s Barajas airport were among those stranded by the global IT outage (Reuters)

It’s our powerlessness that is so shocking. This morning, millions of people were left staring at a blue computer screen bearing a glum emoji and the message: “Your device ran into a problem and needs to restart. We’re just collecting some error info, and then we’ll restart for you.”

This “blue screen of death” was not selective. Supermarkets, banks, airlines, hospitals – these and many more businesses, organisations and individuals across the globe, rich and poor, were caught up in the chaos. For hours, much of the world effectively ground to a halt as Microsoft customers using Windows suffered an IT outage.

Apparently it was caused by a glitch in an update issued by CrowdStrike, which is, ironically, a cybersecurity firm. It took a while, but the company reported that if customers deleted the update and restarted their computer, normality could quickly be restored.

That sounds easy, but it required administrators to turn off and restart each computer individually: a slow, laborious process.

Still, the planet was back up and running. Already, the recriminations have begun. Business was lost, patients were affected, flights were grounded – the impact list is huge. In anticipation of what is to come, CrowdStrike’s market value dropped by $16bn (£12.4bn) overnight. When the lawyers get properly stuck in, the loss could be much greater.

Amid the anger and shaming, however, one question requires an urgent answer: how can a software defect in a security update have such an effect?

Of course, we’ve been here before, beginning as far back as 1997, when a domain name server outage affected 50 million internet users and made headline news. That was followed by damage to submarine cables that caused extensive disruption and slowdown across the Middle East and India. When Michael Jackson died in 2009, so many of his fans went online at once to find out how he had died that Twitter, Wikipedia and AOL Instant Messenger suffered a 40-minute meltdown.

On it goes, the roll call of disaster, some of it accidental, some of it deliberate; witness the domain name system provider Dyn, the subject of multiple cyberattacks in 2016 that shut off internet platforms to users in Europe and North America. Individual countries have seen their entire systems paralysed. On occasion, too, it’s been a government behind the shutdown, actively trying to suppress protests.

Perhaps the worst, and the most similar to what has occurred with CrowdStrike, came in 2019, when Verizon accidentally rerouted the internet traffic of many of its customers. The misrouting sparked an overload as traffic was sent to small networks that could not cope, and the data simply vanished.

Now, this Microsoft glitch, which has been described as the most serious IT outage the world has ever seen, is arguably the biggest yet in terms of its international reach and impact. But the point, surely, is that we cannot keep allowing this to happen. Given ever-increasing connectivity, the next outage could be even larger and even worse.

Doubtless, years of litigation will ensue. CrowdStrike – and possibly Microsoft, too – can look forward to hefty bills and claims, even fines. That’s all well and good. In situations like this, we seek correction and retribution. But it does not get to the heart of the problem. CrowdStrike and Microsoft did not set out on Friday to bring the world to its knees.

Of course, responsibility and liability must lie somewhere. More importantly, though, we need to ask whether it’s right that one company has so much power at its fingertips, and if so, what can be done to reduce the risk of another failure like the one we have seen.

Too much attention is being paid to the upside – to speed and accessibility, producing ever more impressive performance figures – and not enough to the downside: the unexciting question of what happens if, suddenly, there is no speed and no accessibility.

Putting preventative measures in place should not be left to commercial operators: governments must come together and intervene. What occurred on Friday should be viewed as a warning. We need to act, and quickly.
