Analysis

The CrowdStrike crisis shows the internet is more fragile than we think

The simple and smooth experiences that computers give us are the result of vast and complex systems, writes Andrew Griffin. We should get used to the idea that sometimes they go wrong

Sunday 28 July 2024 01:00 EDT

Days before the world’s IT infrastructure fell apart, one of the fathers of the internet had been forecasting exactly that. Vint Cerf – speaking in London to celebrate his creation’s 50th birthday – had been warning that our infrastructure was more fragile than we realised.

“I think we need to improve people's intuition” about how complex and fragile the world’s online systems are, Cerf told The Independent shortly before he was proven right. “And we also need to improve their understanding what it means to use these technologies safely, we need some more critical thinking and willingness to think critically.”

Cerf, a vice president and the Chief Internet Evangelist at Google, emphasised his point with reference to AI systems such as ChatGPT, which present questionable information with total certainty, and trick people into thinking they are more dependable than they are. These things are not “ethereal” magic experiences but real, contested systems that don’t always work, he warned.

And then it happened. Computers across the world wouldn’t turn on, and the things that relied on them shut off. TV stations went down, hospitals were hit by problems and many of the world’s biggest airlines grounded their flights. The world didn’t exactly stop, but it did slow down substantially.

A week later, not everything has recovered, and it might take some time yet for the world to be fully back up to speed. And experts warn that it might happen again, at any time – though the CrowdStrike software at the heart of the problem is now fixed, the dangers that it revealed are still present.

In the immediate aftermath, many people made the point that they often do in the wake of this kind of outage: how can one company, often largely unknown to most people, cause so many problems? Does it not show that we are too reliant on a small set of companies to keep our infrastructure going?

The same argument has been made repeatedly, such as when a problem at Cloudflare knocked websites offline in summer 2022. But it’s not clear exactly what it is asking for: we do indeed rely on a relatively small number of companies to power our internet infrastructure, but that’s partly because of economies of scale. Besides, the interconnected nature of these technologies means that even smaller companies can affect the whole system in substantial ways.

The argument that we rely on too few companies to power the internet is perhaps more compelling in the case of social media, and it has also been made in the wake of large outages at companies such as Facebook. But, again, those social networks are compelling precisely because of their scale and the network effect it provides. Once again, it’s not clear what we’re asking for: people spread out across a variety of social networks, or some backup ones that we can retreat to in times of need?

If we do need to spread ourselves out and have more backup plans, then it’s not necessarily the tech companies that need to do it, anyway. “What we’re seeing now is a clear divide between organisations that have contingency budgets and plans for critical incidents, and those which lack those resources in some way,” said Andrew Peck, a cyber resilience PhD researcher at Loughborough University.

“Budget airlines have been hit harder than national carriers because their heavily pruned operating models lack flexibility for rapid adaptations needed for resilient operations, whereas national carriers have more human resources to draw on. For example, budget airline customer service teams with less human resources have struggled to answer the phone, and when people have contacted their online teams, they aren’t getting the answers they are looking for as these teams cannot get through to airports or ground staff. There simply isn’t enough staff.”

Many sensible people worry a lot about the fact that the once diverse internet is becoming increasingly consolidated in the hands of companies that don’t always have our best interests at heart, but there is no obvious reason that a more independent internet would also be a more robust one. CrowdStrike is also a relatively minor player in the cyber security space, and the software at fault runs on around 1 per cent of computers.

This time around it also felt like we were lingering on the wrong bit of the question. It’s good to ask how one unknown company could cause this much chaos – but we should be focusing at least as much on the fact that CrowdStrike and others are unknown as we do on the fact that there are relatively few of them. Until last week, CrowdStrike were mostly known for sponsoring Formula One, with their name appearing on the “halo” that protects Lewis Hamilton in the event of a crash in his Mercedes.

From now on, it will be synonymous with one of the biggest outages in IT history. “Consider this historic IT outage a warning for the fragility of third-party dependencies,” said Mehdi Daoudi, the founder and CEO of Catchpoint Systems. “We've seen major disruptions – surgeries postponed, flights grounded, 911 services inaccessible – because of such dependencies. A healthcare data scientist told me that he'd never heard of CrowdStrike before Friday. But now, this will be the one thing he remembers them by as multiple surgeons and patients alike couldn't confirm their appointments and his company was caught in the fray.” Unless you paid close attention to Formula One cars or IT companies, CrowdStrike was obscure – and then, all of a sudden, it wasn’t.

There are in fact hundreds of companies like this. Having fun on the internet is a little like having a hot bath – it’s relaxing and enjoyable, and when you are indulging yourself a little at the end of the week, you don’t think about the vast array of grand systems that are required to make it happen. But the water must be collected and carried through complex piping systems to reach your boiler, where it is heated.

You shouldn’t have to think about that when you have a bath – and you probably don’t, until the hot water stops. At that point you will reasonably run around trying to find the cause of the failure, opening up the boiler and calling up the water company. It’s only when everything stops working that we become aware of how much it takes to make it work.

And this is much the same with the technology that surrounds us. Every time a computer switches on, it awakens a similarly byzantine system and – just like your plumbing – one weak link is enough to ruin everything. On Friday, that weak link broke, and so did much of the world.

If there’s any silver lining to outages like the one on Friday, it might be that they help to roughen up the internet, making its processes a little easier to see and harder to ignore. The web gives us simple and smooth experiences, which can lure us into believing that the processes behind them are just as simple; in truth, they rest on a rough and complex infrastructure that requires constant upkeep and concern.

That was part of what Cerf had been warning about a few days before his warnings proved at least partly true. He had said that people need a better “intuition” about how to use technology safely, and more agency in protecting themselves. That means having a broader awareness of how these technologies actually work.

That doesn’t mean letting those behind such outages off the hook. Marvelling at the complexity of the system shouldn’t excuse those responsible for keeping it running.

When your plumber’s cowboy job floods your house, you don’t shake his hand and congratulate him for being just one part of a noble lineage that stretches back to Joseph Bazalgette and beyond. But the flood will serve as a good reminder of the value of a reliable tradesman; and you might, after the leak is fixed and the floor is dried, enjoy that bath just a little bit more.
