@jbreckmckye
Last active November 25, 2025 02:42

The Cloudflare outage was a good thing

Cloudflare, the CDN provider, suffered a massive outage today. Some of the world's most popular apps and web services were left inaccessible for several hours whilst the Cloudflare team scrambled to fix a whole swathe of the internet.

And that might be a good thing.

The proximate cause of the outage was pretty mundane: a bad config file triggered a latent bug in one of Cloudflare's services. The file was too large (details still hazy) and this led to a cascading failure across Cloudflare operations. There is probably some useful post-morteming to be done about canary releases and staged rollouts.

But the bigger problem, the ultimate cause behind today's chaos, is the creeping centralisation of the internet and a society that is sleepwalking into assuming the net is always on and always working.

It's not just "trivial" stuff like Twitter and League of Legends that was affected, either. A friend of mine remarked caustically about his experience this morning:

I couldn't get air for my tyres at two garages because of cloudflare going down. Bloody love the lack of resilience that goes into the design when the machine says "cash only" and there's no cash slot. So flat tires for everyone! Brilliant.

We are living in a society where every part of our lives is increasingly mediated through the internet: work, banking, retail, education, entertainment, dating, family, government ID and credit checks. And the internet is increasingly tied up in fewer and fewer points of failure.

It's ironic because the internet was actually designed for decentralisation, a system that governments could use to coordinate their response in the event of nuclear war. But due to the economics of the internet, and the challenges of things like bots and scrapers, more and more web services are holed up in citadels like AWS or behind content distribution networks like Cloudflare.

Outages like today's are a good thing because they're a warning. They can force redundancy and resilience into systems. They can make the pillars of our society - governments, businesses, banks - provide reliable alternatives when things go wrong.

(Ideally ones that are completely offline)

You can draw a parallel to how COVID-19 shook up global supply chains: the logic up until 2020 was that you wanted your system to be as lean and efficient as possible, even if it meant relying totally on international supplies or keeping as little spare inventory as possible. After 2020 businesses realised they needed to diversify and build slack in the system to tolerate shocks.

In the same way that growing one kind of banana nearly resulted in bananas going extinct, we're drifting towards a society that can't survive without digital infrastructure, and a digital infrastructure that can't operate without two or three key players. One day there's going to be an outage, a bug, or a cyberattack from a hostile state that demonstrates how fragile that system is.

Embrace outages, and build redundancy.

@abesamma

abesamma commented Nov 22, 2025

Where I'm from, cash is pretty much going extinct. ATMs are few and getting more difficult to reach. Given these factors and the increasing frequency and severity of these outages, can we really say we're ready for an event that disrupts or cripples the current iteration of our digital infrastructure? I think about that a lot every time I use my phone to pay for stuff. This is definitely a latent disaster waiting to happen.

Addendum: we're already seeing telltale signs of what's to come. People are dying unheard because their telecom service failed to connect them to emergency services, while the C-suites get bonuses. There's a severe misalignment of incentives in this whole thing.

@santosh76Kunder

Bang on with the problem statement. We have to work with the mindset that things are bound to fail. The question we need to ask is: what if any of my vendors fails, and if that happens, how do we deliver? That doesn't mean you can plan for everything, but you should still think about such cases and have a plan. This should be part of the enterprise risk register, which is missing today.

@ChrisX101010

The first problem is that more and more people exist on this planet, so you need more infra, more providers, bigger bandwidths, and so on. The second problem is the many AI models that drain a sh* ton of resources, plus botted comments, accounts etc., and it's getting even worse. More of the internet is now botted than organic; it was already an issue, but now it's an issue on roids. So my suggestion is to push back against this internet monopolization and advocate more for self-hosting. Also, in case of an outage, there should be multiple servers backing the service up, like a circuit breaker: when one server dies, another one is activated while the first is reset. In the meantime people can be shown static content or whatever was last cached on the server, so they don't just get the error screen; that also reduces the server load. With so many of these data centers caching information, and LLMs you can run locally even offline, I think this could be a viable solution. The third problem, which I forgot to mention, is companies switching from being an ISP or just a telco to building data centers and hosting them to train models. So if anyone proposes an idea, I would love to help and take part. Currently I want to become a Systems Architect, but for starters I want to be an SRE or a sysadmin and work more on open source projects. Thanks.
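
For anyone who wants to play with the failover idea in the comment above, here is a minimal Python sketch, not a production design. It assumes each backend is just a function that takes a key and returns content, and every name in it (CircuitBreaker, ResilientFetcher, max_failures, cooldown) is made up for illustration. A backend that keeps failing is skipped for a cooldown period, the next backend is tried, and if everything is down the last cached copy is served instead of an error.

```python
import time
from typing import Callable, Dict, List, Optional


class CircuitBreaker:
    """Skips a backend for `cooldown` seconds after `max_failures` failures in a row."""

    def __init__(self, max_failures: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allows_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the cooldown has passed, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


class ResilientFetcher:
    """Tries backends in order and falls back to the last cached value."""

    def __init__(self, backends: List[Callable[[str], str]], **breaker_args) -> None:
        # One breaker per backend (hypothetical setup for this sketch).
        self.backends = [(b, CircuitBreaker(**breaker_args)) for b in backends]
        self.cache: Dict[str, str] = {}

    def fetch(self, key: str) -> str:
        for backend, breaker in self.backends:
            if not breaker.allows_request():
                continue  # breaker is open: do not hammer a backend that is down
            try:
                value = backend(key)
            except Exception:
                breaker.record_failure()
                continue
            breaker.record_success()
            self.cache[key] = value  # remember the last good response
            return value
        if key in self.cache:
            return self.cache[key]  # stale, but better than an error screen
        raise RuntimeError(f"all backends failed and nothing is cached for {key!r}")
```

Usage would look like ResilientFetcher([primary, secondary]).fetch("/home"), where primary and secondary are whatever functions talk to your two providers. The cache here lives in memory only, so a real setup would want it on disk or at the edge.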

@fenix1851

Such outages are like a flu shot: nobody really knows what happens inside huge systems like Cloudflare or AWS. One small detail can cause a chain reaction in several major services that the internet relies on. When you add the lower cost of code generation and the fact that more than 50% of internet traffic already comes from botnets, the future of the internet looks unstable. Incidents like the ones with Cloudflare or AWS need more active preparation.

@bchewy

bchewy commented Nov 24, 2025

W gamer

@pabloko

pabloko commented Nov 24, 2025

GG noobs, every weekend when there's a football match Cloudflare is gone in Spain and nothing happens

@dennisvexnl

Resilience is a matter of adopting the right strategy, but that strategy can be complex and/or cost (a lot of) money.
We're addicted to the ease of use of bugtech (see what I did there), which we get by paying a small monthly fee. On a larger scale this is done at C-level: everything gets dumped onto SaaS platforms, which in the end rely on the same pillars.
This trickles down into our (semi-)government and blue-light services as well.
Fact is that if AWS, Azure and Cloudflare would kick the bucket at the same time, most of western civilization would come to a halt in a matter of hours, but as long as the convenience of these platforms outweighs the downsides nothing is going to change......sadly
The only thing we can do is to start looking at how resilient we ourselves are to the loss of these services. Try not using your phone for a day or two, or simulate not having access to any digital assets in an emergency situation.
If we become more resilient ourselves, we are better able to help others achieve this as well.
