…I can dream 🙂

Unfortunately, Cloudflare suffered an outage yesterday. Here’s the blurb:

Today a configuration error in our backbone network caused an outage for Internet properties and Cloudflare services that lasted 27 minutes. We saw traffic drop by about 50% across our network. Because of the architecture of our backbone this outage didn’t affect the entire Cloudflare network and was localized to certain geographies.

https://blog.cloudflare.com/cloudflare-outage-on-july-17-2020/

Cloudflare described the problem in some depth, which was nice to see. The following snippet is the policy term from the router in question.

from {
    prefix-list 6-SITE-LOCAL;
}
then {
    local-preference 200;
    community add SITE-LOCAL-ROUTE;
    community add ATL01;
    community add NORTH-AMERICA;
    accept;
}

Because of backbone congestion in Atlanta, the team decided to remove some of Atlanta’s backbone traffic. But instead of removing the Atlanta routes from the backbone, a one-line change deactivated the prefix-list match condition. With no condition left in the from block, the term matched every route, so the router started leaking all of its BGP routes into the backbone with a local-preference of 200. The correct change would have been to deactivate the term instead of the prefix-list.
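
To make the difference between the two changes concrete, here is a minimal Python sketch of how a policy term decides which routes it matches. It is a toy model, not Junos and not Cloudflare’s tooling: the Term class, the exported function and the 2001:db8 documentation prefixes are all made up for illustration, and the prefix-list match is simplified to an exact comparison.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network


@dataclass
class Term:
    """Toy model of one routing-policy term (hypothetical, for illustration)."""
    prefix_list: list = field(default_factory=list)  # prefixes as strings
    prefix_list_active: bool = True  # False models deactivating the prefix-list
    term_active: bool = True         # False models deactivating the whole term
    local_preference: int = 200

    def matches(self, route: str) -> bool:
        if not self.term_active:
            # A deactivated term is skipped, so it matches nothing.
            return False
        if not self.prefix_list_active:
            # No match condition is left, so every route matches.
            return True
        net = ip_network(route)
        return any(net == ip_network(p) for p in self.prefix_list)


def exported(term: Term, routes: list) -> list:
    """Return the routes this term would accept."""
    return [r for r in routes if term.matches(r)]


# Assumed example data: one site-local prefix among three known routes.
SITE_LOCAL = ["2001:db8:a::/48"]
ROUTES = ["2001:db8:a::/48", "2001:db8:b::/48", "2001:db8:c::/48"]

intended = exported(Term(SITE_LOCAL), ROUTES)
leaked = exported(Term(SITE_LOCAL, prefix_list_active=False), ROUTES)
fixed = exported(Term(SITE_LOCAL, term_active=False), ROUTES)

print("original term:         ", intended)
print("prefix-list deactivated:", leaked)
print("term deactivated:       ", fixed)
```

In this model, deactivating only the prefix-list leaves a term that accepts every route it sees, while deactivating the term takes it out of the policy altogether.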

 

We’ve all been there, I’m sure!

That moment you commit a BGP change and a sense of dread floods every part of your body as you realise you’ve just broken the internet. Sometimes, it’s nice to have a way to check your work because even the most diligent engineers can make mistakes.

Kompressor is a project I’ve been working on that is designed to help busy network engineers, not replace them 🙂

https://github.com/msbnetcouk/Kompressor