Trouble On the Mountain

Why Facebook, WhatsApp, and Instagram went dark yesterday.

2021-10-05

Facebook and its other properties, including WhatsApp and Instagram, went offline for about five hours yesterday.

Many small businesses throughout the world depend on these services to reach customers. They got a rude shock.

Why did this outage happen?

There was nothing wrong with Facebook's computers; they were running fine. The problem was their Internet connectivity.

[Image: sawing branch]

Kingdoms: Autonomous Systems

The Internet has no government; it is made up of hundreds of thousands of sovereign kingdoms. One of these is the Kingdom of Facebook. (Actually, Facebook runs a handful of kingdoms, but they're managed as one single empire.)

Kingdoms like Facebook advertise Internet services, such as whatsapp.com, that run on their own computers. To access any Internet service, your computer must be in one of these kingdoms.

If you're using mobile data, then your phone is in your phone company's kingdom. If you're on WiFi, then it's in your Internet Service Provider's (ISP's) kingdom.

In either case, when you type a WhatsApp message, your phone sends data packets addressed to "whatsapp.com" to your phone's own kingdom. Your phone company or ISP knows that the "whatsapp.com" service is maintained by the Facebook kingdom, so it forwards these data packets to Facebook.

In Internet lingo, these kingdoms are called Autonomous Systems (AS).

Signposts on the interchange

Imagine the Internet as a sprawling landscape where the kingdoms (the Autonomous Systems) have their computers.

These kingdoms coordinate packet exchanges between themselves on a massive highway interchange situated on a mountain. The mountain is in the center of all the kingdoms, and it has exits to all the kingdoms.

Each kingdom owns and manages the signposts for its own exit, to show all the services it maintains. For example, Facebook's exit has signposts for "whatsapp.com", "instagram.com", and "facebook.com".
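On the real Internet, the signposts are route announcements for blocks of numeric addresses rather than names. Here is a minimal sketch of how a packet might pick its exit, assuming a made-up signpost table: the address blocks and AS numbers below are reserved for documentation and do not belong to Facebook or anyone else, and `find_exit` is a hypothetical helper, not part of any real router.

```python
import ipaddress

# Hypothetical signposts: address block -> kingdom (AS) that advertises it.
# All blocks and AS numbers are documentation-only values, not real networks.
SIGNPOSTS = {
    ipaddress.ip_network("192.0.2.0/24"): "AS64496 (social network)",
    ipaddress.ip_network("198.51.100.0/24"): "AS64497 (phone company)",
    ipaddress.ip_network("203.0.113.0/24"): "AS64498 (home ISP)",
}


def find_exit(address, signposts=SIGNPOSTS):
    """Return the kingdom whose signpost covers this address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in signposts if ip in net]
    if not matches:
        return None  # no signpost: the packet has nowhere to go
    # If several signposts match, follow the most specific one.
    best = max(matches, key=lambda net: net.prefixlen)
    return signposts[best]


print(find_exit("192.0.2.10"))
print(find_exit("10.0.0.1"))
```

The second lookup finds no signpost at all, which is the situation the rest of this story is about.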

One of the services that Facebook advertises is its Domain Name System (DNS) servers, which take names like "whatsapp.com" and translate them into the numeric addresses that data packets are sent to.

That's how you can browse facebook.com.
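The translation step can be pictured as a simple lookup table. This is only a toy sketch: the names use the reserved "example" domain, the addresses are documentation-only, and `resolve` is a hypothetical function standing in for what a real DNS server does with far more machinery.

```python
# Hypothetical records held by a kingdom's DNS server.
RECORDS = {
    "chat.example": "192.0.2.10",
    "photos.example": "192.0.2.20",
}


def resolve(name, records=RECORDS):
    """Translate a name into a numeric address, or fail if it is unknown."""
    key = name.lower().rstrip(".")  # names are case-insensitive
    try:
        return records[key]
    except KeyError:
        raise LookupError(f"no address known for {name!r}") from None


print(resolve("chat.example"))
```

Your phone needs the numeric answer before it can send a single packet, so everything else depends on this lookup succeeding.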

What happened yesterday

Yesterday, Facebook engineers were replacing their signposts with new ones, but they accidentally left out an important destination: their domain name servers.

Data packets could no longer find the signposts to Facebook's DNS servers. No one could resolve "facebook.com", "whatsapp.com", and so on. This is what caused the outage.
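To see why a missing signpost is enough to break everything, here is a toy model of the failure. It assumes one hypothetical DNS server at a documentation-only address and a made-up "chat.example" name; it is a sketch of the idea, not a reconstruction of Facebook's actual network or configuration.

```python
import ipaddress

# Hypothetical DNS server and the records it holds. Nothing here changes
# during the "outage": the server itself stays perfectly healthy.
DNS_SERVER = ipaddress.ip_address("192.0.2.53")
RECORDS = {"chat.example": "192.0.2.10"}


def reachable(address, signposts):
    """True if any advertised address block covers this address."""
    return any(address in block for block in signposts)


def resolve(name, signposts):
    """Ask the DNS server for an address, if a signpost leads to it."""
    if not reachable(DNS_SERVER, signposts):
        raise ConnectionError("no signpost leads to the DNS server")
    return RECORDS[name]


signposts_before = {ipaddress.ip_network("192.0.2.0/24")}
signposts_after = set()  # the update left the DNS servers' block out

print(resolve("chat.example", signposts_before))
try:
    resolve("chat.example", signposts_after)
except ConnectionError as err:
    print("outage:", err)
```

The records never changed between the two calls. Only the signposts did, which matches the point made earlier that the computers themselves were running fine.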

As soon as they replaced their signposts, the engineers themselves could no longer reach their own computers, because they couldn't resolve their domain names to addresses.

The employees who were on site and could log into the routers directly did not have the authorization to make the changes.

It took them some time to resolve this comedy of errors, but now all seems well in the kingdom of Facebook.