On Monday, an outage took down Mark Zuckerberg’s empire: Facebook, Instagram, WhatsApp, and Oculus all went offline. It’s a social media blackout that can only be described as “thorough,” and it looks like it will be extremely difficult to fix.
Facebook has not acknowledged the source of its problems, although there are plenty of clues on the internet. According to the company’s Domain Name System records, its family of apps effectively vanished from the internet around 11:40 a.m. ET. DNS is often called the internet’s phone book because it translates the host names you type into a browser’s address bar, such as facebook.com, into IP addresses, which are where those sites actually live.
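To make the phone-book analogy concrete, here is a minimal Python sketch of a DNS lookup. It relies on the standard library's `socket.getaddrinfo`; the helper name `resolve` and its injectable `lookup` parameter are illustrative assumptions, not anything Facebook or the experts quoted here use.

```python
import socket

def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname, or [] if the lookup fails."""
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # The "phone book" has no usable entry for this name.
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in results})
```

On a working connection, calling `resolve("facebook.com")` would normally return one or more addresses. When a site's name servers are unreachable, the lookup fails and the sketch returns an empty list instead.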
DNS mishaps are fairly common, and when in doubt, they’re a likely reason a site has gone down. They can happen for all kinds of obscure technical reasons, most of them related to configuration errors, and are usually simple to fix. In this case, however, something more serious appears to be afoot.
“Facebook’s outage appears to be caused by DNS; however, that’s just a symptom of the problem,” says Troy Mursch, chief research officer of cyber threat intelligence firm Bad Packets. The underlying problem, according to Mursch and other experts, is that Facebook has withdrawn the so-called Border Gateway Protocol routes that contain the IP addresses of its DNS name servers. If DNS is the internet’s phone book, BGP is its navigation system; it determines the path data takes as it travels the information superhighway.
“You can think of it like a telephone game,” says Angelique Medina, director of product marketing at network monitoring provider Cisco ThousandEyes. Instead of people playing, though, it’s smaller networks letting each other know how to reach them. “They tell their neighbour about this route, and their neighbour tells their friends about it.”
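The telephone game can be sketched as a toy simulation in Python. This is a deliberate simplification, not how BGP is implemented: real BGP also applies routing policy and path selection rules. The network names and the `propagate_route` helper are hypothetical.

```python
from collections import deque

def propagate_route(neighbors, origin, advertising=True):
    """Spread an announcement for `origin` hop by hop; return each network's path to it."""
    if not advertising:
        # The origin says nothing, so no network ever learns a path to it.
        return {}
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for peer in neighbors.get(current, []):
            if peer not in paths:  # keep the first (shortest) path heard
                paths[peer] = [peer] + paths[current]
                queue.append(peer)
    return paths

# Hypothetical topology: these names are illustrative, not real networks.
NEIGHBORS = {
    "facebook": ["transit_a"],
    "transit_a": ["facebook", "isp_b"],
    "isp_b": ["transit_a", "home_isp"],
    "home_isp": ["isp_b"],
}

announced = propagate_route(NEIGHBORS, "facebook")
withdrawn = propagate_route(NEIGHBORS, "facebook", advertising=False)
```

While the origin is advertising, `announced["home_isp"]` holds the chain of networks that leads back to it. Once the announcement stops, `withdrawn` is empty, and nobody in the toy topology knows how to reach the origin.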
That’s a lot of jargon, but the upshot is simple: Facebook has vanished from the internet’s map. What happens if you ping those IP addresses right now? “The packets end up in a black hole,” Mursch says.
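A rough way to picture the black hole is a router's forwarding decision. The sketch below, in Python, does a longest-prefix match over a small routing table using the standard `ipaddress` module. The prefixes come from address ranges reserved for documentation and the peer names are made up; none of this reflects Facebook's real addresses or any real router's configuration.

```python
import ipaddress

def next_hop(routing_table, destination):
    """Longest-prefix match: return the next hop for destination, or None if no route covers it."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routing_table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        # No route covers this address, so the packet has nowhere to go and is dropped.
        return None
    # Prefer the most specific route, i.e. the one with the longest prefix.
    best_network, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop

# Hypothetical table keyed by documentation-only prefixes.
table = {
    "192.0.2.0/24": "peer-1",
    "192.0.2.128/25": "peer-2",
}
```

With both routes present, `next_hop(table, "192.0.2.200")` picks the more specific entry. If the routes are withdrawn and removed from the table, the same call returns `None`, which is the toy equivalent of a packet disappearing into a black hole.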
The obvious and still unanswered question is why the BGP routes vanished in the first place. It’s not a common ailment, especially at this scale or for this long. Facebook didn’t say much during the outage, other than that it’s “working to get things back to normal as quickly as possible.” After service was restored late Monday afternoon, the company issued a statement that was still devoid of technical details. “To everyone who was affected by the outages on our platforms today: we’re sorry,” the firm wrote. “We understand that our goods and services keep billions of individuals and companies linked across the world. As we return to the internet, we appreciate your patience.”
The most likely explanation, according to internet infrastructure specialists, is a misconfiguration on Facebook’s part. (Facebook confirmed this in an engineering blog post on Monday evening.) “It appears that Facebook has done something to their routers, which connect the Facebook network to the rest of the internet,” says John Graham-Cumming, CTO of internet infrastructure company Cloudflare, who emphasises that he doesn’t know the specifics. The internet, he explains, is fundamentally a network of networks, each of which advertises its presence to the others. For the first time in its history, Facebook has stopped advertising its presence.
The effects extend beyond Facebook’s public-facing services. “Login with Facebook” no longer works on third-party websites, for example. And because the company’s internal networks can’t reach the outside internet either, employees are reportedly unable to get much work done today. (Instagram head Adam Mosseri even said on Twitter that it felt like a snow day.)
This may also explain why it’s taking so long to get back on track. A 2019 Google Cloud outage kept Google engineers from getting online to fix the very issue that was keeping them offline. Facebook appears to be caught in a similar catch-22, unable to reach the internet in order to fix the BGP routing problem that would let it reach the internet.
The good news is that once Facebook is able to undo whatever setting caused this, it should be back in business in no time. “Once that’s fixed, the traffic will really start flowing,” Medina adds.
Meanwhile, Facebook’s absence has been felt across the internet. People have been trying to load Facebook, Instagram, and WhatsApp to no avail, and DNS resolvers like Cloudflare, services that translate domain names into IP addresses, have seen up to double their normal volume of traffic. That volume of queries isn’t enough to overload the system, but it serves as a reminder of how interconnected, and occasionally fragile, the internet is.
“It’s not so much the dramatic story of the internet collapsing or something like that,” Graham-Cumming explains. “It’s more of an interconnected system that stays up partly due to technical things and partly due to people who keep an eye on it 24 hours a day, seven days a week.”