After reports at the end of 2022 that hackers were selling data stolen from 400 million Twitter users, researchers now say that a widely circulated trove of email addresses linked to about 200 million users is likely a refined version of the larger trove with duplicate entries removed. The social network has not yet commented on the massive exposure, but the cache of data clarifies the severity of the leak and who may be most at risk as a result of it.
From June 2021 until January 2022, there was a bug in a Twitter application programming interface, or API, that allowed attackers to submit contact information like email addresses and receive the associated Twitter account, if any, in return. Before it was patched, attackers exploited the flaw to “scrape” data from the social network. And while the bug didn’t allow hackers to access passwords or other sensitive information like DMs, it did expose the connection between Twitter accounts, which are often pseudonymous, and the email addresses and phone numbers linked to them, potentially identifying users.
While it was live, the vulnerability was seemingly exploited by multiple actors to build different collections of data. One that has been circulating in criminal forums since the summer included the email addresses and phone numbers of about 5.4 million Twitter users. The massive, newly surfaced trove seems to only contain email addresses. However, widespread circulation of the data creates the risk that it will fuel phishing attacks, identity theft attempts, and other individual targeting.
Twitter did not respond to WIRED’s requests for comment. The company wrote about the API vulnerability in an August disclosure: “When we learned about this, we immediately investigated and fixed it. At that time, we had no evidence to suggest someone had taken advantage of the vulnerability.” Seemingly, Twitter’s telemetry was insufficient to detect the malicious scraping.
Twitter is far from the first platform to expose data to mass scraping through an API flaw, and it is common in such scenarios for there to be confusion about how many distinct troves of data actually exist as a result of malicious exploitation. These incidents are still significant, though, because they add more connections and validation to the massive body of stolen user data that already exists in the criminal ecosystem.
“Obviously, there are many people who were aware of this API vulnerability and many people who scraped it. Did different people scrape different things? How many troves are there? It kind of doesn’t matter,” says Troy Hunt, founder of the breach-tracking site HaveIBeenPwned. Hunt ingested the Twitter data set into HaveIBeenPwned and says that it represented information about more than 200 million accounts. Ninety-eight percent of the email addresses had already been exposed in past breaches recorded by HaveIBeenPwned. And Hunt says he sent notification emails to nearly 1,064,000 of his service’s 4,400,000 email subscribers.
“It’s the first time I’ve sent a seven-figure email,” he says. “Almost a quarter of my entire corpus of subscribers is really significant. But because so much of this was already out there, I don’t think this is going to be an incident that has a long tail in terms of impact. But it may de-anonymize people. The thing I’m more worried about is those individuals who wanted to maintain their privacy.”