
Introducing BotSight: A New Tool to Detect Bots on Twitter in Real-Time

Quantifying Disinformation on Twitter, One Tweet at a Time

There is more awareness of disinformation than ever before, yet there is still little understanding of just how much of it there actually is.

Major social media platforms are starting to crack down not only on disinformation content but also on accounts whose sole purpose is to spread disinformation at scale. In 2019, the first year for which we have complete data, Twitter removed just over 26,600 accounts. But Twitter had roughly 330 million active users in 2019, so 26,600 sounds like an inconsequential number. At that scale, it seems you are very unlikely to ever come into contact with a bot.
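To put the 2019 figures side by side, the removals amount to a tiny fraction of the active user base. Note that this only compares removed accounts with active users; it says nothing about how many bots were never removed:

```latex
\frac{26{,}600 \ \text{removed accounts}}{330{,}000{,}000 \ \text{active users}} \approx 8.1 \times 10^{-5} \approx 0.008\%
```

That is roughly one removed account for every 12,400 active users.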

So, which is it? Are bots the scourge of social media, or an entirely overblown problem?

With these questions in mind, we trained a state-of-the-art machine learning model that detects Twitter bots with a high degree of accuracy. It achieves an area under the ROC curve (AUC), a common indicator of model quality, of 0.967 on popular research datasets, which matches or exceeds the best current academic results. But we didn't stop there. We built a tool called BotSight, which takes the results of our model and injects them directly into the Twitter feed. We are now releasing a beta version of BotSight (for popular browsers and iOS) to give people a better understanding of how bots operate on Twitter. You can download it for free here.
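AUC can be read as the probability that a classifier scores a randomly chosen bot above a randomly chosen human, with ties counting as half. As a hedged illustration (this is the standard definition, not BotSight's evaluation code), a minimal Python sketch using a hypothetical `roc_auc` helper and made-up scores:

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney U) formulation.

    labels: 1 for bot, 0 for human; scores: higher means "more bot-like".
    Returns the fraction of (bot, human) pairs the scores order correctly,
    counting tied pairs as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one bot and one human example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores (not BotSight output).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
# 8 of the 9 bot/human pairs are ordered correctly, so AUC = 8/9 (about 0.889).
print(roc_auc(labels, scores))
```

The nested loop is O(n*m) and is written for readability; scikit-learn's `roc_auc_score` computes the same quantity efficiently on real datasets.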

BotSight annotates tweets inline and in real time. In this example it annotates the Tweeting account as well as the mentioned accounts.
A likely bot replying to a popular TV personality. Bots are often found in replies to popular Twitter accounts.

BotSight works across most of Twitter, including search, trending topics, and your home timeline. For the past six months, our team has been diligently scrolling through Twitter with BotSight enabled in order to continuously test and improve both our model and our design. This has also helped us better understand bots: where they are likely to appear and how they act.

To determine whether an account is a bot, we look at over 20 distinguishing features per account, including the amount of randomness in the Twitter handle, whether the account is verified, the rate at which it is acquiring followers, and the account's description. We also validated our approach by observing BotSight in action. So far, BotSight's beta users have analyzed over 100,000 Twitter accounts.
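The post does not publish BotSight's feature definitions, so the following is only a sketch of how four of the listed signals might be turned into numbers for a classifier. The `Account` record, its field names, and the specific formulas (Shannon entropy of the handle as a proxy for randomness, followers per day of account age as a proxy for acquisition rate) are hypothetical choices, not BotSight's actual features:

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical, simplified account record; not Twitter's API schema.
    handle: str
    verified: bool
    followers: int
    account_age_days: float
    description: str


def handle_entropy(handle: str) -> float:
    """Shannon entropy of the handle's characters, in bits per character.

    A handle like 'xk7q2Lm9Zp' scores higher than 'aaaaaa'; this is one
    possible (assumed) proxy for how random a handle looks.
    """
    if not handle:
        return 0.0
    n = len(handle)
    counts = Counter(handle)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def extract_features(acct: Account) -> dict:
    """Map an account to a small, hypothetical numeric feature vector."""
    return {
        "handle_entropy": handle_entropy(acct.handle),
        "verified": 1.0 if acct.verified else 0.0,
        # Lifetime average follower gain; max() guards brand-new accounts.
        "followers_per_day": acct.followers / max(acct.account_age_days, 1.0),
        "description_length": float(len(acct.description)),
    }


example = Account("abcd", verified=False, followers=300,
                  account_age_days=3.0, description="")
print(extract_features(example))
```

A lifetime average is the simplest stand-in for "rate of acquiring followers"; a real system would more likely measure growth over a recent window, which requires historical snapshots of the account.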

Running BotSight's classifier over what we believe is the largest archive of historical Twitter data ever collected outside Twitter (over 4TB), we found many interesting and surprising things. One is that the problem of disinformation is not as small as Twitter's numbers suggest at first blush, but it is also nowhere near as large as the more sensational headlines we've seen claim. We found that about 5% of all tweets come from bots, and this percentage has gone down over time, which is a testament to the hard work of Twitter's Site Integrity team.

There has been a noticeable decrease in the ratio of bots to human accounts on Twitter over the past 5 years.

However, this percentage can climb as high as 20% for trending topics such as #COVID19 and other trending hashtags. In our analysis of recent coronavirus-related tweets, we found that between 6% and 18% of users tweeting on this subject were bots, depending on which time period we sampled, while a random sample of the Twitter stream shows 4% to 8% bot activity by volume over the same period. This contrast suggests that bots are strategic about their behavior, favoring current events to maximize their impact.
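The figures for coronavirus-related tweets and for the random sample are reported with two different measures: the share of distinct tweeting users who are bots, and the share of tweets (volume) posted by bots. The post does not say exactly how its percentages were computed, so this is only an illustration. It assumes each tweet already carries a bot verdict for its account, and the `Tweet` type and both helper functions are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    user_id: str
    is_bot: bool  # classifier verdict for the tweeting account (assumed given)


def bot_share_by_volume(tweets):
    """Fraction of tweets posted by accounts classified as bots."""
    if not tweets:
        return 0.0
    return sum(t.is_bot for t in tweets) / len(tweets)


def bot_share_by_users(tweets):
    """Fraction of distinct tweeting accounts classified as bots.

    Assumes one verdict per account; if verdicts differ, the last one wins.
    """
    verdicts = {t.user_id: t.is_bot for t in tweets}
    if not verdicts:
        return 0.0
    return sum(verdicts.values()) / len(verdicts)


# Made-up sample: one bot posts 4 times, three humans post once each.
sample = [Tweet("bot1", True)] * 4 + [Tweet(u, False) for u in ("a", "b", "c")]
print(bot_share_by_volume(sample))  # 4 of 7 tweets, about 0.571
print(bot_share_by_users(sample))   # 1 of 4 accounts, 0.25
```

Depending on how often bots post relative to humans, either measure can come out higher, so a user-based percentage and a volume-based percentage should be compared with care.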

All of these numbers vary by language, topic, and time of day. That is precisely why seeing bot annotations directly in your Twitter feed is so helpful.

While we made every effort to make sure BotSight works well, it is still a research prototype. We invite you to use BotSight and share your feedback with us at DL-botsight@nortonlifelock.com.

Copyright © 2020 NortonLifeLock Inc. All rights reserved. NortonLifeLock, the NortonLifeLock Logo, the Checkmark Logo, Norton, LifeLock, and the LockMan Logo are trademarks or registered trademarks of NortonLifeLock Inc. or its affiliates in the United States and other countries. Other names may be trademarks of their respective owners.

About the Author

Daniel Kats

Principal Researcher, NortonLifeLock Research Group

Daniel earned his master's degree in the University of Toronto's Systems & Networking Group. His research involves building machine learning systems for security, and the subtle impact of those systems on the people who use them.
