Sifting Robotic from Organic Text: A Natural Language Approach for Detecting Automation on Twitter
This work addresses the problem of identifying bots, which matters to both social media platforms and researchers. The contribution is incremental: it builds on existing detection methods, but restricts itself to text alone.
The paper tackles the problem of detecting automated accounts (bots) on Twitter by developing a classification scheme that uses only natural language text from organic users. The resulting method is flexible and can be applied to textual data beyond Twitter.
Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. As Twitter's popularity has grown, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities range from the benevolent (e.g., weather-update bots, help-wanted-alert bots) to the malevolent (e.g., bots spamming messages, advertisements, or radical opinions). Existing detection algorithms typically leverage metadata (time between tweets, number of followers, etc.) to identify robotic accounts. Here, we present a powerful classification scheme that exclusively uses the natural language text from organic users to provide a criterion for identifying accounts posting automated messages. Since the classifier operates on text alone, it is flexible and may be applied to any textual data beyond the Twitter-sphere.
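To make the idea of a text-only criterion concrete, the following is a minimal sketch and not the paper's actual classifier. It assumes a single illustrative feature (the ratio of distinct words to total words in an account's tweets), calibrates an acceptable range on accounts known to be organic, and flags any account whose score falls outside that range. The function names, the choice of feature, and the threshold parameter k are all hypothetical.

```python
from statistics import mean, stdev


def type_token_ratio(tweets):
    """Fraction of distinct words among all words in an account's tweets.

    Illustrative feature only; the abstract does not name the features used.
    """
    words = [w.lower() for t in tweets for w in t.split()]
    return len(set(words)) / len(words) if words else 0.0


def fit_organic_range(organic_accounts, k=2.0):
    """Calibrate (low, high) bounds from organic accounts only.

    Bounds are the mean score plus or minus k sample standard deviations.
    k is an assumed tuning parameter; at least two accounts are required.
    """
    scores = [type_token_ratio(tweets) for tweets in organic_accounts]
    mu, sigma = mean(scores), stdev(scores)
    return mu - k * sigma, mu + k * sigma


def looks_automated(tweets, bounds):
    """Flag an account whose text score lies outside the organic range."""
    low, high = bounds
    return not (low <= type_token_ratio(tweets) <= high)
```

Because the sketch consumes nothing but lists of strings, the same calibrate-then-flag pattern could in principle be run on any collection of short texts grouped by author, which is the flexibility the abstract points to.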