Bot-Match: Social Bot Detection with Recursive Nearest Neighbors Search
The paper tackles the problem of detecting evolving social bots that evade current supervised methods. It proposes Bot-Match, a semi-supervised recursive nearest neighbors search over social media embeddings that finds accounts similar to known malicious ones without retraining a model. The approach gives researchers, journalists, and analysts in social cybersecurity a complementary tool for detecting emerging bot threats, although the contribution is incremental, since it builds on existing similarity-based approaches.
Social bots have emerged over the last decade, initially as a nuisance and more recently as tools to intimidate journalists, sway electoral events, and aggravate existing social fissures. This social threat has spawned a race in which detection algorithms evolve in an attempt to keep up with increasingly sophisticated bot accounts. This cat-and-mouse cycle has exposed the limitations of supervised machine learning algorithms, with which researchers attempt to use yesterday's data to predict tomorrow's bots. As a result, researchers, journalists, and analysts identify malicious bot accounts every day that state-of-the-art supervised bot detection algorithms fail to flag. These analysts often want to find similar bot accounts without labeling data and training a new model, where similarity can be defined by content, network position, or both. A similarity-based algorithm could complement existing supervised and unsupervised methods and fill this gap. To this end, we present the Bot-Match methodology, in which we evaluate social media embeddings that enable a semi-supervised recursive nearest neighbors search to map an emerging social cybersecurity threat given one or more seed accounts.
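To make the idea of a recursive nearest neighbors search from seed accounts concrete, here is a minimal sketch, not the paper's implementation. It assumes every account already has a precomputed embedding vector (content-based, network-based, or a concatenation of both), uses cosine similarity as the distance measure, and does a brute-force neighbor lookup. The function names and the parameters `k`, `max_depth`, and `min_sim` are hypothetical choices for illustration: each newly found account becomes a query in the next round, up to a fixed number of hops.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def knn(query, embeddings, k, exclude, min_sim):
    """Brute-force k nearest neighbors of `query`, skipping accounts in `exclude`
    and any candidate whose similarity falls below `min_sim` (hypothetical cutoff)."""
    q = embeddings[query]
    scored = [
        (cosine(q, vec), acct)
        for acct, vec in embeddings.items()
        if acct != query and acct not in exclude
    ]
    scored = [(s, a) for s, a in scored if s >= min_sim]
    # Highest similarity first; ties broken by account id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [acct for _, acct in scored[:k]]


def recursive_nn_search(seeds, embeddings, k=2, max_depth=2, min_sim=0.5):
    """Expand outward from seed accounts; every newly found account is queried in the next hop.

    Returns a dict mapping account id -> hop at which it was found (seeds are hop 0).
    """
    found = {s: 0 for s in seeds}
    frontier = list(seeds)
    for hop in range(1, max_depth + 1):
        next_frontier = []
        for acct in frontier:
            # Exclude everything already found so each account is reported once.
            for nb in knn(acct, embeddings, k, set(found), min_sim):
                if nb not in found:
                    found[nb] = hop
                    next_frontier.append(nb)
        if not next_frontier:
            break  # nothing new within the similarity cutoff; stop early
        frontier = next_frontier
    return found


if __name__ == "__main__":
    # Toy 2-D embeddings (invented for illustration, not data from the paper).
    embeddings = {
        "seed": [1.0, 0.0],
        "bot_a": [0.95, 0.05],
        "bot_b": [0.9, 0.2],
        "bot_c": [0.7, 0.3],
        "human_x": [0.0, 1.0],
        "human_y": [-0.2, 0.9],
    }
    print(recursive_nn_search(["seed"], embeddings, k=2, max_depth=2, min_sim=0.8))
    # → {'seed': 0, 'bot_a': 1, 'bot_b': 1, 'bot_c': 2}
```

In the toy run, `bot_c` is not among the seed's two nearest neighbors but is reached in the second hop through `bot_a`, which illustrates why a recursive search can map a cluster that a single k-NN query would miss. The depth cap and similarity cutoff are assumed safeguards against drifting into unrelated accounts. For realistic account counts, the brute-force `knn` would typically be replaced by an (approximate) nearest-neighbor index.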