LG SI · Jan 17, 2023

Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection

Chris Hays, Zachary Schutzman, Manish Raghavan, Erin Walk, Philipp Zimmer

MIT

arXiv:2301.07015v2 · 13.741 citations · h-index: 20 · Has Code

Originality: Synthesis-oriented

AI Analysis

The paper reveals critical flaws in Twitter bot detection, which matters for online safety and for research on elections and misinformation. Its contribution is incremental: it documents problems with existing datasets rather than proposing a new solution.

The study addresses overestimated accuracy in Twitter bot detection tools. It shows that high performance on benchmark datasets stems from dataset limitations rather than tool sophistication: simple decision rules achieve near-state-of-the-art results, and detectors generalize poorly across datasets.

Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection. These tools employ machine learning and often achieve near-perfect classification performance on existing datasets, suggesting that bot detection is accurate, reliable, and fit for use in downstream applications. We provide evidence that this is not the case and show that high performance is attributable to limitations in dataset collection and labeling rather than sophistication of the tools. Specifically, we show that simple decision rules (shallow decision trees trained on a small number of features) achieve near-state-of-the-art performance on most available datasets, and that bot detection datasets, even when combined, do not generalize well to out-of-sample datasets. Our findings reveal that predictions are highly dependent on each dataset's collection and labeling procedures rather than fundamental differences between bots and humans. These results have important implications for both transparency in sampling and labeling procedures and potential biases in research using existing bot detection tools for pre-processing.
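
As a rough illustration of the abstract's two claims, the sketch below fits a depth-one decision tree (a single threshold on a single feature) in plain Python, then scores it on its own dataset and on a second one. Everything in it is an assumption made for illustration: the two features (tweets per day, follower count), the eight-account toy datasets, and the helper names `fit_stump`, `predict`, and `accuracy` are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple

# One account: ([tweets_per_day, follower_count], label), label 1 = bot, 0 = human.
# Both features and all values below are made up for illustration.
Row = Tuple[List[float], int]
Stump = Tuple[int, float, int]  # (feature index, threshold, label predicted above threshold)


def predict(stump: Stump, x: List[float]) -> int:
    """Apply a single-threshold rule to one feature vector."""
    feature, threshold, label_above = stump
    return label_above if x[feature] > threshold else 1 - label_above


def fit_stump(rows: List[Row]) -> Stump:
    """Exhaustively pick the one-feature threshold rule with the fewest training errors."""
    best = None
    for feature in range(len(rows[0][0])):
        values = sorted({x[feature] for x, _ in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for label_above in (0, 1):
                candidate = (feature, threshold, label_above)
                errors = sum(predict(candidate, x) != y for x, y in rows)
                if best is None or errors < best[0]:
                    best = (errors, candidate)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1]


def accuracy(stump: Stump, rows: List[Row]) -> float:
    """Fraction of rows the rule labels correctly."""
    return sum(predict(stump, x) == y for x, y in rows) / len(rows)


# Toy dataset A: bots were "collected" as very high-volume posters.
dataset_a: List[Row] = [
    ([120.0, 300.0], 1), ([95.0, 50.0], 1), ([150.0, 800.0], 1), ([110.0, 20.0], 1),
    ([4.0, 250.0], 0), ([9.0, 40.0], 0), ([2.0, 900.0], 0), ([12.0, 30.0], 0),
]

# Toy dataset B: bots were "collected" as accounts with very few followers.
dataset_b: List[Row] = [
    ([6.0, 5.0], 1), ([10.0, 12.0], 1), ([3.0, 8.0], 1), ([8.0, 2.0], 1),
    ([7.0, 400.0], 0), ([11.0, 650.0], 0), ([5.0, 300.0], 0), ([130.0, 500.0], 0),
]

stump_a = fit_stump(dataset_a)
in_sample = accuracy(stump_a, dataset_a)       # evaluated on the dataset it was fit on
out_of_sample = accuracy(stump_a, dataset_b)   # evaluated on a differently collected dataset
print(stump_a, in_sample, out_of_sample)
```

In this toy setup the rule learned on dataset A separates A perfectly but scores below chance on dataset B, because the two datasets encode "bot" through different features. That mirrors the kind of collection-dependent signal the abstract describes, though the paper's experiments use real benchmark datasets and may differ in detail.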
