CL LG · Apr 13, 2021

Understanding Transformers for Bot Detection in Twitter

Andres Garcia-Silva, Cristian Berrio, Jose Manuel Gomez-Perez

arXiv:2104.06182

Originality: Synthesis-oriented

AI Analysis

This work addresses bot detection to combat disinformation on social media, but it is incremental as it compares existing transformer methods on a specific dataset.

The paper investigates fine-tuning pre-trained language models for bot detection on Twitter, finding that generative transformers like GPT and GPT-2 outperform BERT in accuracy after fine-tuning, with analysis showing BERT loses some syntactical and distributional properties during fine-tuning.

In this paper we shed light on the impact of fine-tuning on social media data on the internal representations of neural language models. We focus on bot detection in Twitter, a key task to mitigate and counteract the automatic spreading of disinformation and bias in social media. We investigate the use of pre-trained language models to detect whether a tweet was generated by a bot or a human account, based exclusively on its content. Unlike the general trend in benchmarks like GLUE, where BERT generally outperforms generative transformers like GPT and GPT-2 for most classification tasks on regular text, we observe that fine-tuning generative transformers on a bot detection task produces higher accuracies. We analyze the architectural components of each transformer and study the effect of fine-tuning on their hidden states and output representations. Among our findings, we show that part of the syntactical information and distributional properties captured by BERT during pre-training is lost upon fine-tuning, while the generative pre-training approach manages to preserve these properties.
