Understanding Troll Writing as a Linguistic Phenomenon
This work addresses the detection of deceptive online content, a problem relevant to both social media platforms and researchers. Its contribution is incremental, since it applies existing methods to the analysis of troll writing.
The study set out to identify troll writing in tweets. It built a neural network that classified troll and genuine tweets with 91% accuracy, and it found that specific linguistic features, such as skewed topic and vocabulary distributions, characterize troll messages. The authors attribute these features to sociolinguistic factors.
The current study yielded a number of important findings. We built a neural network that achieved an accuracy of 91 per cent in classifying troll and genuine tweets. By means of regression analysis, we identified a number of features that make a tweet more likely to be labelled correctly and found that they are inherently present in troll tweets as a special type of discourse. We hypothesised that those features are grounded in the sociolinguistic limitations of troll writing, which can best be described as a combination of two factors: speaking with a purpose and trying to mask the purpose of speaking. Next, we contended that the orthogonal nature of these factors necessarily results in a skewed distribution of many different language parameters of troll messages. Taking as an example the distribution of topics and of the vocabulary associated with those topics, we showed some very pronounced distributional anomalies, thus confirming our prediction.
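The notion of a skewed topic distribution can be made concrete with a small sketch. The code below is not the procedure used in the study; it is one plausible way to quantify how far the topic distribution of troll tweets departs from that of genuine tweets, here using the Jensen-Shannon divergence. The function names, the topic labels, and the toy data are all assumptions introduced for illustration only.

```python
import math
from collections import Counter


def topic_distribution(labels, topics):
    """Relative frequency of each topic, in the order given by `topics`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[t] / total for t in topics]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with no topics in common."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; it is non-zero wherever p or q is non-zero.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Invented toy topic labels (one per tweet), NOT data from the study.
troll = ["politics"] * 8 + ["sports"] * 1 + ["music"] * 1
genuine = ["politics"] * 3 + ["sports"] * 4 + ["music"] * 3
topics = sorted(set(troll) | set(genuine))

p = topic_distribution(troll, topics)
q = topic_distribution(genuine, topics)
divergence = js_divergence(p, q)
print(f"Jensen-Shannon divergence between topic distributions: {divergence:.3f}")
```

A value near zero would suggest that troll and genuine tweets cover topics in similar proportions, while larger values indicate the kind of distributional anomaly described above. The same comparison could in principle be applied to vocabulary counts within each topic.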