Towards Ethical Content-Based Detection of Online Influence Campaigns
This work addresses ethical content-based detection of influence campaigns in online communities. Its contribution is incremental: it adapts an existing technique from native language identification (NLI) to mitigate one specific shortcoming of text-based features.
The paper tackles the problem of detecting online influence campaigns using text features and finds that keywords over-represented in past campaigns increase false positives. The authors introduce named entity masking to reduce these false positives while maintaining comparable classification accuracy, but note ethical concerns because both masked and unmasked models show increased false positive rates on English sentences written by native Russian speakers.
The detection of clandestine efforts to influence users in online communities is a challenging problem with significant active development. We demonstrate that features derived from the text of user comments are useful for identifying suspect activity, but lead to increased erroneous identifications when keywords over-represented in past influence campaigns are present. Drawing on research in native language identification (NLI), we use "named entity masking" (NEM) to create sentence features robust to this shortcoming, while maintaining comparable classification accuracy. We demonstrate that while NEM consistently reduces false positives when key named entities are mentioned, both masked and unmasked models exhibit increased false positive rates on English sentences by Russian native speakers, raising ethical considerations that should be addressed in future research.
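To make the idea of named entity masking concrete, here is a minimal sketch of the general technique: each entity mention is replaced with a placeholder for its type before features are extracted, so a classifier cannot key on specific names. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not say which entity recognizer or placeholder format was used, so the small gazetteer `ENTITY_TYPES`, the helper `mask_entities`, and the bracketed type tokens are all hypothetical stand-ins for the output of a real NER system.

```python
import re

# Hypothetical gazetteer standing in for a real NER system's output.
# The entries and type labels are illustrative assumptions only.
ENTITY_TYPES = {
    "Hillary Clinton": "PERSON",
    "Clinton": "PERSON",
    "Donald Trump": "PERSON",
    "Russia": "GPE",
    "Ukraine": "GPE",
    "NATO": "ORG",
}


def build_pattern(entities):
    """Compile one regex matching any known entity as a whole word."""
    # Longest names first so "Hillary Clinton" is preferred over "Clinton".
    names = sorted(entities, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")


_PATTERN = build_pattern(ENTITY_TYPES)


def mask_entities(sentence):
    """Replace each known entity mention with a placeholder for its type."""
    return _PATTERN.sub(lambda m: "[" + ENTITY_TYPES[m.group(1)] + "]", sentence)


masked = mask_entities("Hillary Clinton said NATO should support Ukraine.")
print(masked)  # [PERSON] said [ORG] should support [GPE].
```

After masking, two sentences that differ only in which person or country they mention map to the same text, so features built from the masked sentence reflect how something is written rather than who it is about. Masking does not touch syntax or word choice, which is consistent with the abstract's observation that it leaves the elevated false positive rate on sentences by Russian native speakers unaddressed.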