IYKYK: Using language models to decode extremist cryptolects
This work addresses the challenge of automated moderation of extremist content online, though its contribution is incremental: it builds on existing methods with domain-specific adaptations.
The study examines how well language models detect and interpret extremist cryptolects. It finds that general-purpose LLMs perform poorly, but that domain adaptation and specialised prompting significantly improve results. The authors also release novel datasets, including 19.4M posts from extremist platforms.
Extremist groups develop complex in-group language, also referred to as cryptolects, to exclude or mislead outsiders. We investigate the ability of current language technologies to detect and interpret the cryptolects of two online extremist platforms. Evaluating eight models across six tasks, our results indicate that general-purpose LLMs cannot consistently detect or decode extremist language. However, performance can be significantly improved by domain adaptation and specialised prompting techniques. These results provide important insights to inform the development and deployment of automated moderation technologies. We further develop and release novel labelled and unlabelled datasets, including 19.4M posts from extremist platforms and lexicons validated by human experts.
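
The abstract does not say which specialised prompting techniques were used. As an illustration only, one plausible form is to put lexicon definitions for any matched terms at the top of the prompt before asking the model about a post. The sketch below assumes that approach; the function name `build_glossary_prompt`, the prompt wording, and the placeholder glossary entries are all hypothetical and are not taken from the paper.

```python
from typing import Mapping


def build_glossary_prompt(post: str, glossary: Mapping[str, str]) -> str:
    """Build a prompt that pairs a post with definitions of glossary terms found in it.

    Hypothetical helper: the matching rule and the prompt wording are assumptions.
    """
    lowered = post.lower()
    # Keep only glossary entries whose term occurs in the post
    # (case-insensitive substring match, a deliberately naive choice).
    matched = {
        term: definition
        for term, definition in glossary.items()
        if term.lower() in lowered
    }

    lines = [
        "You are assisting a content moderator.",
        "Glossary of in-group terms:",
    ]
    if matched:
        lines += [f"- {term}: {definition}" for term, definition in sorted(matched.items())]
    else:
        lines.append("- (no known terms matched)")
    lines += [
        "",
        f"Post: {post}",
        "Question: Does this post use in-group language? Answer yes or no, then explain.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder entries standing in for an expert-validated lexicon.
    example_glossary = {
        "term_a": "placeholder definition A",
        "term_b": "placeholder definition B",
    }
    print(build_glossary_prompt("An example containing TERM_A only.", example_glossary))
```

Substring matching is used here only to keep the sketch short. It would miss spelling variants and deliberate obfuscation, which is part of what makes cryptolects hard to detect in the first place.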