A Roadmap for Multilingual, Multimodal Domain Independent Deception Detection
This work addresses cross-lingual and multimodal deception detection for researchers and practitioners in security and NLP. Its contribution is incremental: it builds on existing studies and calls for further investigation.
The paper addresses the challenge of detecting deception across multiple languages and modalities, highlighting the lack of research on low-resource languages and the possible existence of universal linguistic cues to deception. It proposes a roadmap for using multilingual transformer models and labeled data in various languages to address deception detection universally in computer security and NLP.
Deception, a prevalent aspect of human communication, has undergone a significant transformation in the digital age. With the globalization of online interactions, individuals communicate in multiple languages and mix languages on social media, and varied data is becoming available in each language and dialect. At the same time, the techniques for detecting deception are similar across languages and settings. Recent studies suggest that universal linguistic cues to deception may exist across domains within the English language; however, whether such cues exist in other languages remains unknown. Furthermore, the practical task of deception detection in low-resource languages is not well studied because labeled data is lacking. Another dimension of deception is multimodality. For example, fake news or disinformation may pair a picture with an altered caption. This paper calls for a comprehensive investigation into the complexities of deceptive language across linguistic boundaries and modalities within computer security and natural language processing. It also calls for examining whether multilingual transformer models, together with labeled data in various languages, can address the task of deception detection universally.