Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion
This work addresses fake news dissemination, a problem for both social media users and platforms, and offers an incremental improvement over existing multi-modal methods.
The paper tackles fake news detection on social media by proposing a Multi-grained Multi-modal Fusion Network (MMFN), which fuses fine-grained and coarse-grained information from text and images and addresses cross-modal ambiguity; it outperforms state-of-the-art methods on three datasets.
The easy sharing of multimedia content on social media has led to the rapid dissemination of fake news, which threatens the stability and security of society. Fake news detection has therefore attracted extensive research interest in the field of social forensics. Current methods concentrate primarily on integrating textual and visual features but fail to exploit multi-modal information effectively at both fine-grained and coarse-grained levels. Furthermore, they suffer from an ambiguity problem caused either by a lack of correlation between modalities or by a contradiction between the decisions made by each modality. To overcome these challenges, we present a Multi-grained Multi-modal Fusion Network (MMFN) for fake news detection. Inspired by the multi-grained process by which humans assess news authenticity, we employ two Transformer-based pre-trained models to encode token-level features from text and images, respectively. The multi-modal module fuses these fine-grained features while taking into account coarse-grained features encoded by the CLIP encoder. To address the ambiguity problem, we design uni-modal branches with similarity-based weighting that adaptively adjusts the use of multi-modal features. Experimental results demonstrate that the proposed framework outperforms state-of-the-art methods on three prevalent datasets.
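The similarity-based weighting can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the following is an assumption: the cosine similarity between the coarse-grained CLIP text and image embeddings is rescaled to [0, 1] and used to scale the fused multi-modal feature, while its complement scales the uni-modal text and image features before they are concatenated for a classifier. The function names `cosine_similarity` and `similarity_weighted_fusion` are hypothetical, and plain Python lists stand in for real encoder outputs.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_weighted_fusion(
    clip_text: Vector,
    clip_image: Vector,
    text_feat: Vector,
    image_feat: Vector,
    mm_feat: Vector,
) -> Vector:
    """Hypothetical similarity-based weighting (an assumption, not the paper's exact formula).

    High CLIP similarity (modalities agree) favors the fused multi-modal feature;
    low similarity (weak correlation or contradiction) shifts weight to the
    uni-modal branches.
    """
    # Rescale cosine similarity from [-1, 1] to a weight in [0, 1].
    w = (cosine_similarity(clip_text, clip_image) + 1.0) / 2.0
    weighted_mm = [w * x for x in mm_feat]
    weighted_text = [(1.0 - w) * x for x in text_feat]
    weighted_image = [(1.0 - w) * x for x in image_feat]
    # Concatenate [text | image | multi-modal] for a downstream classifier head.
    return weighted_text + weighted_image + weighted_mm


if __name__ == "__main__":
    # Aligned CLIP embeddings: the uni-modal features are suppressed.
    print(similarity_weighted_fusion([1.0, 0.0], [1.0, 0.0], [2.0], [3.0], [4.0]))
    # Contradictory CLIP embeddings: the multi-modal feature is suppressed.
    print(similarity_weighted_fusion([1.0, 0.0], [-1.0, 0.0], [2.0], [3.0], [4.0]))
```

The linear rescaling and the complementary weights are one simple design choice for letting a cross-modal agreement score decide how much the model relies on fused versus single-modality evidence; a learned gate would be an equally plausible alternative.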