LG, AI, CV | Jan 21, 2025

CroMe: Multimodal Fake News Detection using Cross-Modal Tri-Transformer and Metric Learning

arXiv:2501.12422v1 | 9 citations | h-index: 3 | IEEE Access
Originality: Incremental advance
AI Analysis

This paper addresses multimodal fake news detection for social media platforms and fact-checkers, and represents an incremental improvement over existing methods.

The paper tackles multimodal fake news detection by proposing CroMe, which uses cross-modal transformers and metric learning to capture intra-modality relationships and integrate inter-modal similarities, achieving state-of-the-art performance on benchmark datasets.

Multimodal Fake News Detection has received increasing attention recently. Existing methods rely on independently encoded unimodal data and overlook the advantages of capturing intra-modality relationships and integrating inter-modal similarities using advanced techniques. To address these issues, Cross-Modal Tri-Transformer and Metric Learning for Multimodal Fake News Detection (CroMe) is proposed. CroMe utilizes Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (BLIP2) as encoders to capture detailed text, image and combined image-text representations. The metric learning module employs a proxy anchor method to capture intra-modality relationships while the feature fusion module uses a Cross-Modal and Tri-Transformer for effective integration. The final fake news detector processes the fused features through a classifier to predict the authenticity of the content. Experiments on datasets show that CroMe excels in multimodal fake news detection.
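The proxy anchor idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard Proxy Anchor loss formulation over cosine similarities, with one learnable proxy per class (here, real and fake). The function names, the toy 2-D embeddings, and the `alpha` and `delta` values are illustrative assumptions; in CroMe the embeddings would come from the BLIP2 encoders rather than hand-written vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def proxy_anchor_loss(embeddings, labels, proxies, alpha=32.0, delta=0.1):
    """Proxy Anchor loss for one batch (illustrative sketch).

    embeddings: list of feature vectors, one per sample
    labels:     list of integer class ids (assumed: 0 = real, 1 = fake)
    proxies:    list of proxy vectors, one per class
    alpha:      scaling factor (assumed value)
    delta:      margin (assumed value)
    """
    pos_term = 0.0
    neg_term = 0.0
    proxies_with_positives = 0

    for cls, proxy in enumerate(proxies):
        sims = [cosine(x, proxy) for x in embeddings]
        pos = [s for s, y in zip(sims, labels) if y == cls]
        neg = [s for s, y in zip(sims, labels) if y != cls]

        # Pull same-class samples toward this proxy.
        if pos:
            proxies_with_positives += 1
            pos_term += math.log1p(
                sum(math.exp(-alpha * (s - delta)) for s in pos)
            )

        # Push other-class samples away from this proxy.
        neg_term += math.log1p(
            sum(math.exp(alpha * (s + delta)) for s in neg)
        )

    # Average the positive term over proxies that have positives in the
    # batch, and the negative term over all proxies.
    if proxies_with_positives:
        pos_term /= proxies_with_positives
    return pos_term + neg_term / len(proxies)


# Toy batch: two samples near each proxy.
toy_proxies = [[1.0, 0.0], [0.0, 1.0]]
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

loss_matched = proxy_anchor_loss(toy_embeddings, [0, 0, 1, 1], toy_proxies)
loss_swapped = proxy_anchor_loss(toy_embeddings, [1, 1, 0, 0], toy_proxies)
```

With the toy data, labels that agree with the nearest proxy should yield a lower loss than swapped labels, which is the behaviour a metric learning module relies on to cluster samples of the same class. In a trainable version the proxies would be parameters updated by gradient descent alongside the encoder outputs.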

