LG | CV | Mar 19, 2024

TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer

arXiv:2403.12481v2 | 12 citations | Fusion
AI Analysis

This paper addresses fake news detection, an important societal problem, but its contribution appears incremental because it builds on existing BLIP and transformer methods.

The paper tackles fake news detection by proposing TT-BLIP, an end-to-end model that integrates text, image, and multimodal features using BLIP and a Tri-Transformer. It reports outperforming state-of-the-art models on the Weibo and Gossipcop datasets.

Detecting fake news has received a lot of attention. Many previous methods concatenate independently encoded unimodal data, ignoring the benefits of integrated multimodal information. The absence of specialized feature extraction for text and images further limits these methods. This paper introduces an end-to-end model called TT-BLIP that applies bootstrapping language-image pretraining for unified vision-language understanding and generation (BLIP) to three types of information: BERT and BLIPTxt for text, ResNet and BLIPImg for images, and bidirectional BLIP encoders for multimodal information. The Multimodal Tri-Transformer fuses the tri-modal features using three types of multi-head attention mechanisms, producing integrated representations for improved multimodal data analysis. The experiments are performed on two fake news datasets, Weibo and Gossipcop. The results indicate that TT-BLIP outperforms the state-of-the-art models.
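The abstract does not spell out the three attention mechanisms, so the following is only a rough sketch of what attention-based tri-modal fusion could look like, not the authors' implementation. It assumes (hypothetically) that text features act as queries for one self-attention and two cross-attention steps, uses a single head with no learned projections, and mean-pools the results before concatenating them. The names `attend`, `mean_pool`, and `fuse` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def mean_pool(vectors):
    """Average a sequence of vectors into one vector."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def fuse(text, image, multimodal):
    """Hypothetical tri-modal fusion (an assumption, not the paper's design).

    Text attends to itself, to image features, and to multimodal features;
    the three pooled outputs are concatenated into one vector.
    """
    text_self = attend(text, text, text)
    text_image = attend(text, image, image)
    text_multi = attend(text, multimodal, multimodal)
    return mean_pool(text_self) + mean_pool(text_image) + mean_pool(text_multi)


# Toy features: two text tokens, one image patch, two multimodal tokens.
text = [[1.0, 0.0], [0.0, 1.0]]
image = [[0.5, 0.5]]
multimodal = [[1.0, 1.0], [1.0, 1.0]]
fused = fuse(text, image, multimodal)
```

In a real model each branch would use learned query, key, and value projections with multiple heads, and the fused vector would feed a classifier; the sketch only shows how attention lets one modality condition on another instead of concatenating independently encoded features.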
