CV · CL · May 19, 2023

Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

arXiv:2305.11719v2 · 228 citations · Has Code
Originality: Incremental advance
AI Analysis

This work improves multimodal relation extraction for applications like image-text analysis, though it appears incremental as it builds on existing graph and topic modeling approaches.

The paper addresses two problems in multimodal relation extraction: over-utilization of internal information and under-exploitation of external information. It proposes a framework that fuses image and text into a cross-modal graph, denoises the graph's features, and adds multimodal topic features. The authors report that it significantly outperforms the previous best model on a benchmark dataset.

Existing research on multimodal relation extraction (MRE) faces two co-existing challenges: internal-information over-utilization and external-information under-exploitation. To address both, we propose a novel framework that simultaneously implements internal-information screening and external-information exploiting. First, we represent the fine-grained semantic structures of the input image and text with visual and textual scene graphs, which are then fused into a unified cross-modal graph (CMG). Based on the CMG, we perform structure refinement guided by the graph information bottleneck principle, actively denoising the less-informative features. Next, we perform topic modeling over the input image and text, incorporating latent multimodal topic features to enrich the contexts. On the benchmark MRE dataset, our system significantly outperforms the current best model. Further in-depth analyses reveal the great potential of our method for the MRE task. Our code is available at https://github.com/ChocoWu/MRE-ISE.
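
To make the first two steps of the abstract concrete, here is a minimal sketch of fusing a textual and a visual scene graph into one cross-modal graph and then pruning it. It is not the authors' implementation: the `Graph` class, `fuse`, `denoise`, and both scoring functions are hypothetical names chosen for illustration, and a fixed score threshold stands in for the learned refinement that the paper guides with the graph information bottleneck principle.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Graph:
    """Toy scene graph: node labels plus directed edges given as index pairs."""
    nodes: list[str]
    edges: set[tuple[int, int]] = field(default_factory=set)


def fuse(text_g: Graph, image_g: Graph,
         similarity: Callable[[str, str], float],
         threshold: float = 0.5) -> Graph:
    """Merge two graphs and link text/image nodes that look alike."""
    offset = len(text_g.nodes)  # image nodes are re-indexed after text nodes
    nodes = text_g.nodes + image_g.nodes
    edges = set(text_g.edges)
    edges |= {(u + offset, v + offset) for u, v in image_g.edges}
    for i, t in enumerate(text_g.nodes):
        for j, v in enumerate(image_g.nodes):
            if similarity(t, v) >= threshold:
                edges.add((i, j + offset))  # cross-modal edge
    return Graph(nodes, edges)


def denoise(graph: Graph,
            edge_score: Callable[[str, str], float],
            keep: float = 0.5) -> Graph:
    """Drop low-scoring edges, then drop nodes left without any edge.

    Assumption: a hand-written score and fixed threshold replace the
    learned, information-bottleneck-guided refinement from the paper.
    """
    kept = {(u, v) for u, v in graph.edges
            if edge_score(graph.nodes[u], graph.nodes[v]) >= keep}
    used = sorted({n for edge in kept for n in edge})
    remap = {old: new for new, old in enumerate(used)}
    return Graph([graph.nodes[n] for n in used],
                 {(remap[u], remap[v]) for u, v in kept})


# Toy inputs: a caption graph and an image graph sharing two concepts.
text_g = Graph(["man", "guitar", "stage"], {(0, 1), (1, 2)})
image_g = Graph(["man", "guitar", "tree"], {(0, 1), (1, 2)})

cmg = fuse(text_g, image_g, lambda a, b: 1.0 if a == b else 0.0)
# Pretend "tree" is background noise that is irrelevant to the relation.
refined = denoise(cmg, lambda a, b: 0.0 if "tree" in (a, b) else 1.0)
print(refined.nodes, sorted(refined.edges))
```

In this toy run the fused graph gains cross-modal edges between the matching "man" and "guitar" nodes, and the pruning step removes the "tree" node because its only edge scores below the threshold. A real system would use learned node embeddings for both scoring functions rather than string matching.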

Code Implementations: 1 repo
