CL · Apr 18, 2024

Variational Multi-Modal Hypergraph Attention Network for Multi-Modal Relation Extraction

arXiv:2404.12006v1 · 6 citations · h-index: 18 · IJCAI
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in multi-modal relation extraction for natural language processing and computer vision applications, offering an incremental improvement over existing methods.

The paper tackles multi-modal relation extraction, where existing methods overlook that multiple entity pairs in one sentence share very similar contextual information. It proposes the Variational Multi-Modal Hypergraph Attention Network (VM-HAN), which is reported to achieve state-of-the-art performance with improved accuracy and efficiency.

Multi-modal relation extraction (MMRE) is a challenging task that aims to identify relations between entities in text by leveraging image information. Existing methods overlook the fact that multiple entity pairs in one sentence share very similar contextual information (i.e., the same text and image), which makes the MMRE task harder. To address this limitation, we propose the Variational Multi-Modal Hypergraph Attention Network (VM-HAN) for multi-modal relation extraction. Specifically, we first construct a multi-modal hypergraph for each sentence and its corresponding image, to establish different high-order intra-/inter-modal correlations for different entity pairs in each sentence. We further design the Variational Hypergraph Attention Networks (V-HAN) to obtain representational diversity among different entity pairs using Gaussian distributions and to learn a better hypergraph structure via variational attention. VM-HAN achieves state-of-the-art performance on the multi-modal relation extraction task, outperforming existing methods in terms of accuracy and efficiency.
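As a rough illustration of the two ideas named in the abstract (hyperedges that group text and image nodes, and Gaussian-distributed representations for entity pairs), the following pure-Python sketch may help. It is not the authors' implementation: the function names, the dot-product attention, the toy features, and the hyperedge layout are all assumptions made for illustration only.

```python
import math
import random


def build_incidence(num_nodes, hyperedges):
    """Return a num_nodes x num_edges 0/1 incidence matrix (hypothetical helper)."""
    H = [[0] * len(hyperedges) for _ in range(num_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def hyperedge_attention(X, H, query):
    """Aggregate node features into one feature per hyperedge.

    Attention is restricted to the members of each hyperedge and scored by a
    plain dot product with `query` (an assumed stand-in for an entity-pair
    representation, not the scoring function used in the paper).
    """
    num_nodes, num_edges, dim = len(H), len(H[0]), len(X[0])
    edge_feats = []
    for e in range(num_edges):
        members = [v for v in range(num_nodes) if H[v][e]]
        scores = [sum(a * b for a, b in zip(X[v], query)) for v in members]
        weights = softmax(scores)
        edge_feats.append(
            [sum(w * X[v][d] for w, v in zip(weights, members)) for d in range(dim)]
        )
    return edge_feats


def sample_gaussian(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


# Toy setup (assumed): nodes 0-2 are text tokens, nodes 3-4 are image objects.
X = [[1.0, 0.0], [3.0, 0.0], [0.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
hyperedges = [[0, 1], [1, 3, 4]]  # one intra-modal edge, one inter-modal edge
H = build_incidence(len(X), hyperedges)

# Two entity pairs share the same text and image but use different queries,
# so they attend differently within the same hyperedges.
pair_a = hyperedge_attention(X, H, query=[1.0, 0.0])
pair_b = hyperedge_attention(X, H, query=[0.0, 1.0])

# Treat an aggregated feature as a Gaussian mean and draw a sample from it.
rng = random.Random(0)
z_a = sample_gaussian(pair_a[1], log_var=[-2.0, -2.0], rng=rng)
```

In this toy setup the two queries yield different hyperedge features from identical inputs, which mirrors the abstract's motivation of separating entity pairs that share the same context. The Gaussian sampling step only hints at how a distribution, rather than a point vector, could add diversity.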

