CL · AI · CV · LG · MM · May 7, 2022

Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction

arXiv:2205.03521v1 · 51 citations · h-index: 41 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a fundamental problem in information extraction, relevant to applications such as knowledge graph construction, though the advance appears incremental because it builds on existing multimodal extraction methods.

The paper tackles error sensitivity in multimodal named entity recognition and relation extraction (MNER and MRE) caused by irrelevant object images, proposing a Hierarchical Visual Prefix fusion NeTwork (HVPNeT) that achieves state-of-the-art performance on three benchmark datasets.

Multimodal named entity recognition and relation extraction (MNER and MRE) is a fundamental and crucial branch of information extraction. However, existing approaches for MNER and MRE usually suffer from error sensitivity when irrelevant object images are incorporated in texts. To deal with these issues, we propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction, aiming to achieve more effective and robust performance. Specifically, we regard the visual representation as a pluggable visual prefix that guides the textual representation toward error-insensitive forecasting decisions. We further propose a dynamic gated aggregation strategy to obtain hierarchical multi-scale visual features as the visual prefix for fusion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance. Code is available at https://github.com/zjunlp/HVPNeT.
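
The two ideas named in the abstract, gated aggregation of multi-scale visual features and a visual prefix that guides textual attention, can be illustrated with a minimal pure-Python sketch. This is not the HVPNeT implementation: the function names `gated_aggregate` and `prefix_attention`, the toy two-dimensional vectors, and the gate logits passed in by hand are all assumptions made for illustration. In a trained model the gates and the key/value projections would presumably be learned.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gated_aggregate(scale_features, gate_logits):
    # Hypothetical helper: mix one vector per visual scale into a single
    # prefix vector, weighting each scale by a softmax-normalised gate.
    gates = softmax(gate_logits)
    dim = len(scale_features[0])
    return [sum(g * feat[d] for g, feat in zip(gates, scale_features))
            for d in range(dim)]

def prefix_attention(queries, text_keys, text_values, prefix_keys, prefix_values):
    # Hypothetical helper: scaled dot-product attention in which the visual
    # prefix is prepended to the textual keys and values, so every text
    # token can attend to the visual features as well as to the text.
    keys = prefix_keys + text_keys
    values = prefix_values + text_values
    scale = math.sqrt(len(queries[0]))
    out_dim = len(values[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        fused.append([sum(w * v[d] for w, v in zip(weights, values))
                      for d in range(out_dim)])
    return fused

# Toy data (assumed): two visual scales and a single text token.
scales = [[1.0, 0.0], [0.0, 1.0]]
prefix = gated_aggregate(scales, gate_logits=[0.0, 0.0])  # equal gates
text = [[1.0, 0.0]]
# Simplification: the same vector serves as both key and value.
fused = prefix_attention(text, text, text, [prefix], [prefix])
```

Because the prefix only extends the key and value sequences, the textual queries are unchanged and the visual side can be plugged in or removed without altering the text encoder's interface, which is one reading of "pluggable" in the abstract.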

Code Implementations: 1 repo