CL · AI · CV · LG · MM · May 7, 2022

Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction

Xiang Chen, Ningyu Zhang, Lei Li, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen

arXiv:2205.03521v12.351 citations · h-index: 41 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a fundamental problem in information extraction, with applications such as knowledge graph construction, though the contribution appears incremental because it builds on existing multimodal extraction methods.

The paper tackles error sensitivity in multimodal named entity recognition and relation extraction (MNER and MRE) caused by irrelevant object images, proposing a Hierarchical Visual Prefix fusion NeTwork (HVPNeT) that achieves state-of-the-art performance on three benchmark datasets.

Multimodal named entity recognition and relation extraction (MNER and MRE) are fundamental and crucial branches of information extraction. However, existing approaches for MNER and MRE usually suffer from error sensitivity when irrelevant object images are incorporated into texts. To address these issues, we propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction, aiming to achieve more effective and robust performance. Specifically, we regard the visual representation as a pluggable visual prefix that guides the textual representation toward error-insensitive forecasting decisions. We further propose a dynamic gated aggregation strategy to obtain hierarchical multi-scaled visual features as the visual prefix for fusion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance. Code is available at https://github.com/zjunlp/HVPNeT.
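The two mechanisms named in the abstract, the visual prefix and the dynamic gated aggregation, can be illustrated with a toy sketch. This is not HVPNeT's implementation (the linked repository has that); it is a minimal, stdlib-only illustration under stated assumptions: every visual scale has already been projected to the same number of prefix tokens and the same hidden size, the gate is a softmax over one score per scale computed from that scale's mean-pooled features (a hypothetical parameterization), and the aggregated prefix is prepended to the keys and values of a single text attention head in the style of prefix-tuning. The names `gated_aggregate`, `attend_with_prefix`, and `gate_weights` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(tokens: List[Vector]) -> Vector:
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def gated_aggregate(scales: List[List[Vector]], gate_weights: List[Vector]) -> List[Vector]:
    """Hypothetical dynamic gate: score each scale from its pooled features,
    softmax the scores over scales, and mix the scales token by token.
    The gate is 'dynamic' because it depends on the input features.
    Assumes all scales share the same token count and dimension."""
    pooled = [mean_pool(s) for s in scales]
    gates = softmax([dot(w, p) for w, p in zip(gate_weights, pooled)])
    n_tokens, dim = len(scales[0]), len(scales[0][0])
    return [
        [sum(g * s[t][d] for g, s in zip(gates, scales)) for d in range(dim)]
        for t in range(n_tokens)
    ]


def attend_with_prefix(query: Vector, text_keys: List[Vector],
                       text_values: List[Vector], prefix: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention with the visual prefix
    prepended to the textual keys and values (prefix-tuning style).
    Simplification: prefix vectors serve directly as keys and values;
    a real model would project them with learned matrices."""
    keys = prefix + text_keys
    values = prefix + text_values
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


if __name__ == "__main__":
    fine = [[1.0, 0.0], [0.0, 1.0]]    # 2 prefix tokens from a fine scale
    coarse = [[0.5, 0.5], [0.5, 0.5]]  # 2 prefix tokens from a coarse scale
    prefix = gated_aggregate([fine, coarse], gate_weights=[[0.2, 0.1], [0.3, -0.1]])
    with_image = attend_with_prefix([1.0, 0.0], [[0.0, 1.0]], [[2.0, 2.0]], prefix)
    text_only = attend_with_prefix([1.0, 0.0], [[0.0, 1.0]], [[2.0, 2.0]], [])
    print(prefix, with_image, text_only)
```

In this sketch the prefix is "pluggable" in a literal sense: passing an empty prefix reduces `attend_with_prefix` to ordinary text-only attention, so the visual signal only shifts the attention distribution rather than replacing the textual representation.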
