CL, LG · Feb 5, 2021

RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER

arXiv:2102.02967v1 · 189 citations
Originality: Incremental advance
AI Analysis

By filtering out irrelevant visual information, this work offers an incremental improvement for researchers and practitioners working on multimodal named entity recognition in social media.

This paper addresses the issue of irrelevant visual clues in multimodal named entity recognition (MNER) for tweets by introducing text-image relation propagation into a BERT model. The proposed method integrates soft or hard gates to select relevant visual clues and uses a multitask algorithm, achieving state-of-the-art performance on MNER datasets.

Recently, multimodal named entity recognition (MNER) has used images to improve the accuracy of NER in tweets. However, most multimodal methods use attention mechanisms to extract visual clues regardless of whether the text and image are relevant to each other. In practice, irrelevant text-image pairs account for a large proportion of tweets. Visual clues that are unrelated to the text can have uncertain or even negative effects on multimodal model learning. In this paper, we introduce text-image relation propagation into the multimodal BERT model. We integrate soft or hard gates to select visual clues and propose a multitask algorithm for training on the MNER datasets. In the experiments, we analyze in depth how visual attention changes before and after text-image relation propagation is applied. Our model achieves state-of-the-art performance on the MNER datasets.
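
The soft and hard gates mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the text-image relation is summarized as a single logit, that a soft gate scales visual features by the sigmoid of that logit, and that a hard gate keeps or drops them at a 0.5 threshold. The names `relation_gate` and `gate_visual_features` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relation_gate(relation_logit: float, mode: str = "soft",
                  threshold: float = 0.5) -> float:
    """Turn a text-image relation logit into a gate value.

    Assumption: "soft" passes the relation score through unchanged,
    while "hard" binarizes it at `threshold`.
    """
    score = sigmoid(relation_logit)
    if mode == "soft":
        return score
    if mode == "hard":
        return 1.0 if score > threshold else 0.0
    raise ValueError(f"unknown gate mode: {mode!r}")


def gate_visual_features(visual_features, relation_logit, mode="soft"):
    """Scale every visual feature vector by the gate value.

    `visual_features` is a list of per-region feature vectors
    (lists of floats); the same scalar gate is applied to all regions.
    """
    gate = relation_gate(relation_logit, mode)
    return [[gate * value for value in region] for region in visual_features]


if __name__ == "__main__":
    regions = [[1.0, 2.0], [3.0, 4.0]]
    # A strongly negative logit stands in for an unrelated text-image pair.
    print(gate_visual_features(regions, relation_logit=-4.0, mode="soft"))
    print(gate_visual_features(regions, relation_logit=-4.0, mode="hard"))
```

Under these assumptions, an unrelated image contributes a strongly down-weighted signal with the soft gate and no signal at all with the hard gate, which matches the stated goal of keeping irrelevant visual clues from influencing the tagger.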

Code Implementations: 1 repo
