CLMay 4, 2023

USTC-NELSLIP at SemEval-2023 Task 2: Statistical Construction and Dual Adaptation of Gazetteer for Multilingual Complex NER

Jun-Yu Ma, Jia-Chen Gu, Jiajun Qi, Zhen-Hua Ling, Quan Liu, Xiaoyi Zhao

arXiv:2305.02517v126.1222 citations

Originality Incremental advance

AI Analysis

This work addresses multilingual NER for NLP applications, but it is incremental as it builds on existing methods like XLM-R and gazetteer integration.

The paper tackled multilingual complex named entity recognition by proposing a method that constructs a gazetteer statistically and adapts it with language models, achieving first place on the Hindi track in SemEval-2023 Task 2.

This paper describes the system developed by the USTC-NELSLIP team for SemEval-2023 Task 2 Multilingual Complex Named Entity Recognition (MultiCoNER II). A method named Statistical Construction and Dual Adaptation of Gazetteer (SCDAG) is proposed for Multilingual Complex NER. The method first utilizes a statistics-based approach to construct a gazetteer. Secondly, the representations of gazetteer networks and language models are adapted by minimizing the KL divergence between them at both the sentence-level and entity-level. Finally, these two networks are then integrated for supervised named entity recognition (NER) training. The proposed method is applied to XLM-R with a gazetteer built from Wikidata, and shows great generalization ability across different tracks. Experimental results and detailed analysis verify the effectiveness of the proposed method. The official results show that our system ranked 1st on one track (Hindi) in this task.

View on arXiv PDF

Similar