CL · AI · May 25, 2025

RetrieveAll: A Multilingual Named Entity Recognition Framework with Large Language Models

Jin Zhang, Fan Gao, Linyu Li, Yongbin Yu, Xiangxiang Wang, Nyima Tashi, Gadeng Luosang

arXiv:2505.19128v1 · 2.7 · h-index: 17

Originality: Incremental advance

AI Analysis

This paper addresses the scalability of multilingual NER for low-resource languages, though the advance appears incremental because it builds on existing LoRA and prompting techniques.

The paper tackles language interference in multilingual named entity recognition, where high-resource languages suppress low-resource ones. It proposes RetrieveAll, a framework that combines dynamic LoRA with cross-granularity knowledge augmentation, and reports an average F1 improvement of 12.1% on the PAN-X dataset.

The rise of large language models has led to significant performance breakthroughs in named entity recognition (NER) for high-resource languages, yet there remains substantial room for improvement in low- and medium-resource languages. Existing multilingual NER methods face severe language interference during the multi-language adaptation process, manifested in feature conflicts between different languages and the competitive suppression of low-resource language features by high-resource languages. Although training a dedicated model for each language can mitigate such interference, it lacks scalability and incurs excessive computational costs in real-world applications. To address this issue, we propose RetrieveAll, a universal multilingual NER framework based on dynamic LoRA. The framework decouples task-specific features across languages and demonstrates efficient dynamic adaptability. Furthermore, we introduce a cross-granularity knowledge augmented method that fully exploits the intrinsic potential of the data without relying on external resources. By leveraging a hierarchical prompting mechanism to guide knowledge injection, this approach advances the paradigm from "prompt-guided inference" to "prompt-driven learning." Experimental results show that RetrieveAll outperforms existing baselines; on the PAN-X dataset, it achieves an average F1 improvement of 12.1 percent.
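To make the idea of per-language low-rank adaptation concrete, here is a minimal sketch of a linear layer with a frozen base weight and one LoRA pair per language, using the standard LoRA update y = Wx + (alpha / r) · B(Ax). The abstract does not describe how RetrieveAll selects or retrieves adapters, so the plain lookup by language code, the class name `DynamicLoRALinear`, and the toy weights and language codes below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class DynamicLoRALinear:
    """Frozen base weight W plus one low-rank (B, A) pair per language.

    Hypothetical illustration: adapters are chosen by a simple dictionary
    lookup on the language code, which is an assumption of this sketch.
    """

    def __init__(self, weight: Matrix, alpha: float = 1.0) -> None:
        self.weight = weight  # shape (d_out, d_in), shared by all languages
        self.alpha = alpha    # LoRA scaling numerator
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def add_adapter(self, lang: str, b: Matrix, a: Matrix) -> None:
        # b has shape (d_out, r); a has shape (r, d_in)
        self.adapters[lang] = (b, a)

    def forward(self, x: Vector, lang: str) -> Vector:
        base = matvec(self.weight, x)
        if lang not in self.adapters:
            # No adapter registered: fall back to the shared base model.
            return base
        b, a = self.adapters[lang]
        rank = len(a)
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        scale = self.alpha / rank
        return [y + scale * d for y, d in zip(base, delta)]


# Toy example: identity base weight and two rank-1 adapters.
layer = DynamicLoRALinear(weight=[[1.0, 0.0], [0.0, 1.0]], alpha=1.0)
layer.add_adapter("sw", b=[[1.0], [0.0]], a=[[0.0, 2.0]])
layer.add_adapter("bo", b=[[0.0], [1.0]], a=[[3.0, 0.0]])

x = [1.0, 1.0]
out_sw = layer.forward(x, "sw")  # [3.0, 1.0]
out_bo = layer.forward(x, "bo")  # [1.0, 4.0]
out_en = layer.forward(x, "en")  # [1.0, 1.0], base weights only
```

Because each language's update lives in its own (B, A) pair, adapting one language in this sketch leaves the others untouched, which is one simple way to read the abstract's claim that the framework decouples features across languages.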
