Zero-shot Cross-lingual NER via Mitigating Language Difference: An Entity-aligned Translation Perspective
The work addresses the challenge of transferring NER knowledge to low-resource non-Latin script languages, an incremental extension of existing methods that focus on Latin script languages.
The paper tackles zero-shot cross-lingual named entity recognition (CL-NER) for non-Latin script languages such as Chinese and Japanese, where performance degrades due to structural differences. It proposes an entity-aligned translation (EAT) approach that uses large language models, and fine-tunes them on multilingual Wikipedia data, to align entities between languages.
Cross-lingual Named Entity Recognition (CL-NER) aims to transfer knowledge from high-resource languages to low-resource languages. However, existing zero-shot CL-NER (ZCL-NER) approaches primarily focus on Latin script languages (LSL), where shared linguistic features facilitate effective knowledge transfer. In contrast, for non-Latin script languages (NSL), such as Chinese and Japanese, performance often degrades due to deep structural differences. To address these challenges, we propose an entity-aligned translation (EAT) approach. Leveraging large language models (LLMs), EAT employs a dual-translation strategy to align entities between NSL and English. In addition, we fine-tune LLMs using multilingual Wikipedia data to enhance entity alignment from source to target languages.
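To make the idea of aligning entities through translation more concrete, here is a minimal Python sketch. The abstract does not spell out how the dual-translation strategy works, so the two steps used here are assumptions: the sentence is translated into English, then each entity recognized in the English text is translated back and matched against the original sentence. All function names are hypothetical, and toy lookup tables stand in for the LLM translator and the English NER model.

```python
from typing import Callable, List, Tuple

Entity = Tuple[str, str]      # (surface text, label)
Span = Tuple[int, int, str]   # (start, end, label), character offsets


def project_entities(
    sentence: str,
    translate_to_en: Callable[[str], str],
    english_ner: Callable[[str], List[Entity]],
    translate_entity_back: Callable[[str], str],
) -> List[Span]:
    """Label a source-language sentence by going through English.

    Assumed two-step scheme (not taken from the paper):
    1. translate the whole sentence into English and run English NER;
    2. translate each English entity back and locate it in the source.
    """
    english = translate_to_en(sentence)
    spans: List[Span] = []
    for surface, label in english_ner(english):
        candidate = translate_entity_back(surface)
        start = sentence.find(candidate)
        if start != -1:  # keep only entities that align with the source text
            spans.append((start, start + len(candidate), label))
    return spans


# Toy stand-ins (hypothetical) for the LLM translator and the English NER model.
_TO_EN = {"东京是日本的首都。": "Tokyo is the capital of Japan."}
_BACK = {"Tokyo": "东京", "Japan": "日本"}


def toy_translate(sentence: str) -> str:
    return _TO_EN[sentence]


def toy_ner(english: str) -> List[Entity]:
    return [(word, "LOC") for word in ("Tokyo", "Japan") if word in english]


def toy_back(entity: str) -> str:
    return _BACK[entity]


spans = project_entities("东京是日本的首都。", toy_translate, toy_ner, toy_back)
print(spans)
```

In this sketch an entity is dropped whenever its back-translation does not appear verbatim in the source sentence. That is the kind of mismatch that better entity alignment between the two languages would reduce.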