CL | Nov 17, 2022

ConNER: Consistency Training for Cross-lingual Named Entity Recognition

arXiv:2211.09394v1 | 302 citations | h-index: 113
Originality: Incremental advance
AI Analysis

This paper addresses cross-lingual NER for languages with limited labeled data. The contribution appears incremental, since it builds on existing consistency training approaches.

The paper tackles data scarcity and noise in cross-lingual named entity recognition (NER) by proposing ConNER, a consistency training framework. ConNER combines translation-based consistency training, which leverages unlabeled target-language data, with dropout-based consistency training, which reduces overfitting on the source language. The authors report consistent improvements over baseline methods.

Cross-lingual named entity recognition (NER) suffers from data scarcity in the target languages, especially under zero-shot settings. Existing translate-train or knowledge distillation methods attempt to bridge the language gap, but often introduce a high level of noise. To solve this problem, consistency training methods regularize the model to be robust towards perturbations on data or hidden states. However, such methods are likely to violate the consistency hypothesis, or mainly focus on coarse-grained consistency. We propose ConNER as a novel consistency training framework for cross-lingual NER, which comprises: (1) translation-based consistency training on unlabeled target-language data, and (2) dropout-based consistency training on labeled source-language data. ConNER effectively leverages unlabeled target-language data and alleviates overfitting on the source language to enhance cross-lingual adaptability. Experimental results show that ConNER achieves consistent improvement over various baseline methods.
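
To make the dropout-based idea concrete, here is a minimal sketch of a consistency penalty between two dropout-perturbed passes over the same tokens. It is an illustration under stated assumptions, not ConNER's actual objective: the abstract does not specify the loss, so the symmetric KL divergence, the single linear classification layer, and all names (`dropout_consistency_loss`, `classify`, `weights`) are hypothetical choices made for this example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dropout(vector, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]


def classify(hidden, weights):
    """Hypothetical linear tagger: one weight row per entity label."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return softmax(logits)


def dropout_consistency_loss(token_hiddens, weights, rate=0.1, seed=0):
    """Average symmetric KL between two dropout passes over each token.

    Assumes `token_hiddens` is a non-empty list of per-token feature vectors.
    The symmetric KL form is an assumption of this sketch, not taken from the paper.
    """
    rng = random.Random(seed)
    total = 0.0
    for hidden in token_hiddens:
        p = classify(dropout(hidden, rate, rng), weights)
        q = classify(dropout(hidden, rate, rng), weights)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(token_hiddens)


if __name__ == "__main__":
    weights = [[0.5, -0.2, 0.1], [0.1, 0.3, -0.4]]    # two labels, three features
    hiddens = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.25]]    # two tokens
    print(dropout_consistency_loss(hiddens, weights, rate=0.5))
```

In a training loop, a term like this would typically be added to the supervised NER loss on labeled source-language data, so that the model is penalized when its token-level predictions change under dropout noise. The translation-based component described in the abstract would need aligned source and translated spans, which this sketch does not attempt.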

Code Implementations: 1 repo
