CL · AI · LG · Jun 3

Multilingual Coreference Resolution via Cycle-Consistent Machine Translation

arXiv:2606.05444 · 19.8
Predicted impact: top 55% in CL · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the lack of coreference resolution resources for low-resource languages, offering a practical solution for expanding NLP capabilities to under-served languages.

The paper proposes a pipeline that uses cycle-consistent machine translation to generate training data for coreference resolution in low-resource languages, achieving significant performance gains across four languages and enabling coreference resolution where no prior corpora existed.

Coreference resolution is a core NLP task with a broad range of downstream applications, e.g., machine translation, question answering, and document summarization. While the task is well studied in English, comparatively little attention has been dedicated to coreference resolution in other languages, especially low-resource ones. To mitigate this gap, we propose a novel coreference resolution pipeline that harnesses machine translation (MT) from English into a target low-resource language to generate or expand training data. To automatically validate the quality of the translated samples, we back-translate them into English and assess their similarity to the original English samples via cosine similarity in the latent space of a BERT model. The resulting similarity scores are integrated into the loss function to weight training samples according to their MT cycle consistency. Extensive experiments on four low-resource languages show that our pipeline brings significant performance gains in coreference resolution. Moreover, our pipeline enables accurate coreference resolution in languages where no previous corpora were available.
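The cycle-consistency weighting described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the BERT sentence embeddings of the original and back-translated samples are replaced by toy vectors, and the helper names (`cycle_consistency_weight`, `weighted_loss`), the clamp-to-zero mapping from similarity to weight, and the weighted-mean normalisation are hypothetical, since the abstract does not specify exactly how the scores enter the loss.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cycle_consistency_weight(orig_emb: Sequence[float], back_emb: Sequence[float]) -> float:
    """Hypothetical weight: similarity between the original English sample and its
    back-translation, clamped at 0 so dissimilar samples contribute nothing."""
    return max(0.0, cosine_similarity(orig_emb, back_emb))


def weighted_loss(per_sample_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed scheme: weighted mean of per-sample coreference losses."""
    total_w = sum(weights)
    if total_w == 0.0:
        return 0.0
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / total_w


# Toy example: embeddings stand in for BERT vectors of original vs. back-translated text.
original = [[1.0, 0.0], [0.0, 1.0]]
back_translated = [[1.0, 0.0], [1.0, 1.0]]
weights = [cycle_consistency_weight(o, b) for o, b in zip(original, back_translated)]
loss = weighted_loss([2.0, 4.0], weights)
print(weights, loss)
```

In this toy run the first sample survives the round trip unchanged and gets full weight, while the second, whose back-translation drifted, is down-weighted, so its larger loss pulls the batch loss up less than an unweighted mean would.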
