CL · Oct 12, 2022

SilverAlign: MT-Based Silver Data Algorithm For Evaluating Word Alignment

arXiv:2210.06207
AI Analysis

The paper tackles the problem of scarce gold evaluation data for word aligners by proposing SilverAlign, a method to automatically create silver data using machine translation and minimal pairs, and shows that performance on this silver data correlates well with gold benchmarks for 9 language pairs.

Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a new method to automatically create silver data for the evaluation of word aligners by exploiting machine translation and minimal pairs. We show that performance on our silver data correlates well with gold benchmarks for 9 language pairs, making our approach a valid resource for evaluation of different domains and languages when gold data are not available. This addresses the important scenario of missing gold data alignments for low-resource languages.
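
The evaluation idea in the abstract can be illustrated with a minimal sketch: score each aligner against a reference alignment set (silver or gold), then check whether the two sets of scores rank the aligners similarly. This is not the authors' implementation. The function names, the Pharaoh-style "i-j" input format, the choice of F1 as the metric, and the use of Pearson correlation are all assumptions made for illustration.

```python
import math
from typing import List, Set, Tuple

# An alignment is a set of (source index, target index) pairs.
Alignment = Set[Tuple[int, int]]


def parse_pharaoh(line: str) -> Alignment:
    """Parse a line such as '0-0 1-2' into {(0, 0), (1, 2)} (assumed format)."""
    pairs = set()
    for token in line.split():
        src, tgt = token.split("-")
        pairs.add((int(src), int(tgt)))
    return pairs


def alignment_f1(predicted: Alignment, reference: Alignment) -> float:
    """F1 of predicted alignment edges against a reference edge set."""
    if not predicted or not reference:
        return 0.0
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-aligner F1 scores; these are NOT results from the paper.
    silver_scores = [0.61, 0.70, 0.82]
    gold_scores = [0.58, 0.73, 0.85]
    print(pearson(silver_scores, gold_scores))
```

A high correlation in such a comparison would suggest that silver data ranks aligners much as gold data does, which is the property the abstract claims for 9 language pairs.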

Code Implementations: 1 repo