CL · AI · Jun 6, 2021

The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

arXiv:2106.03193v1 · 804 citations
Originality: Synthesis-oriented
AI Analysis

FLORES-101 provides a high-quality benchmark for the machine translation community and addresses a critical bottleneck in evaluating low-resource and multilingual models, although the contribution is incremental because it builds on existing evaluation needs.

The authors tackled the lack of good evaluation benchmarks for low-resource and multilingual machine translation by introducing FLORES-101, a dataset of 3001 sentences translated into 101 languages by professional translators, enabling better assessment of model quality on low-resource languages and many-to-many multilingual systems.

One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are low quality because they are constructed using semi-automatic procedures. In this work, we introduce the FLORES-101 evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia and covering a variety of different topics and domains. These sentences have been translated into 101 languages by professional translators through a carefully controlled process. The resulting dataset enables better assessment of model quality on the long tail of low-resource languages, including the evaluation of many-to-many multilingual translation systems, as all translations are multilingually aligned. By publicly releasing such a high-quality and high-coverage dataset, we hope to foster progress in the machine translation community and beyond.
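
To make the "multilingually aligned" property concrete, here is a minimal Python sketch of why alignment enables many-to-many evaluation: if sentence i is the same underlying sentence in every language, any ordered pair of languages yields a parallel test set. The function names, language codes, and toy sentences below are hypothetical illustrations and are not taken from the FLORES-101 release. The direction count assumes 101 languages in total.

```python
from itertools import permutations


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


def aligned_pairs(corpus, src, tgt):
    """Pair sentence i in `src` with sentence i in `tgt`.

    `corpus` maps a language code to a list of sentences. Alignment means
    index i refers to the same underlying sentence in every language.
    """
    if len(corpus[src]) != len(corpus[tgt]):
        raise ValueError("aligned corpora must have equal length")
    return list(zip(corpus[src], corpus[tgt]))


# Toy stand-in data (hypothetical, not from the benchmark).
toy_corpus = {
    "eng": ["Hello.", "Thank you."],
    "fra": ["Bonjour.", "Merci."],
    "spa": ["Hola.", "Gracias."],
}

# Three aligned languages give 3 * 2 = 6 translation directions.
directions = translation_directions(sorted(toy_corpus))

# Assuming 101 aligned languages in total, the same logic gives 101 * 100.
num_directions_101 = 101 * 100

# A non-English pair is available without any extra annotation.
fra_spa = aligned_pairs(toy_corpus, "fra", "spa")
```

Because every language shares the same sentence indices, no pair needs English as a pivot at evaluation time; a system output for any source language can be scored directly against the reference in any target language.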

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
