CL, Feb 8, 2021

Quality Estimation without Human-labeled Data

arXiv:2102.04020v1806 citations
Originality: Incremental advance
AI Analysis

This work is significant for machine translation developers and users who need to assess translation quality in real-world scenarios without the high cost of human-labeled data.

This paper addresses the problem of quality estimation for machine translation without human-labeled data. The authors propose a technique that uses synthetic training data to train off-the-shelf architectures, achieving performance comparable to that of models trained on human-annotated data for both sentence- and word-level prediction.

Quality estimation aims to measure the quality of translated content without access to a reference translation. This is crucial for machine translation systems in real-world scenarios where high-quality translation is needed. While many approaches exist for quality estimation, they are based on supervised machine learning, which requires costly human-labelled data. As an alternative, we propose a technique that does not rely on examples from human annotators and instead uses synthetic training data. We train off-the-shelf architectures for supervised quality estimation on our synthetic data and show that the resulting models achieve performance comparable to that of models trained on human-annotated data, for both sentence- and word-level prediction.

