Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation
This work addresses how to evaluate quality estimation for machine translation, which matters to users who need reliability scores. The contribution is incremental, since it builds on existing QE and evaluation methods.
The paper proposes $k$NN-QE, an unsupervised, model-specific quality estimation approach for machine translation that uses $k$-nearest neighbors from the MT model's training data. Because model-specific QE is hard to evaluate, the paper also introduces an automatic evaluation method that uses reference-based metrics such as MetricX-23 as the gold standard, and it concludes that this method is sufficient for the task.
Providing quality scores along with Machine Translation (MT) output, known as reference-free Quality Estimation (QE), is crucial for informing users about the reliability of a translation. We propose a model-specific, unsupervised QE approach, termed $k$NN-QE, that extracts information from the MT model's training data using $k$-nearest neighbors. Measuring the performance of model-specific QE approaches is not straightforward: because they score their own MT output, they cannot be evaluated on benchmark QE test sets, which contain human quality scores for premade MT output. Therefore, we propose an automatic evaluation method that uses quality scores from reference-based metrics as the gold standard instead of human-generated ones. We are the first to conduct detailed analyses, and we conclude that this automatic method is sufficient and that the reference-based metric MetricX-23 is best for the task.
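To make the two ideas above concrete, the following is a minimal sketch and not the paper's implementation. It assumes, purely for illustration, that the QE signal is the negated mean Euclidean distance from a translation's vector to its $k$ nearest vectors in a datastore built from training data, and that agreement with the reference-based gold scores is measured with Pearson correlation. The function names, the toy vectors, and the toy gold scores are all hypothetical.

```python
import math


def knn_mean_distance(query, datastore, k):
    """Mean Euclidean distance from `query` to its k nearest datastore vectors.

    Hypothetical QE signal: a smaller distance suggests the input resembles
    the training data more closely.
    """
    if k < 1 or k > len(datastore):
        raise ValueError("k must be between 1 and the datastore size")
    dists = sorted(math.dist(query, vec) for vec in datastore)
    return sum(dists[:k]) / k


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy stand-ins for vectors derived from the MT model's training data.
datastore = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Toy stand-ins for vectors of three new translations.
queries = [(0.1, 0.1), (2.0, 2.0), (4.0, 4.0)]

# Negate the distance so that a higher QE score means higher estimated quality.
qe_scores = [-knn_mean_distance(q, datastore, k=2) for q in queries]

# Assumed gold scores from a reference-based metric, oriented so that higher
# is better. Flip the sign first if the metric reports an error score.
gold_scores = [0.9, 0.5, 0.2]

print(pearson(qe_scores, gold_scores))
```

A higher correlation would indicate that the QE scores rank the model's own translations similarly to the reference-based metric, which is the sense in which that metric stands in for human judgments here.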