Is Human Annotation Necessary? Iterative MBR Distillation for Error Span Detection in Machine Translation
This work addresses the cost and inconsistency of human annotation for machine translation evaluation and offers a way to reduce reliance on human input, though it is incremental, since it builds on existing MBR and distillation techniques.
The paper tackles the problem of expensive and inconsistent human annotations for Error Span Detection in Machine Translation by proposing a self-evolution framework that uses Minimum Bayes Risk decoding and an off-the-shelf LLM to generate pseudo-labels. The resulting models outperform supervised baselines at the system and span levels while maintaining competitive sentence-level performance.
Error Span Detection (ESD) is a crucial subtask in Machine Translation (MT) evaluation, aiming to identify the location and severity of translation errors. While fine-tuning models on human-annotated data improves ESD performance, acquiring such data is expensive and prone to inconsistencies among annotators. To address this, we propose a novel self-evolution framework based on Minimum Bayes Risk (MBR) decoding, named Iterative MBR Distillation for ESD, which eliminates the reliance on human annotations by leveraging an off-the-shelf LLM to generate pseudo-labels. Extensive experiments on the WMT Metrics Shared Task datasets demonstrate that models trained solely on these self-generated pseudo-labels outperform both the unadapted base model and supervised baselines trained on human annotations at the system and span levels, while maintaining competitive sentence-level performance.
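To make the MBR pseudo-labeling step more concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not state. First, several candidate annotations are sampled from the LLM for one translation, each a list of `(start, end, severity)` character spans. Second, the utility is a character-level F1 that only credits characters whose severity also matches. This utility, and the names `span_f1` and `mbr_select`, are hypothetical illustrations, and the paper's actual utility metric may differ. MBR then keeps the candidate with the highest average utility against all the other candidates, treating them as pseudo-references.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a span is (start, end, severity) over
# character offsets of the translation; `end` is exclusive.
Span = Tuple[int, int, str]


def to_char_labels(spans: List[Span]) -> Set[Tuple[int, str]]:
    """Expand spans into a set of (character index, severity) pairs."""
    return {(i, sev) for start, end, sev in spans for i in range(start, end)}


def span_f1(hyp: List[Span], ref: List[Span]) -> float:
    """Assumed utility: character-level F1 that requires matching severity.

    Two empty annotations (both saying "no errors") count as full agreement.
    """
    h, r = to_char_labels(hyp), to_char_labels(ref)
    if not h and not r:
        return 1.0
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision = overlap / len(h)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: List[List[Span]]) -> List[Span]:
    """Return the candidate with the highest mean utility against the others."""
    if len(candidates) == 1:
        return candidates[0]
    best_idx, best_score = 0, float("-inf")
    for i, hyp in enumerate(candidates):
        # Every other sampled candidate acts as a pseudo-reference.
        total = sum(span_f1(hyp, ref) for j, ref in enumerate(candidates) if j != i)
        score = total / (len(candidates) - 1)
        if score > best_score:
            best_idx, best_score = i, score
    return candidates[best_idx]


samples = [
    [(0, 5, "major")],
    [(0, 5, "major"), (10, 12, "minor")],
    [(0, 4, "major")],
    [],
]
print(mbr_select(samples))  # [(0, 5, 'major')]
```

Here the first candidate wins because it overlaps strongly with both the second and third samples. In an iterative setup, the selected annotations would serve as pseudo-labels for fine-tuning, and the fine-tuned model would then produce the next round's candidates. That outer loop is not shown.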