CL · Jun 4

Automatic Labelling of Speech Translation Errors

arXiv:2606.06047 · 88.4
Predicted impact top 44% in CL · last 90 days
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of established methodology for evaluating speech translation errors, which is important for trustworthiness in ST systems, but the results are preliminary and incremental.

The paper introduces Speech Translation Error Labelling (STEL), a methodology for evaluating confidence and quality estimation in speech translation. Results show that text-only XCOMET and multimodal LLM Qwen2.5-Omni achieve roughly half the precision of humans, with direct speech processing being necessary and complementary to text-only systems.

Errors in speech translations reduce the trustworthiness of Speech Translation (ST) systems and can have serious consequences. Yet there is currently no established methodology for evaluating confidence and quality estimation of speech translations. To initiate progress in this direction, we propose Speech Translation Error Labelling (STEL). We create an annotation protocol and a small authentic end-to-end evaluation dataset, and we analyse how existing text-only and speech-processing systems perform the STEL task. Our results show that text-only XCOMET and the multimodal LLM Qwen2.5-Omni perform the STEL task at roughly half the precision of humans. We also find that direct speech processing is necessary for the STEL task, and that current text-only and speech-processing systems are complementary in labelling translation-only vs. speech-processing errors in ST.

