CL · Oct 28, 2024

SpeechQE: Estimating the Quality of Direct Speech Translation

arXiv:2410.21485v1 · 25 citations · h-index: 36 · EMNLP
Originality: Incremental advance
AI Analysis

This paper addresses the need for reliable quality estimation in speech translation systems. The advance is incremental because it adapts existing methods to a new modality.

The paper tackles quality estimation for speech translation, which has been underexplored compared with written-language translation. It formulates the SpeechQE task, constructs a benchmark, and evaluates cascaded and end-to-end systems, finding that end-to-end approaches outperform cascaded ones.

Recent advances in automatic quality estimation for machine translation have exclusively focused on written language, leaving the speech modality underexplored. In this work, we formulate the task of quality estimation for speech translation (SpeechQE), construct a benchmark, and evaluate a family of systems based on cascaded and end-to-end architectures. In this process, we introduce a novel end-to-end system leveraging a pre-trained text LLM. Results suggest that end-to-end approaches are better suited to estimating the quality of direct speech translation than using quality estimation systems designed for text in cascaded systems. More broadly, we argue that quality estimation of speech translation needs to be studied as a separate problem from that of text, and release our data and models to guide further research in this space.

Code Implementations: 1 repo
