CL | Feb 20, 2025

Early-Exit and Instant Confidence Translation Quality Estimation

Vilém Zouhar, Maike Züfle, Beni Egressy, Julius Cheng, Mrinmaya Sachan, Jan Niehues

ETH Zurich

arXiv:2502.14429v28.33 citations | h-index: 15 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses efficiency and uncertainty estimation challenges for machine translation pipelines, offering incremental improvements to existing methods.

The paper tackles the high computational cost and opacity of quality estimation models in machine translation by introducing Early-Exit COMET and Instant Confidence COMET, which reduce required compute by 50% with minimal performance degradation in evaluation and reranking tasks.

Quality estimation is omnipresent in machine translation, for both evaluation and generation. Unfortunately, quality estimation models are often opaque and computationally expensive, which makes them impractical for large-scale pipelines. In this work, we tackle two connected challenges: (1) reducing the cost of quality estimation at scale, and (2) developing an inexpensive uncertainty estimation method for quality estimation. To address the latter, we introduce Instant Confidence COMET, an uncertainty-aware quality estimation model that matches the performance of previous approaches at a fraction of their cost. We extend this to Early-Exit COMET, a quality estimation model that computes quality scores and associated confidences at early model layers, which allows us to exit the computation early and reduce evaluation costs. We also apply our model to machine translation reranking: we combine Early-Exit COMET with an upper confidence bound bandit algorithm to find the best candidate from a large pool without running the full evaluation model on all candidates. In both cases (evaluation and reranking), our methods reduce the required compute by 50% with very little degradation in performance. Finally, we show how Instant Confidence COMET can be used to decide which translations a human evaluator should score rather than relying on the COMET score.
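The two mechanisms named in the abstract, early exit on a confidence threshold and upper confidence bound (UCB) selection for reranking, can be illustrated with a minimal sketch. It is not the authors' implementation and does not call the COMET library. It assumes a hypothetical model that emits, after each layer, a score estimate and a predicted error for that estimate. Precomputed lists stand in for those per-layer outputs, and all names (`early_exit_score`, `ucb_rerank`, `max_error`, `budget`) are illustrative. The bandit variant in the paper may differ in how it computes bounds and when it stops.

```python
from typing import List, Sequence, Tuple


def early_exit_score(
    scores: Sequence[float], errors: Sequence[float], max_error: float
) -> Tuple[float, int]:
    """Return (score, layers_used) for one translation.

    scores[l] and errors[l] are the hypothetical per-layer score estimate and
    predicted error. Stop at the first layer whose predicted error is small enough.
    """
    if not scores or len(scores) != len(errors):
        raise ValueError("scores and errors must be non-empty and equally long")
    for layer, (score, error) in enumerate(zip(scores, errors), start=1):
        if error <= max_error:
            return score, layer
    # No layer was confident enough: fall back to the final layer.
    return scores[-1], len(scores)


def ucb_rerank(
    layer_scores: List[Sequence[float]],
    layer_errors: List[Sequence[float]],
    budget: int,
) -> Tuple[int, int]:
    """Pick the best candidate while spending at most `budget` layer evaluations.

    Returns (winner_index, layer_evaluations_used). Assumes every candidate has
    the same number of layers (an assumption of this sketch).
    """
    n = len(layer_scores)
    n_layers = len(layer_scores[0])
    if budget < n:
        raise ValueError("budget must cover one layer per candidate")
    depth = [1] * n  # every candidate gets its first layer evaluated
    used = n
    while used < budget:
        # Upper confidence bound: current estimate plus its predicted error.
        ucb = [
            layer_scores[c][depth[c] - 1] + layer_errors[c][depth[c] - 1]
            for c in range(n)
        ]
        best = max(range(n), key=lambda c: ucb[c])
        if depth[best] == n_layers:
            break  # the most promising candidate is already fully evaluated
        depth[best] += 1  # spend one more layer on the most promising candidate
        used += 1
    winner = max(range(n), key=lambda c: layer_scores[c][depth[c] - 1])
    return winner, used
```

In this sketch the cost unit is one layer evaluated for one candidate, so evaluating 3 candidates through 4 layers each would cost 12 units. The bandit saves compute by never deepening candidates whose upper bound already falls below a competitor's.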
