CL, AI, LG | Feb 18, 2025

Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral

arXiv:2502.12701v1 | 6 citations | h-index: 20 | EMNLP
Originality: Incremental advance
AI Analysis

The paper offers a practical solution for deploying machine translation efficiently, though the advance is incremental: it applies existing methods in a new configuration.

The paper tackles the computational cost of large machine translation models by proposing a cascaded system that uses quality estimation metrics to defer only 30% to 50% of examples to larger models, matching their performance while reducing costs.

Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designing effective deferral rules remains a challenge. In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimation (QE) metrics as deferral rules. We show that QE-based deferral allows a cascaded system to match the performance of a larger model while invoking it for a small fraction (30% to 50%) of the examples, significantly reducing computational costs. We validate this approach through both automatic and human evaluation.
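The deferral idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The names `cascade_translate`, `small_translate`, `large_translate`, `qe_score`, and `defer_fraction` are hypothetical. Deferring the lowest-scoring fraction of a batch is an assumed rule chosen for simplicity, and the sketch assumes that a higher QE score means a better translation.

```python
from typing import Callable, List, Sequence


def cascade_translate(
    sources: Sequence[str],
    small_translate: Callable[[str], str],   # hypothetical: cheap model
    large_translate: Callable[[str], str],   # hypothetical: expensive model
    qe_score: Callable[[str, str], float],   # hypothetical: (source, draft) -> quality
    defer_fraction: float = 0.3,
) -> List[str]:
    """Translate everything with the small model, then re-translate the
    lowest-scoring fraction of drafts with the large model."""
    if not 0.0 <= defer_fraction <= 1.0:
        raise ValueError("defer_fraction must be in [0, 1]")

    # Step 1: the small model produces a draft for every source sentence.
    drafts = [small_translate(src) for src in sources]

    # Step 2: a reference-free QE metric scores each (source, draft) pair.
    scores = [qe_score(src, draft) for src, draft in zip(sources, drafts)]

    # Step 3: pick the indices of the worst-scoring drafts to defer.
    n_defer = round(defer_fraction * len(sources))
    worst_first = sorted(range(len(sources)), key=lambda i: scores[i])

    # Step 4: only the deferred sentences reach the large model.
    outputs = list(drafts)
    for i in worst_first[:n_defer]:
        outputs[i] = large_translate(sources[i])
    return outputs
```

With `defer_fraction` between 0.3 and 0.5, the large model is called on only that share of the inputs, which is the regime the abstract reports. In a streaming setting, where a batch quantile is unavailable, a fixed score threshold tuned on held-out data would be one alternative.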

