LG · AI · CL · May 9, 2023

Who Needs Decoders? Efficient Estimation of Sequence-level Attributes

arXiv:2305.05098v1 · 2 citations
Originality: Highly original
AI Analysis

This work addresses the computational cost of downstream tasks such as out-of-distribution detection and resource allocation in machine translation and speech recognition, offering a novel method that bypasses costly autoregressive decoding.

The paper tackles the cost of autoregressive decoding in sequence-to-sequence models by proposing Non-Autoregressive Proxy (NAP) models, which predict scalar sequence-level attributes directly from the encodings. On out-of-distribution detection for machine translation, NAPs outperform a deep ensemble while being significantly faster.
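To make "predict directly from the encodings" concrete, here is a minimal sketch that assumes the proxy is a mean-pooling step followed by a linear regression head trained with mean squared error. This architecture, the class name `LinearProxyHead`, and the synthetic data are hypothetical illustrations, not the paper's NAP model. The point is only that inference reads the encoder states and never calls a decoder.

```python
import random

# Hypothetical NAP-style proxy: mean pooling plus a linear head trained with MSE.
# This is an illustrative assumption, not the architecture used in the paper.


def mean_pool(encodings):
    """Average per-token encoder vectors into a single fixed-size vector."""
    dim = len(encodings[0])
    return [sum(vec[i] for vec in encodings) / len(encodings) for i in range(dim)]


class LinearProxyHead:
    """Maps pooled encoder states straight to a scalar sequence-level attribute."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def _score(self, pooled):
        return sum(w * x for w, x in zip(self.w, pooled)) + self.b

    def predict(self, encodings):
        # No decoder call: the attribute is estimated from the encodings alone.
        return self._score(mean_pool(encodings))

    def fit(self, dataset, lr=0.5, epochs=500):
        """Full-batch gradient descent on mean squared error.

        dataset: list of (encodings, target) pairs. The targets (e.g. a
        sentence-level BERTScore or WER) are assumed to be computed offline.
        """
        n = len(dataset)
        pooled_data = [(mean_pool(enc), y) for enc, y in dataset]
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for pooled, target in pooled_data:
                err = self._score(pooled) - target
                for i, x in enumerate(pooled):
                    grad_w[i] += 2.0 * err * x / n
                grad_b += 2.0 * err / n
            self.w = [w - lr * g for w, g in zip(self.w, grad_w)]
            self.b -= lr * grad_b


# Toy usage: synthetic "encodings" whose target is a known linear function.
random.seed(0)
TRUE_W, TRUE_B = [0.5, -0.2, 0.3, 0.1], 0.1
data = []
for _ in range(50):
    enc = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    pooled = mean_pool(enc)
    target = sum(w * x for w, x in zip(TRUE_W, pooled)) + TRUE_B
    data.append((enc, target))

head = LinearProxyHead(dim=4)
head.fit(data)
```

Training targets such as BERTScore or WER would presumably require decoding a training set once; after that, the proxy can score new inputs without decoding. A practical proxy would likely use a nonlinear head over a neural encoder; plain Python lists keep this sketch dependency-free.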

State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks, such as out-of-distribution (OOD) detection and resource allocation, the decoded output itself is not needed, only a scalar attribute of the sequence. In these scenarios, for example when knowing the quality of a system's output (to anticipate poor performance) matters more than knowing the output itself, is it possible to bypass autoregressive decoding? We propose Non-Autoregressive Proxy (NAP) models that can efficiently predict general scalar-valued sequence-level attributes. Importantly, NAPs predict these metrics directly from the encodings, avoiding the expensive autoregressive decoding stage. We consider two sequence-to-sequence tasks: Machine Translation (MT) and Automatic Speech Recognition (ASR). In OOD detection for MT, NAPs outperform a deep ensemble while being significantly faster. NAPs are also shown to predict performance metrics such as BERTScore (MT) and word error rate (ASR). For downstream tasks such as data filtering and resource optimization, NAPs generate performance predictions that outperform predictive uncertainty while being highly efficient at inference.
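Because a proxy returns one scalar per input, downstream uses such as data filtering and resource allocation reduce to ranking or thresholding those scalars. The sketch below assumes predicted WER values are already available; the function names `select_for_fallback` and `filter_by_threshold`, the budget, and the threshold are hypothetical and not taken from the paper.

```python
# Hypothetical downstream policies driven by proxy predictions. Names, budget,
# and threshold are illustrative assumptions, not taken from the paper.


def select_for_fallback(predicted_wer, budget):
    """Return the indices of the `budget` inputs with the highest predicted WER.

    Under a fixed compute budget, these are the inputs one might route to a
    larger, slower system.
    """
    ranked = sorted(range(len(predicted_wer)), key=lambda i: predicted_wer[i], reverse=True)
    return sorted(ranked[:budget])


def filter_by_threshold(predicted_wer, max_wer):
    """Keep only the indices whose predicted WER does not exceed `max_wer`."""
    return [i for i, wer in enumerate(predicted_wer) if wer <= max_wer]


predicted = [0.05, 0.40, 0.12, 0.30]
print(select_for_fallback(predicted, budget=2))      # [1, 3]
print(filter_by_threshold(predicted, max_wer=0.15))  # [0, 2]
```

Neither policy needs the decoded transcripts, which is where a proxy's speed advantage would pay off: only the inputs selected for fallback, or kept after filtering, ever reach an expensive decoder.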
