AS · AI · LG · May 5

Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models

arXiv:2605.08186 · 50.8
AI Analysis

Provides a principled framework for test-time adaptation of autoregressive models, addressing a theoretical gap for practitioners working with generative models.

The paper derives a unified mathematical formulation of entropy minimization for test-time adaptation of autoregressive models, decomposing it into token-level policy gradient and entropy losses. Applied to Whisper ASR, it consistently improves performance across 20+ diverse domains.
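The derivation below shows one way such a decomposition can arise. It is our own sketch, built from the chain rule of entropy and the score-function identity, and it makes several simplifying assumptions: a fixed output length T, no end-of-sequence handling, no length normalization, and outputs y sampled from the current model. The paper's exact formulation may differ. The symbols H_t (per-step entropy), G_s (future entropy) and sg (stop-gradient) are introduced here for illustration only.

```latex
\begin{aligned}
p_\theta(y \mid x) &= \prod_{t=1}^{T} p_\theta(y_t \mid y_{<t}, x), \qquad
H_t(y_{<t}) = -\sum_{v} p_\theta(v \mid y_{<t}, x)\,\log p_\theta(v \mid y_{<t}, x), \\
H(\theta) &= \mathbb{E}_{y \sim p_\theta}\!\left[-\log p_\theta(y \mid x)\right]
           = \mathbb{E}_{y \sim p_\theta}\!\left[\sum_{t=1}^{T} H_t(y_{<t})\right], \\
\nabla_\theta H(\theta) &= \mathbb{E}_{y \sim p_\theta}\!\left[\sum_{t=1}^{T} \nabla_\theta H_t(y_{<t})
  + \sum_{s=1}^{T} G_s\,\nabla_\theta \log p_\theta(y_s \mid y_{<s}, x)\right],
  \qquad G_s = \sum_{t=s+1}^{T} H_t(y_{<t}), \\
\mathcal{L}(\theta; y) &= \underbrace{\sum_{t=1}^{T} H_t(y_{<t})}_{\text{token-level entropy loss}}
  + \underbrace{\sum_{s=1}^{T} \operatorname{sg}[G_s]\,\log p_\theta(y_s \mid y_{<s}, x)}_{\text{token-level policy-gradient loss}}.
\end{aligned}
```

Because y is sampled without gradient and G_s is held constant, the expected gradient of L equals the gradient of the exact sequence entropy H. The two terms play different roles. The future entropy G_s acts as a cost, so minimizing the second term lowers the probability of tokens that lead to uncertain continuations. The first term sharpens each next-token distribution given its prefix.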

Test-Time Adaptation (TTA) via entropy minimization (EM) has proven effective for classification tasks, yet its application to generative autoregressive models remains theoretically fragmented. Existing approaches typically rely on distinct heuristics, such as teacher forcing with pseudo labels or policy-gradient-based reinforcement learning, without a unified mathematical foundation. In this work, we resolve this discrepancy by deriving a rigorous formulation of EM tailored to autoregressive models. We show that the exact objective naturally decomposes into a token-level policy gradient loss and a token-level entropy loss, and we reinterpret prior methods as partial realizations of this unified formulation. Using Whisper ASR as a testbed, we demonstrate that our approach consistently improves performance across more than 20 diverse domains, including acoustic noise, accents, and multilingual settings.
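The claim that the exact objective splits into a token-level entropy term and a token-level policy-gradient term can be checked numerically on a toy model. This sketch is our own illustration, not the authors' code, and is unrelated to Whisper.

It rests on several assumptions:
- The model is a hypothetical tabular autoregressive model, with one free logit vector per prefix.
- The vocabulary size is V=3 and the output length is fixed at T=3, with no end-of-sequence token.
- The surrogate takes per-step entropies and adds log-probabilities weighted by the sum of future entropies, which are treated as constants.

The code enumerates every output to compute the exact sequence entropy. It then compares the finite-difference gradient of that entropy with the gradient of the expected surrogate loss. All function names are hypothetical.

```python
import itertools

import numpy as np

# Hypothetical toy setup (not Whisper): vocabulary size V, fixed output length T.
V, T = 3, 3
# Fully tabular autoregressive model: one free logit vector per prefix y_<t.
PREFIXES = [p for t in range(T) for p in itertools.product(range(V), repeat=t)]
INDEX = {p: i for i, p in enumerate(PREFIXES)}
SEQUENCES = list(itertools.product(range(V), repeat=T))


def log_probs(theta, prefix):
    """log p_theta(. | prefix) via a numerically stable log-softmax."""
    z = theta[INDEX[prefix]]
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def step_entropy(theta, prefix):
    """Token-level entropy H_t(y_<t) of the next-token distribution."""
    lp = log_probs(theta, prefix)
    return -np.sum(np.exp(lp) * lp)


def seq_log_prob(theta, y):
    return sum(log_probs(theta, y[:t])[y[t]] for t in range(T))


def exact_entropy(theta):
    """Sequence-level entropy H(theta), by enumerating all V**T outputs."""
    total = 0.0
    for y in SEQUENCES:
        lp = seq_log_prob(theta, y)
        total -= np.exp(lp) * lp
    return total


def surrogate(theta, y, theta_fixed):
    """Per-sample loss: token-level entropy loss + token-level policy-gradient loss.

    Future entropies (the weights G_s) are computed at theta_fixed, which plays
    the role of a stop-gradient; only the entropy term and log-probs see theta.
    """
    h_fixed = [step_entropy(theta_fixed, y[:t]) for t in range(T)]
    entropy_loss = sum(step_entropy(theta, y[:t]) for t in range(T))
    pg_loss = sum(
        sum(h_fixed[s + 1:]) * log_probs(theta, y[:s])[y[s]] for s in range(T)
    )
    return entropy_loss + pg_loss


def expected_surrogate(theta, theta_fixed):
    # Each y is weighted by p_{theta_fixed}(y), i.e. sampled without gradient.
    return sum(
        np.exp(seq_log_prob(theta_fixed, y)) * surrogate(theta, y, theta_fixed)
        for y in SEQUENCES
    )


def numeric_grad(f, theta, eps=1e-5):
    """Central finite differences over every entry of theta."""
    g = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        d = np.zeros_like(theta)
        d[idx] = eps
        g[idx] = (f(theta + d) - f(theta - d)) / (2 * eps)
    return g


def check_decomposition(seed=0):
    """Return (chain-rule value gap, max gradient gap) at random parameters."""
    theta0 = np.random.default_rng(seed).normal(size=(len(PREFIXES), V))
    # Chain rule: E_y[sum_t H_t(y_<t)] should equal the sequence entropy.
    chain = sum(
        np.exp(seq_log_prob(theta0, y))
        * sum(step_entropy(theta0, y[:t]) for t in range(T))
        for y in SEQUENCES
    )
    value_gap = abs(chain - exact_entropy(theta0))
    g_exact = numeric_grad(exact_entropy, theta0)
    g_surr = numeric_grad(lambda th: expected_surrogate(th, theta0), theta0)
    return value_gap, np.max(np.abs(g_exact - g_surr))


if __name__ == "__main__":
    print(check_decomposition())
```

Both gaps should come out at floating-point or finite-difference noise level. A real TTA loop would work differently: it would sample transcripts from the model, compute the same two terms with an autodiff framework, and treat the future-entropy weights as constants. Exhaustive enumeration is used here only because the toy model is small enough to make the check exact.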
