AS, LG, SD · Aug 3, 2025

Test-Time Training for Speech Enhancement

Avishkar Behera, Riya Ann Easow, Venkatesh Parvathala, K. Sri Rama Murty

arXiv:2508.01847v2 · 2.3 · 1 citations · h-index: 7 · INTERSPEECH

Originality: Incremental advance

AI Analysis

The work addresses domain adaptation challenges in speech processing for real-world applications, though it is incremental: it adapts an existing method to a new domain.

The paper tackles unpredictable noise and domain shifts in speech enhancement by applying Test-Time Training with a Y-shaped architecture and self-supervised auxiliary tasks, yielding consistent improvements in speech quality metrics over the baseline model.
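To make the Y-shaped layout concrete, here is a minimal NumPy sketch in which one shared encoder feeds both a main enhancement head and a self-supervised auxiliary head. It assumes linear layers with a ReLU in place of the paper's actual networks, frame-wise magnitude spectrograms as input, and masked spectrogram prediction as the auxiliary task. The class and method names (`YShapedModel`, `enhance`, `aux_masked_loss`), the layer sizes, the sigmoid mask on the main branch, and the 20% masking ratio are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np


class YShapedModel:
    """Hypothetical Y-shaped model: a shared encoder (the stem of the Y)
    feeding two heads (the arms). Linear layers + ReLU are an assumption."""

    def __init__(self, n_bins: int, n_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 1.0 / np.sqrt(n_bins), (n_bins, n_hidden))    # shared trunk
        self.W_main = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_bins))  # enhancement head
        self.W_aux = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_bins))   # self-supervised head

    def encode(self, spec: np.ndarray) -> np.ndarray:
        # Shared features used by both branches; shape (frames, n_hidden).
        return np.maximum(spec @ self.W_enc, 0.0)

    def enhance(self, noisy_mag: np.ndarray) -> np.ndarray:
        # Main branch: predict a mask in (0, 1) and apply it to the noisy magnitude.
        logits = self.encode(noisy_mag) @ self.W_main
        mask = 1.0 / (1.0 + np.exp(-logits))
        return mask * noisy_mag

    def aux_masked_loss(self, noisy_mag: np.ndarray, rng: np.random.Generator,
                        p_mask: float = 0.2) -> float:
        # Auxiliary branch: zero out random time-frequency bins, then score how
        # well the aux head reconstructs the hidden bins. Needs no clean target.
        keep = rng.random(noisy_mag.shape) >= p_mask
        recon = self.encode(noisy_mag * keep) @ self.W_aux
        hidden = ~keep
        if not hidden.any():
            return 0.0
        return float(np.mean((recon[hidden] - noisy_mag[hidden]) ** 2))
```

Because only the encoder is shared, gradients of the auxiliary loss can reshape the features the enhancement head consumes, which is what lets a label-free objective influence enhancement at test time.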

This paper introduces a novel application of Test-Time Training (TTT) for Speech Enhancement, addressing the challenges posed by unpredictable noise conditions and domain shifts. This method combines a main speech enhancement task with a self-supervised auxiliary task in a Y-shaped architecture. The model dynamically adapts to new domains during inference time by optimizing the proposed self-supervised tasks, such as noise-augmented signal reconstruction or masked spectrogram prediction, bypassing the need for labeled data. We further introduce various TTT strategies offering a trade-off between adaptation and efficiency. Evaluations across synthetic and real-world datasets show consistent improvements across speech quality metrics, outperforming the baseline model. This work highlights the effectiveness of TTT in speech enhancement, providing insights for future research in adaptive and robust speech processing.
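The test-time loop can be sketched the same way. The version below assumes a purely linear encoder and heads so the gradient can be written by hand, uses noise-augmented reconstruction as the auxiliary task, and updates only the shared encoder while both heads stay frozen. Inputs are assumed to be mean-normalized log-magnitude frames, and the main head is assumed to regress clean features. That parameter split, the function names (`aux_loss_and_grad`, `ttt_adapt`, `adapt_stream`, `enhance`), the noise level, and the step size are all hypothetical. The two strategies in `adapt_stream` ("reset" per utterance versus "online" carry-over) are generic illustrations of an adaptation/efficiency trade-off, not the specific TTT variants the paper proposes.

```python
import numpy as np


def aux_loss_and_grad(W_enc, W_aux, x, x_aug):
    """Noise-augmented reconstruction: rebuild clean-ish input x from its
    augmented copy x_aug through the (assumed linear) encoder and aux head.
    Returns the MSE and its gradient w.r.t. the encoder weights only."""
    resid = (x_aug @ W_enc) @ W_aux - x            # (frames, bins)
    loss = float(np.mean(resid ** 2))
    d_recon = 2.0 * resid / resid.size             # dLoss / dRecon
    grad = x_aug.T @ (d_recon @ W_aux.T)           # chain rule back to W_enc
    return loss, grad


def ttt_adapt(W_enc, W_aux, x, steps, lr, rng, sigma=0.1):
    """Label-free adaptation on one test input: plain gradient descent on the
    auxiliary loss, applied to a copy of the shared encoder."""
    W = W_enc.copy()
    for _ in range(steps):
        x_aug = x + sigma * rng.standard_normal(x.shape)   # fresh noise each step
        _, grad = aux_loss_and_grad(W, W_aux, x, x_aug)
        W -= lr * grad
    return W


def adapt_stream(utterances, W_enc0, W_aux, strategy, steps=10, lr=0.5, seed=0):
    """'reset' restarts from the source encoder for every utterance;
    'online' carries the adapted encoder forward to the next utterance."""
    if strategy not in ("reset", "online"):
        raise ValueError(f"unknown strategy: {strategy}")
    rng = np.random.default_rng(seed)
    W = W_enc0
    per_utt = []
    for x in utterances:
        start = W_enc0 if strategy == "reset" else W
        W = ttt_adapt(start, W_aux, x, steps, lr, rng)
        per_utt.append(W)
    return per_utt


def enhance(x, W_enc, W_main):
    # Main branch reads features from the (possibly adapted) shared encoder.
    return (x @ W_enc) @ W_main
```

Resetting per utterance keeps each adaptation independent and avoids drift, at the cost of repeating every step from the source weights; carrying weights forward may amortize adaptation across a stream recorded in one noise condition, but it can accumulate errors when conditions change.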
