AS AI SD
Sep 26, 2025

Unsupervised Speech Enhancement using Data-defined Priors

arXiv:2509.22942v1
1 citation
h-index: 63
Originality: Incremental advance
AI Analysis

This work addresses the difficulty of collecting real-world paired data for speech enhancement. It is an incremental advance because it builds on prior unsupervised approaches.

The paper tackles speech enhancement without paired clean-noisy data by proposing a dual-branch encoder-decoder architecture trained adversarially on unpaired datasets. It achieves performance comparable to leading unsupervised methods and shows that the choice of clean speech data strongly affects the reported results.

The majority of deep learning-based speech enhancement methods require paired clean-noisy speech data. Collecting such data at scale in real-world conditions is infeasible, which has led the community to rely on synthetically generated noisy speech. However, this introduces a gap between the training and testing phases. In this work, we propose a novel dual-branch encoder-decoder architecture for unsupervised speech enhancement that separates the input into clean speech and residual noise. Adversarial training is employed to impose priors on each branch, defined by unpaired datasets of clean speech and, optionally, noise. Experimental results show that our method achieves performance comparable to leading unsupervised speech enhancement approaches. Furthermore, we demonstrate the critical impact of clean speech data selection on enhancement performance. In particular, our findings reveal that performance may appear overly optimistic when in-domain clean speech data are used for prior definition -- a practice adopted in previous unsupervised speech enhancement studies.
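
To make the dual-branch idea concrete, here is a minimal NumPy sketch of the loss structure the abstract describes. It is not the authors' implementation. The single-layer encoder and decoders, the frame and latent sizes, the softplus form of the adversarial terms, and the consistency term asking the two branches to sum back to the input are all assumptions made for illustration. The names `separate`, `generator_loss`, and `discriminator_loss` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAME, LATENT = 64, 16  # illustrative sizes, not taken from the paper


def linear(in_dim, out_dim):
    """Random weights standing in for a trained layer (hypothetical)."""
    return rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


# Shared encoder and two decoder branches (toy single-layer stand-ins).
W_enc = linear(FRAME, LATENT)
W_speech = linear(LATENT, FRAME)
W_noise = linear(LATENT, FRAME)
# One linear discriminator per branch; each maps a frame to a realness logit.
w_d_speech = linear(FRAME, 1)
w_d_noise = linear(FRAME, 1)


def separate(noisy):
    """Split noisy frames into a speech estimate and a residual-noise estimate."""
    latent = np.tanh(noisy @ W_enc)
    return latent @ W_speech, latent @ W_noise


def generator_loss(noisy, use_noise_prior=True):
    """Assumed objective: consistency term plus one adversarial term per prior."""
    speech_hat, noise_hat = separate(noisy)
    # Assumption: the two branch outputs should add back up to the input.
    loss = np.mean((speech_hat + noise_hat - noisy) ** 2)
    # Non-saturating GAN term, -log(sigmoid(logit)) == softplus(-logit).
    loss += np.mean(softplus(-(speech_hat @ w_d_speech)))
    if use_noise_prior:  # the abstract describes the noise prior as optional
        loss += np.mean(softplus(-(noise_hat @ w_d_noise)))
    return loss


def discriminator_loss(real, fake, w_d):
    """Binary cross-entropy: unpaired real frames vs. branch outputs."""
    return np.mean(softplus(-(real @ w_d))) + np.mean(softplus(fake @ w_d))


# Toy usage with random frames in place of real audio features.
noisy = rng.standard_normal((8, FRAME))
clean_unpaired = rng.standard_normal((8, FRAME))  # unpaired clean-speech set
speech_hat, noise_hat = separate(noisy)
g_loss = generator_loss(noisy)
d_loss = discriminator_loss(clean_unpaired, speech_hat, w_d_speech)
```

In this sketch the clean-speech prior enters only through `clean_unpaired`, the data shown to the speech discriminator. That makes it easy to see why the choice of that dataset (in-domain or out-of-domain) can change the measured performance, which is the point the abstract raises.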

