BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models
This work addresses speech dereverberation across varied acoustic scenarios without prior knowledge of room impulse responses or paired reverberant-anechoic data, offering an unsupervised approach that could benefit audio processing applications.
The paper tackles single-channel blind unsupervised dereverberation and room impulse response estimation with diffusion models. It significantly outperforms previous blind unsupervised baselines and is more robust to unseen acoustic conditions than blind supervised methods.
In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models. We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance is refined along the reverse diffusion trajectory. A measurement consistency criterion enforces the fidelity of the generated speech to the reverberant measurement, while an unconditional diffusion model provides a strong prior for clean speech generation. Without any knowledge of the room impulse response or any paired reverberant-anechoic data, we can successfully perform dereverberation in various acoustic scenarios. Our method significantly outperforms previous blind unsupervised baselines, and we demonstrate its increased robustness to unseen acoustic conditions in comparison to blind supervised methods. Audio samples and code are available online.
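To make the idea of a subband filter with exponential decay more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a time-frequency (STFT-like) representation in which each frequency band has its own filter along the frame axis, with magnitude `weight * exp(-decay_rate * frame_index)`. The function names, the per-band `weights` and `decay_rates` parameters, and the optional `phases` argument are all hypothetical choices made for illustration; the actual parameterization in the paper may differ.

```python
import numpy as np


def exp_decay_subband_filter(weights, decay_rates, num_frames, phases=None):
    """Build a (num_bands, num_frames) complex filter (illustrative assumption).

    The magnitude in band b at frame n is weights[b] * exp(-decay_rates[b] * n).
    Phases default to zero when not provided.
    """
    weights = np.asarray(weights, dtype=float)
    decay_rates = np.asarray(decay_rates, dtype=float)
    frames = np.arange(num_frames)
    magnitude = weights[:, None] * np.exp(-decay_rates[:, None] * frames[None, :])
    if phases is None:
        phases = np.zeros_like(magnitude)
    return magnitude * np.exp(1j * np.asarray(phases, dtype=float))


def apply_subband_filter(spec, filt):
    """Convolve each band of spec (num_bands, num_steps) with its own filter.

    The convolution runs along the frame axis and is truncated to the
    input length, so the output has the same shape as spec.
    """
    num_bands, num_steps = spec.shape
    out = np.zeros((num_bands, num_steps), dtype=complex)
    for b in range(num_bands):
        out[b] = np.convolve(spec[b], filt[b])[:num_steps]
    return out
```

In a blind setting such as the one described above, the per-band weights and decay rates would be unknown and estimated jointly with the clean speech; here they are simply passed in as fixed values to show the forward (reverberation) operation.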