SD | AI | LG | AS | Jun 11, 2024

Pre-training Feature Guided Diffusion Model for Speech Enhancement

arXiv:2406.07646v1 | 10 citations
Originality: Incremental advance
AI Analysis

This work addresses speech clarity and intelligibility for communication applications, representing an incremental improvement over existing models.

The paper tackles speech enhancement in noisy environments by introducing a pre-training feature-guided diffusion model that integrates spectral features into a VAE and uses DDIM for efficient sampling, achieving state-of-the-art results on two public datasets with improved efficiency and robustness.

Speech enhancement improves the clarity and intelligibility of speech in noisy environments, benefiting communication and listening experiences. In this paper, we introduce a novel pre-training feature-guided diffusion model tailored for efficient speech enhancement, addressing the limitations of existing discriminative and generative models. By integrating spectral features into a variational autoencoder (VAE), using pre-trained features to guide the reverse process, and applying denoising diffusion implicit model (DDIM) sampling to reduce the number of sampling steps, our model improves both efficiency and enhancement quality. It achieves state-of-the-art results on two public datasets with different SNRs and outperforms other baselines in efficiency and robustness. The proposed method not only optimizes performance but also enhances practical deployment capabilities, without increasing computational demands.
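
The abstract credits DDIM with cutting the number of sampling steps. As a rough illustration only, the sketch below implements the standard deterministic DDIM update (eta = 0) in NumPy and runs it over a shortened timestep schedule. It is not the authors' implementation: the VAE, the spectral features, and the pre-trained feature guidance are not reproduced, and `eps_model`, `make_alpha_bars`, the linear beta schedule, and the step counts are all hypothetical choices made for this example.

```python
import numpy as np


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from timestep t to an earlier one."""
    # Estimate the clean signal implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the earlier timestep using the same noise direction.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred


def ddim_sample(eps_model, shape, alpha_bars, num_steps=20, seed=0):
    """Run the reverse process on a subsampled schedule of num_steps timesteps.

    eps_model(x, t) is a placeholder for any noise-prediction network; in a
    conditional enhancer it would also receive the noisy speech and guidance.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from Gaussian noise
    # Evenly spaced timesteps from the last training step down to 0.
    timesteps = np.linspace(len(alpha_bars) - 1, 0, num_steps).round().astype(int)
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # After the final step the sample is treated as noise-free (alpha_bar = 1).
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), a_t, a_prev)
    return x
```

With `num_steps=20` the loop calls the network 20 times instead of once per training timestep (1000 in this assumed schedule), which is where the efficiency gain of DDIM-style sampling comes from.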

