Feb 1, 2024

An Analysis of the Variance of Diffusion-based Speech Enhancement

arXiv:2402.00811v2 | 8 citations | h-index: 9 | INTERSPEECH
AI Analysis

This work addresses the tradeoff between noise reduction and speech quality in speech enhancement systems, but it is incremental, as it builds on the existing SGMSE+ approach.

The paper identifies the variance scale in diffusion-based speech enhancement as a key parameter affecting performance, showing that larger variance improves noise attenuation and reduces computational cost by requiring fewer function evaluations.

Diffusion models have proven to be powerful models for generative speech enhancement. In recent SGMSE+ approaches, training uses a stochastic differential equation (SDE) for the diffusion process that gradually adds both Gaussian and environmental noise to the clean speech signal. Speech enhancement performance depends on the choice of this SDE, which controls how the mean and the variance evolve along the diffusion process as environmental and Gaussian noise are added. In this work, we highlight that the scale of the variance is a dominant parameter for speech enhancement performance and show that it controls the tradeoff between noise attenuation and speech distortions. More concretely, we show that a larger variance increases noise attenuation and reduces the computational footprint, as fewer function evaluations are required to generate the estimate.
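To make "the evolution of the mean and the variance" concrete, here is a minimal sketch of the Gaussian perturbation kernel p(x_t | x_0, y) of the Ornstein-Uhlenbeck SDE with a variance-exploding schedule (OUVE) used in SGMSE+. The defaults gamma = 1.5, sigma_min = 0.05, sigma_max = 0.5 are assumed values, real-valued toy arrays stand in for complex STFT coefficients, and multiplying both sigma_min and sigma_max by a common factor c is one hypothetical way to change the variance scale. It is not necessarily the parameterization studied in this paper.

```python
import numpy as np

def ouve_mean_std(x0, y, t, gamma=1.5, sigma_min=0.05, sigma_max=0.5):
    """Mean and std of p(x_t | x0, y) for the OUVE SDE
    dx = gamma * (y - x) dt + sigma_min * k**t * sqrt(2 ln k) dw,  k = sigma_max / sigma_min.
    Default parameter values are assumptions for illustration."""
    k = sigma_max / sigma_min
    log_k = np.log(k)
    # Mean: exponential interpolation from clean speech x0 toward the noisy mixture y
    mean = np.exp(-gamma * t) * x0 + (1.0 - np.exp(-gamma * t)) * y
    # Variance: zero at t = 0, grows with t, and is proportional to sigma_min**2
    var = sigma_min**2 * (k ** (2 * t) - np.exp(-2 * gamma * t)) * log_k / (gamma + log_k)
    return mean, np.sqrt(var)

def sample_xt(x0, y, t, rng, **sde_params):
    """Draw x_t = mean + std * z with z ~ N(0, I) (real-valued toy version)."""
    mean, std = ouve_mean_std(x0, y, t, **sde_params)
    return mean + std * rng.standard_normal(np.shape(x0))

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)            # toy "clean speech" coefficients
y = x0 + 0.3 * rng.standard_normal(8)  # toy "noisy mixture"

# Hypothetical variance scaling: multiply sigma_min and sigma_max by c
for c in (1.0, 2.0):
    params = dict(sigma_min=0.05 * c, sigma_max=0.5 * c)
    _, std = ouve_mean_std(x0, y, t=1.0, **params)
    x_t = sample_xt(x0, y, 1.0, rng, **params)
    print(f"c={c}: std at t=1 is {std:.3f}, sample shape {x_t.shape}")
```

Because the ratio k is unchanged, scaling by c multiplies the standard deviation by c at every t and leaves the mean untouched, so in this sketch the variance scale is a knob separate from the shape of the schedule.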
