AS, AI, LG, SD | Sep 18, 2023

Single and Few-step Diffusion for Generative Speech Enhancement

arXiv:2309.09677v2 | 25 citations | h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses slow inference and accumulated discretization error in generative speech enhancement. Its method maintains performance with far fewer sampling steps, an incremental advance that still matters for real-time applications.

The paper tackles the slow inference and discretization errors of diffusion models for speech enhancement by proposing a two-stage training approach that matches the baseline's performance with only 5 function evaluations instead of 60.

Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score estimation is called multiple times to solve the iterative reverse process. This results in a slow inference process and causes discretization errors that accumulate over the sampling trajectory. In this paper, we address these limitations through a two-stage training approach. In the first stage, we train the diffusion model the usual way using the generative denoising score matching loss. In the second stage, we compute the enhanced signal by solving the reverse process and compare the resulting estimate to the clean speech target using a predictive loss. We show that using this second training stage enables achieving the same performance as the baseline model using only 5 function evaluations instead of 60 function evaluations. While the performance of usual generative diffusion algorithms drops dramatically when lowering the number of function evaluations (NFEs) to obtain single-step diffusion, we show that our proposed method keeps a steady performance and therefore largely outperforms the diffusion baseline in this setting and also generalizes better than its predictive counterpart.
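
The two-stage idea in the abstract can be illustrated with a deliberately tiny toy problem. The sketch below is not the paper's model, SDE, or sampler: it assumes scalar "signals", a hypothetical one-parameter score model `score(theta, ...)` that treats `theta * y` as the clean estimate, a variance-exploding schedule with made-up constants, a deterministic Euler solver for the probability flow ODE started directly from the noisy input `y`, and a ternary search standing in for gradient-based training. Stage 1 fits `theta` with a denoising score matching loss; stage 2 refits it by running the solver with a fixed NFE budget and comparing the output to the clean target.

```python
import math
import random

# Hypothetical toy schedule constants (not taken from the paper).
SIGMA_MIN, SIGMA_MAX, T_EPS = 0.05, 0.5, 0.03
LOG_RATIO = math.log(SIGMA_MAX / SIGMA_MIN)

def sigma(t):
    """Geometric noise level of a variance-exploding toy process."""
    return SIGMA_MIN * math.exp(LOG_RATIO * t)

def score(theta, x, y, t):
    """One-parameter stand-in for the score network: assumes clean ~ theta * y."""
    return -(x - theta * y) / sigma(t) ** 2

def enhance(theta, y, nfe):
    """Euler solver for the probability flow ODE from t=1 down to T_EPS.

    Uses exactly `nfe` score evaluations and starts from y itself
    (no prior noise added, a simplification of this sketch).
    """
    x = y
    h = (1.0 - T_EPS) / nfe
    for k in range(nfe):
        t = 1.0 - k * h
        g2 = 2.0 * LOG_RATIO * sigma(t) ** 2          # g(t)^2 = d(sigma^2)/dt
        x = x + 0.5 * h * g2 * score(theta, x, y, t)  # one function evaluation
    return x

def make_data(n, seed=0):
    """Synthetic (clean, noisy, diffusion time, diffusion noise) tuples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)        # "clean" sample
        y = x0 + rng.gauss(0.0, 0.5)    # "noisy mixture"
        t = rng.uniform(T_EPS, 1.0)
        z = rng.gauss(0.0, 1.0)
        data.append((x0, y, t, z))
    return data

def dsm_loss(theta, data):
    """Stage 1: denoising score matching loss, weighted by sigma(t)^2."""
    total = 0.0
    for x0, y, t, z in data:
        x_t = x0 + sigma(t) * z
        total += (sigma(t) * score(theta, x_t, y, t) + z) ** 2
    return total / len(data)

def predictive_loss(theta, data, nfe):
    """Stage 2: MSE between the solver output and the clean target."""
    return sum((enhance(theta, y, nfe) - x0) ** 2 for x0, y, _, _ in data) / len(data)

def minimize_1d(loss, lo=-1.0, hi=3.0, iters=80):
    """Ternary search for a convex 1-D loss (stand-in for gradient training)."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if loss(a) < loss(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

data = make_data(2000)
theta_stage1 = minimize_1d(lambda th: dsm_loss(th, data))
# In this convex toy the search is global, so stage 2 does not need
# to be initialized from the stage 1 solution.
theta_stage2 = minimize_1d(lambda th: predictive_loss(th, data, nfe=1))

for name, theta in [("stage 1 only", theta_stage1), ("stage 1 + 2", theta_stage2)]:
    for nfe in (1, 5, 60):
        print(f"{name:13s} NFE={nfe:2d}  MSE={predictive_loss(theta, data, nfe):.4f}")
```

In this toy, the stage 2 parameter absorbs the error of the coarse single-step solver, which is the intuition behind training through the reverse process with a predictive loss. It says nothing quantitative about the paper's speech results.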

Code Implementations: 1 repo
