AS | LG | SD | Nov 4, 2022

Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration

arXiv:2211.02397v2 | 57 citations | h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses speech restoration for audio processing applications, showing incremental improvements by extending prior diffusion-based methods to new tasks.

The paper systematically compares generative diffusion models and discriminative approaches for speech restoration tasks, finding that the generative method performs better across all tasks, especially for non-additive distortions like dereverberation and bandwidth extension.

Diffusion-based generative models have had a high impact on the computer vision and speech processing communities in recent years. Besides data generation tasks, they have also been employed for data restoration tasks like speech enhancement and dereverberation. While discriminative models have traditionally been argued to be more powerful, e.g., for speech enhancement, generative diffusion approaches have recently been shown to narrow this performance gap considerably. In this paper, we systematically compare the performance of generative diffusion models and discriminative approaches on different speech restoration tasks. For this, we extend our prior contributions on diffusion-based speech enhancement in the complex time-frequency domain to the task of bandwidth extension. We then compare it to a discriminatively trained neural network with the same network architecture on three restoration tasks, namely speech denoising, dereverberation and bandwidth extension. We observe that the generative approach performs globally better than its discriminative counterpart on all tasks, with the strongest benefit for non-additive distortion models, as in dereverberation and bandwidth extension. Code and audio examples can be found online at https://uhh.de/inf-sp-sgmsemultitask
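
To make the distinction between additive and non-additive distortions concrete, the following is a minimal NumPy sketch of the three corruption types the abstract names: additive noise (denoising), convolution with a room impulse response (dereverberation), and removal of high-frequency content (bandwidth extension). The function names, the synthetic test signal, the toy impulse response, and the FFT-based low-pass are all illustrative assumptions; this is not the paper's data pipeline or its model.

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Additive corruption: y = x + g * n, with g chosen to hit the target SNR."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise

def reverberate(clean, rir):
    """Convolutive corruption: y = x * h, trimmed to the input length."""
    return np.convolve(clean, rir)[: len(clean)]

def reduce_bandwidth(clean, factor):
    """Bandwidth reduction: zero every frequency bin above 1/factor of Nyquist."""
    spectrum = np.fft.rfft(clean)
    cutoff = len(spectrum) // factor
    spectrum[cutoff:] = 0.0
    return np.fft.irfft(spectrum, n=len(clean))

# Hypothetical one-second test signal at 16 kHz: a 440 Hz and a 6 kHz tone.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)

noise = rng.standard_normal(len(clean))
# Toy exponentially decaying impulse response (assumption, not a measured RIR).
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)

noisy = add_noise(clean, noise, snr_db=10.0)
reverberant = reverberate(clean, rir)
narrowband = reduce_bandwidth(clean, factor=2)
```

In the additive case the corruption can be written as the clean signal plus an independent term, whereas the reverberant and band-limited signals are transformations of the clean signal itself, which is the sense in which the abstract calls those distortion models non-additive.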

Code Implementations: 1 repo
