SD · CL · AS · Feb 23, 2023

Metric-oriented Speech Enhancement using Diffusion Probabilistic Model

arXiv:2302.11989v1 · 22 citations · h-index: 44
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in speech enhancement for audio processing applications, offering an incremental improvement by aligning training with evaluation metrics.

The paper tackles the mismatch between training objectives and non-differentiable evaluation metrics (e.g., PESQ) in speech enhancement. It proposes a metric-oriented method (MOSE) that integrates a metric-oriented training strategy into the reverse process of a diffusion probabilistic model, and reports improved performance over generative baselines on all evaluation metrics.

Deep neural network based speech enhancement techniques focus on learning a noisy-to-clean transformation supervised by paired training data. However, the task-specific evaluation metric (e.g., PESQ) is usually non-differentiable and cannot be directly incorporated into the training criterion. This mismatch between the training objective and the evaluation metric likely results in sub-optimal performance. To alleviate it, we propose a metric-oriented speech enhancement method (MOSE), which leverages recent advances in diffusion probabilistic models and integrates a metric-oriented training strategy into the reverse process. Specifically, we design an actor-critic based framework that treats the evaluation metric as a posterior reward, thus guiding the reverse process in the metric-increasing direction. The experimental results demonstrate that MOSE clearly benefits from metric-oriented training and surpasses the generative baselines on all evaluation metrics.
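
The actor-critic idea in the abstract can be illustrated with a toy sketch. This is not the authors' MOSE implementation and contains no diffusion model. It only assumes a piecewise-constant score (a quantized negative MSE standing in for a metric like PESQ), a one-parameter "actor" that scales the noisy signal, and a quadratic "critic" fitted to observed scores. All names here (`toy_metric`, `actor`, `fit_critic`) are hypothetical. Because the metric has zero gradient almost everywhere, the actor is updated along the gradient of the differentiable critic instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): a clean sine wave and a noisy observation of it.
t = np.linspace(0.0, 1.0, 2000)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + rng.normal(0.0, 1.0, size=t.shape)


def toy_metric(enhanced, reference):
    """Quantized negative MSE (higher is better).

    The floor makes the score piecewise constant, so it cannot be
    optimized by direct gradient descent. It is a stand-in for a
    non-differentiable metric, not PESQ itself.
    """
    mse = np.mean((enhanced - reference) ** 2)
    return np.floor(-mse * 20) / 20


def actor(gain):
    """Hypothetical one-parameter enhancer: scale the noisy signal."""
    return gain * noisy


def fit_critic(gain, n_samples=32, spread=0.2):
    """Fit a quadratic critic to metric scores sampled around `gain`."""
    gains = gain + spread * rng.normal(size=n_samples)
    scores = np.array([toy_metric(actor(g), clean) for g in gains])
    return np.polyfit(gains, scores, deg=2)  # differentiable surrogate


gain = 1.0  # start by passing the noisy signal through unchanged
initial_score = toy_metric(actor(gain), clean)

for _ in range(30):
    critic = fit_critic(gain)
    # Gradient of the critic (not of the metric) with respect to the gain.
    grad = np.polyval(np.polyder(critic), gain)
    gain += 0.1 * grad  # move the actor in the metric-increasing direction

final_score = toy_metric(actor(gain), clean)
print(f"gain={gain:.3f} initial={initial_score:.2f} final={final_score:.2f}")
```

The critic is refitted at every step so that it stays accurate near the actor's current operating point, which loosely mirrors alternating critic and actor updates in actor-critic training.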
