ASSD Jul 22, 2021

Controlling the Perceived Sound Quality for Dialogue Enhancement with Deep Learning

arXiv:2107.10562v1
3 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of maintaining consistent sound quality in dialogue enhancement systems. It is an incremental advance that builds on existing speech enhancement techniques.

The paper addresses the trade-off between background attenuation and sound quality in speech enhancement. It proposes a deep learning method that controls this trade-off so that the output meets an adjustable quality target, with an accuracy the authors report as adequate for real-world dialogue enhancement applications.

Speech enhancement attenuates interfering sounds in speech signals but may introduce artifacts that perceivably deteriorate the output signal. We propose a method for controlling the trade-off between the attenuation of the interfering background signal and the loss of sound quality. A deep neural network estimates the attenuation of the separated background signal such that the sound quality, quantified using the Artifact-related Perceptual Score, meets an adjustable target. Subjective evaluations indicate that consistent sound quality is obtained across various input signals. Our experiments show that the proposed method is able to control the trade-off with an accuracy that is adequate for real-world dialogue enhancement applications.
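The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' method: in the paper a deep neural network estimates the attenuation directly, whereas this example swaps in a plain bisection search over a made-up quality function. The names `remix`, `toy_quality`, and `find_attenuation` are hypothetical, and `toy_quality` is only a monotone stand-in for the Artifact-related Perceptual Score, not an implementation of it.

```python
def remix(speech, background, attenuation_db):
    """Add the separated background back to the speech, attenuated by
    attenuation_db decibels (0 dB leaves the background unchanged)."""
    gain = 10.0 ** (-attenuation_db / 20.0)
    return [s + gain * b for s, b in zip(speech, background)]


def toy_quality(attenuation_db):
    """Hypothetical stand-in for a perceptual quality score (NOT the real APS).
    Assumption: quality drops linearly from 100 as attenuation increases."""
    return 100.0 - 2.5 * attenuation_db


def find_attenuation(quality_fn, target, max_db=30.0, tol=0.01):
    """Bisection search for the largest attenuation in [0, max_db] whose
    quality is still >= target. Assumes quality_fn is non-increasing."""
    if quality_fn(max_db) >= target:
        return max_db   # even the strongest attenuation meets the target
    if quality_fn(0.0) < target:
        return 0.0      # target unreachable; fall back to no attenuation
    lo, hi = 0.0, max_db
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if quality_fn(mid) >= target:
            lo = mid    # target still met, try attenuating more
        else:
            hi = mid    # quality too low, attenuate less
    return lo


# Usage: pick the attenuation for a quality target of 80, then remix.
attenuation = find_attenuation(toy_quality, target=80.0)
output = remix([0.5, -0.2], [0.1, 0.3], attenuation)
```

Raising the target yields less attenuation and a more audible background; lowering it permits stronger attenuation at the cost of more artifacts. A learned estimator, as proposed in the paper, would replace the search with a single prediction per input signal.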

