SD AS Sep 10, 2021

Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis

Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, Shoji Makino

arXiv:2109.04658v12.3

Originality: Incremental advance

AI Analysis

This work addresses speech enhancement for applications such as hearing aids and communication systems. It is an incremental advance, because it builds on existing rank-constrained spatial covariance matrix estimation methods.

The paper tackles speech enhancement in diffuse noise. It uses a deep neural network (Denoiser) to estimate both the target speech and the noise, and a noise self-supervised approach to improve spatial covariance matrix estimation. The authors report that the proposed methods outperform conventional methods under several noise conditions.

Rank-constrained spatial covariance matrix estimation (RCSCME) is a method for situations in which directional target speech and diffuse noise are mixed. In conventional RCSCME, independent low-rank matrix analysis (ILRMA) is used as the preprocessing method. We propose RCSCME using independent deeply learned matrix analysis (IDLMA), which is a supervised extension of ILRMA. In this method, IDLMA requires deep neural networks (DNNs) to separate the target speech and the noise. We use Denoiser, which is a single-channel speech enhancement DNN, in IDLMA to estimate not only the target speech but also the noise. We also propose noise self-supervised RCSCME, in which we estimate the noise-only time intervals using the output of Denoiser and design the prior distribution of the noise spatial covariance matrix for RCSCME. We confirm that the proposed methods outperform the conventional methods under several noise conditions.
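
The noise self-supervised idea in the abstract can be illustrated with a rough sketch. The abstract does not say how the noise-only intervals are detected or how the prior is parameterized, so everything below is an assumption made for illustration: the energy-ratio criterion, the `threshold_db` parameter, and the function names `noise_only_frames` and `noise_scm` are all hypothetical, and the Denoiser output is assumed to be available as an STFT. The sketch flags frames where the estimated speech power is far below the mixture power, then averages the outer products of the multichannel observation over those frames to get a sample noise spatial covariance matrix, which could serve as the basis for a prior.

```python
import numpy as np


def noise_only_frames(speech_est, mixture, threshold_db=-20.0):
    """Flag time frames that appear to contain noise only.

    speech_est, mixture: complex STFTs of shape (freq, time), one channel.
    threshold_db is a hypothetical tuning parameter, not taken from the paper.
    """
    eps = 1e-12  # avoids log of zero and division by zero
    speech_pow = np.sum(np.abs(speech_est) ** 2, axis=0)
    mix_pow = np.sum(np.abs(mixture) ** 2, axis=0)
    ratio_db = 10.0 * np.log10((speech_pow + eps) / (mix_pow + eps))
    return ratio_db < threshold_db


def noise_scm(multichannel, mask):
    """Sample spatial covariance matrix over the flagged frames.

    multichannel: complex STFT of shape (freq, time, mics).
    Returns an array of shape (freq, mics, mics), one matrix per frequency.
    """
    x = multichannel[:, mask, :]
    n_frames = max(x.shape[1], 1)  # guard against an empty selection
    # Sum of x x^H over the selected frames, per frequency bin.
    return np.einsum("ftm,ftn->fmn", x, x.conj()) / n_frames
```

Each returned matrix is Hermitian by construction. In a real system the detection rule would likely need smoothing over time, and the threshold would need tuning on held-out data.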
