AS SD SP

Aug 4, 2021

Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation

Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki

arXiv:2108.01836v14.326 citations

Originality: Incremental advance

AI Analysis

This work addresses speech enhancement for applications like hearing aids or voice assistants, but it is incremental because it builds on existing methods for joint dereverberation (DR) and source separation (SS).

The paper tackled the problem of joint denoising, dereverberation, and source separation in noisy reverberant mixtures by proposing a blind and neural network-guided convolutional beamformer, which greatly outperformed conventional state-of-the-art methods in automatic speech recognition and signal distortion reduction.

This paper proposes an approach for optimizing a Convolutional BeamFormer (CBF) that can jointly perform denoising (DN), dereverberation (DR), and source separation (SS). First, we develop a blind CBF optimization algorithm that requires no prior information on the sources or the room acoustics, by extending a conventional joint DR and SS method. To make the optimization computationally tractable, we incorporate two techniques into the approach: the Source-Wise Factorization (SW-Fact) of a CBF and Independent Vector Extraction (IVE). To further improve performance, we develop a method that integrates neural network (NN)-based estimation of source power spectra with CBF optimization through an inverse-Gamma prior. Experiments using noisy reverberant mixtures reveal that the proposed method, in both blind and NN-guided scenarios, greatly outperforms the conventional state-of-the-art NN-supported mask-based CBF in terms of improvement in automatic speech recognition and signal distortion reduction performance.
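As a rough illustration of two ideas named in the abstract, the sketch below applies a convolutional beamformer to one frequency bin and computes a source-power estimate under an inverse-Gamma prior. It is not the authors' algorithm: the function names (`apply_cbf`, `map_source_power`), the prediction-delay convention, and the parameters `alpha` and `beta` are assumptions made for illustration, and the power update is simply the textbook posterior mode for a zero-mean complex Gaussian whose variance has an inverse-Gamma prior.

```python
import numpy as np


def apply_cbf(x, w0, w_past, delay):
    """Apply a convolutional beamformer in a single frequency bin.

    x      : (T, M) complex STFT frames from M microphones.
    w0     : (M,) filter applied to the current frame.
    w_past : (L, M) filters applied to frames delayed by delay, delay+1, ...
    delay  : prediction delay in frames (hypothetical convention).

    Computes y[t] = w0^H x[t] + sum_k w_past[k]^H x[t - delay - k].
    """
    T = x.shape[0]
    # Instantaneous (beamforming) part: w0^H x[t] for every frame t.
    y = x @ w0.conj()
    # Convolutional part: add contributions from delayed frames.
    for k in range(w_past.shape[0]):
        tau = delay + k
        if tau >= T:
            break
        y[tau:] += x[: T - tau] @ w_past[k].conj()
    return y


def map_source_power(y, alpha, beta):
    """Posterior-mode power estimate under an inverse-Gamma(alpha, beta) prior.

    Assumes y[t] is zero-mean complex Gaussian with variance lam[t].
    The posterior is proportional to lam^-(alpha + 2) * exp(-(|y|^2 + beta) / lam),
    whose mode is (|y|^2 + beta) / (alpha + 2).
    beta may be a scalar or an array, e.g. derived from an NN power estimate.
    """
    return (np.abs(y) ** 2 + beta) / (alpha + 2.0)
```

In a guided setting, `beta` could be set per frame from a network's power-spectrum estimate, so the prior pulls the variance toward that estimate while the observed `|y|^2` still contributes; how the paper actually parameterizes the prior is not stated in this abstract.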
