SD AS Oct 12, 2021

Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training

arXiv:2110.05966v28.621 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses speech separation for applications such as hearing aids and communication systems. It is an incremental advance: it builds on existing narrow-band methods and adds a new training criterion.

The paper tackles multi-channel multi-speaker speech separation by proposing an end-to-end narrow-band network that uses spatial information to discriminate speakers, and it outperforms oracle beamforming and state-of-the-art deep learning methods in experiments.

This paper addresses the problem of multi-channel multi-speaker speech separation based on deep learning techniques. In the short-time Fourier transform (STFT) domain, we propose an end-to-end narrow-band network that directly takes as input the multi-channel mixture signals of one frequency and outputs the separated signals of that frequency. In a narrow band, the spatial information (or inter-channel difference) discriminates well between speakers at different positions. This information is used extensively in many narrow-band speech separation methods, such as beamforming and clustering of spatial vectors. The proposed network is trained to learn a rule that automatically exploits this information to perform speech separation. Such a rule should be valid for any frequency, hence the network is shared by all frequencies. In addition, a full-band permutation invariant training criterion is proposed to solve the frequency permutation problem encountered by most narrow-band methods. Experiments show that, by focusing on deeply learning the narrow-band information, the proposed method outperforms the oracle beamforming method and the state-of-the-art deep-learning-based method.
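The full-band permutation invariant training idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the separated and reference signals are STFT arrays of shape (frequencies, speakers, frames) and uses a mean squared error as a stand-in loss, since the abstract does not specify the loss. The function names `full_band_pit_loss` and `narrow_band_pit_loss` are hypothetical. The point is the contrast: the full-band criterion picks one speaker ordering for all frequencies, whereas a per-frequency criterion lets each frequency pick its own ordering, which is the source of the frequency permutation problem.

```python
from itertools import permutations

import numpy as np


def _mse_per_frequency(est, ref, perm):
    """Assumed stand-in loss: MSE per frequency for one speaker ordering."""
    diff = est[:, list(perm), :] - ref
    return np.mean(np.abs(diff) ** 2, axis=(1, 2))  # shape (F,)


def full_band_pit_loss(est, ref):
    """One permutation shared by all frequencies.

    est, ref: arrays of shape (F, S, T) = (frequencies, speakers, frames).
    Returns the minimum loss and the speaker ordering that achieves it.
    """
    n_spk = est.shape[1]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(n_spk)):
        # The same ordering is applied to every frequency before averaging.
        loss = float(_mse_per_frequency(est, ref, perm).mean())
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def narrow_band_pit_loss(est, ref):
    """For contrast: each frequency chooses its own best permutation."""
    n_spk = est.shape[1]
    per_perm = np.stack(
        [_mse_per_frequency(est, ref, p) for p in permutations(range(n_spk))]
    )  # shape (n_perms, F)
    return float(per_perm.min(axis=0).mean())
```

If the estimate has its speakers swapped at only one frequency, the per-frequency criterion reports zero loss, while the full-band criterion reports a positive loss. Under these assumptions, training with the full-band criterion therefore penalizes outputs whose speaker order is inconsistent across frequencies.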
