SD, AS, Oct 12, 2021

Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training

arXiv:2110.05966v2, 21 citations
Originality Incremental advance
AI Analysis

This work addresses speech separation for applications like hearing aids or communication systems, but it is incremental as it builds on existing narrow-band methods with a new training criterion.

The paper tackles multi-channel multi-speaker speech separation by proposing an end-to-end narrow-band network that uses spatial information to discriminate speakers, and it outperforms oracle beamforming and state-of-the-art deep learning methods in experiments.

This paper addresses the problem of multi-channel multi-speaker speech separation based on deep learning techniques. In the short-time Fourier transform (STFT) domain, we propose an end-to-end narrow-band network that directly takes as input the multi-channel mixture signals of one frequency and outputs the separated signals of that frequency. Within a narrow band, the spatial information (the inter-channel differences) discriminates well between speakers at different positions. This information is used extensively in many narrow-band speech separation methods, such as beamforming and clustering of spatial vectors. The proposed network is trained to learn a rule that automatically exploits this information to perform speech separation. Such a rule should be valid for any frequency, hence the network is shared by all frequencies. In addition, a full-band permutation invariant training criterion is proposed to solve the frequency permutation problem encountered by most narrow-band methods, in which the separated outputs may be assigned to different speakers at different frequencies. Experiments show that, by focusing on deeply learning the narrow-band information, the proposed method outperforms the oracle beamforming method and the state-of-the-art deep learning based method.
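The full-band permutation invariant training idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `full_band_pit_loss`, the array layout (speakers, frequencies, frames), and the mean-squared-error objective are all assumptions chosen for illustration, and the paper's actual loss may differ. The point it shows is that a single speaker permutation is selected for all frequencies jointly, rather than one permutation per frequency.

```python
from itertools import permutations

import numpy as np


def full_band_pit_loss(est, ref):
    """Hypothetical full-band PIT loss (illustrative only).

    est, ref: arrays of shape (speakers, freqs, frames), real or complex.
    Returns the smallest loss and the speaker permutation that achieves it.
    MSE is an assumed stand-in for whatever loss is actually used.
    """
    n_spk = est.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(n_spk)):
        # The same permutation is applied to every frequency at once,
        # so the outputs cannot swap speakers between frequencies.
        loss = float(np.mean(np.abs(est[list(perm)] - ref) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

If the estimates are the references with the two speakers swapped at all frequencies, the search returns the swapping permutation with zero loss. A per-frequency permutation search would instead allow a different assignment at each frequency, which is the frequency permutation problem the abstract refers to.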

Code Implementations: 2 repos
