AS SD SP · Dec 17, 2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording

Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen

arXiv:2012.09727v2 · 8.08 citations

Originality: Incremental advance

AI Analysis

This work provides an incremental improvement for speech separation systems, particularly for scenarios involving long multi-talker conversations where pre-enrolled speaker signals are unavailable.

This paper addresses continuous speech separation in long multi-talker recordings by proposing a self-informed, clustering-based inventory forming scheme. This method eliminates the need for external speaker signals by building the speaker inventory entirely from the input signal, leading to significant improvements in separation performance on simulated noisy reverberant long recording datasets.

Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years. Recent research includes extracting target speech by using a voice snippet of the target speaker, and jointly separating all participating speakers by using a pool of additional speaker signals, which is known as speech separation using speaker inventory (SSUSI). However, all these systems assume an ideal setting in which pre-enrolled speaker signals are available, and they are only evaluated on simple data configurations. In realistic multi-talker conversations, the speech signal contains a large proportion of non-overlapped regions, from which robust speaker embeddings of individual talkers can be derived. In this work, we adopt the SSUSI model for long recordings and propose a self-informed, clustering-based inventory forming scheme, in which the speaker inventory is built entirely from the input signal without the need for external speaker signals. Experimental results on simulated noisy reverberant long recording datasets show that the proposed method can significantly improve separation performance across various conditions.
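The self-informed inventory idea can be illustrated with a minimal sketch. It assumes that per-window speaker embeddings have already been extracted and that a mask marking non-overlapped (single-speaker) windows is available. The function names are hypothetical, and spherical k-means with a known speaker count is a swapped-in stand-in for whatever clustering the paper actually uses.

```python
import numpy as np


def _normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def spherical_kmeans(x: np.ndarray, k: int, n_iter: int = 50):
    """Cluster rows of x into k groups by cosine similarity (illustrative).

    Uses deterministic farthest-point initialization, an assumption made
    here for reproducibility rather than anything taken from the paper.
    """
    x = _normalize(x)
    chosen = [0]
    for _ in range(k - 1):
        # Pick the point least similar to every centroid chosen so far.
        max_sim = (x @ x[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(max_sim)))
    centroids = x[chosen]
    for _ in range(n_iter):
        # Assign each embedding to its most similar centroid.
        labels = np.argmax(x @ centroids.T, axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.stack([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        new = _normalize(new)
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def build_speaker_inventory(embeddings, single_speaker_mask, num_speakers):
    """Hypothetical self-informed inventory: cluster embeddings from
    non-overlapped windows only and keep one centroid (profile) per speaker."""
    single = embeddings[single_speaker_mask]
    _, profiles = spherical_kmeans(single, num_speakers)
    return profiles  # shape: (num_speakers, embedding_dim)


# Usage on synthetic data: three "speakers" along orthogonal directions.
rng = np.random.default_rng(0)
true_dirs = np.eye(16)[:3]
single = np.repeat(true_dirs, 10, axis=0) + 0.05 * rng.standard_normal((30, 16))
overlap = true_dirs[[0, 1, 2, 0]] + true_dirs[[1, 2, 0, 2]]  # mixed windows
emb = np.vstack([single, overlap])
mask = np.array([True] * 30 + [False] * 4)
inventory = build_speaker_inventory(emb, mask, num_speakers=3)
print(inventory.shape)  # (3, 16)
```

Restricting the clustering to single-speaker windows mirrors the abstract's observation that non-overlapped regions yield robust speaker embeddings; including overlapped windows would pull the centroids toward mixtures of speakers.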

