SD · AI · LG · AS · Apr 18, 2021

Many-Speakers Single Channel Speech Separation with Optimal Permutation Training

arXiv:2104.08955v4 · 25 citations
Originality: Highly original
AI Analysis

This work removes a training bottleneck in speech separation with many speakers: the cost of the permutation invariant loss, which grows factorially with the number of speakers in existing methods.

The paper tackles single-channel speech separation for many speakers (up to 20). Earlier methods were limited by the $O(C!)$ cost of permutation invariant training, where $C$ is the number of speakers. The paper introduces a permutation invariant training method based on the Hungarian algorithm, with $O(C^3)$ time complexity, together with a modified architecture, and reports improved results for large $C$.

Single-channel speech separation has made great progress in the last few years. However, training neural speech separation models for a large number of speakers (e.g., more than 10) is out of reach for current methods, which rely on Permutation Invariant Training (PIT). In this work, we present a permutation invariant training method that employs the Hungarian algorithm in order to train with $O(C^3)$ time complexity, where $C$ is the number of speakers, in comparison to the $O(C!)$ of PIT-based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to 20 speakers and improves on previous results for large $C$ by a wide margin.
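The complexity gap described above can be illustrated with a minimal sketch. It assumes a precomputed C x C matrix `cost`, where `cost[i][j]` is a hypothetical pairwise loss between estimated source `i` and reference speaker `j`; the abstract does not say which pairwise loss the authors use, so the numbers here are placeholders. `pit_brute_force` enumerates all C! permutations, while `hungarian` is a textbook $O(C^3)$ implementation of the Hungarian algorithm, written for illustration and not taken from the paper's code.

```python
from itertools import permutations


def pit_brute_force(cost):
    """Try every permutation of speakers: O(C!) candidate assignments."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)


def hungarian(cost):
    """Textbook O(C^3) Hungarian algorithm for a square cost matrix.

    Returns a list `assignment` where estimate i is matched to
    reference assignment[i]. Index 0 of the internal arrays is a sentinel.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (n + 1)      # column potentials
    p = [0] * (n + 1)        # p[j]: row currently matched to column j
    way = [0] * (n + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached an unmatched column
                break
        while j0 != 0:       # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


# Placeholder pairwise losses for C = 3 (illustrative values only).
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(pit_brute_force(cost))  # [1, 0, 2]
print(hungarian(cost))        # [1, 0, 2]
```

Both functions return the same minimum-cost assignment, but only the second remains practical as C grows: 20! is roughly 2.4 x 10^18 permutations, whereas 20^3 is 8000. In a real training loop one would more likely call an existing solver such as `scipy.optimize.linear_sum_assignment` on the cost matrix rather than hand-roll the algorithm.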
