Building Corpora for Single-Channel Speech Separation Across Multiple Domains
This work addresses a gap in speech separation research for real-world applications, though its contribution is incremental: it focuses on dataset creation and evaluation rather than on a new method.
The authors address single-channel speech separation by constructing synthetic overlap datasets from the CHiME-5 and Mixer 6 corpora that better represent realistic applications. They show that current methods perform poorly on this data and that training on diverse data improves model robustness and generalization.
To date, the bulk of research on single-channel speech separation has been conducted using clean, near-field, read speech, which is not representative of many modern applications. In this work, we develop a procedure for constructing high-quality synthetic overlap datasets, which most deep learning-based separation frameworks require. Using the CHiME-5 and Mixer 6 corpora, we produce datasets that are more representative of realistic applications, and we evaluate standard methods on this data to demonstrate the shortcomings of current source-separation performance. We also demonstrate the value of a wide variety of data in training robust models that generalize well to multiple conditions.
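To make the idea of a synthetic overlap dataset concrete, the following is a minimal sketch of how one two-speaker mixture might be built from two single-speaker recordings. It is not the procedure developed in this work, whose details are not given here. The function names (`rms`, `mix_with_overlap`), the `sir_db` and `offset` parameters, and the RMS-based signal-to-interference scaling are all assumptions chosen for illustration.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a non-empty list of float samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_with_overlap(target, interferer, sir_db, offset):
    """Overlap `interferer` onto `target`, starting at sample `offset`.

    Illustrative assumption: the interferer is rescaled so that the ratio
    of target RMS to interferer RMS equals `sir_db` decibels. Both inputs
    must be non-silent (non-zero RMS) and `offset` must be non-negative.

    Returns (mixture, target_ref, interferer_ref), all the same length,
    so the two references can serve as separation training targets.
    """
    # Gain that places the interferer `sir_db` dB below the target level.
    gain = rms(target) / (rms(interferer) * 10 ** (sir_db / 20))

    # The mixture must be long enough to hold both utterances.
    length = max(len(target), offset + len(interferer))

    # Zero-pad the target at the end; place the scaled interferer at `offset`.
    target_ref = list(target) + [0.0] * (length - len(target))
    interferer_ref = [0.0] * length
    for i, x in enumerate(interferer):
        interferer_ref[offset + i] = gain * x

    # Single-channel mixture is the sample-wise sum of the two references.
    mixture = [a + b for a, b in zip(target_ref, interferer_ref)]
    return mixture, target_ref, interferer_ref


if __name__ == "__main__":
    random.seed(0)
    # Random noise stands in for real speech segments in this sketch.
    speaker_a = [random.uniform(-1.0, 1.0) for _ in range(1000)]
    speaker_b = [random.uniform(-1.0, 1.0) for _ in range(600)]
    mix, ref_a, ref_b = mix_with_overlap(speaker_a, speaker_b, sir_db=5.0, offset=700)
    print(len(mix), len(ref_a), len(ref_b))
```

Returning the scaled, time-aligned references alongside the mixture matters because supervised separation models are typically trained to recover exactly those signals. With far-field conversational sources such as CHiME-5, a real pipeline would also need to select segments in which only one speaker is active, a step this sketch leaves out.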