SD AS, Jun 10, 2021

A Comparison and Combination of Unsupervised Blind Source Separation Techniques

arXiv:2106.05627v1, 12 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of improving speech separation and recognition performance for applications in noisy environments, but it is incremental as it combines existing techniques.

The paper compared unsupervised blind source separation techniques, specifically spatial mixture models and independent vector analysis (IVA), on a reverberant dataset, and introduced a serial concatenation that significantly improved word error rate (WER) performance, approaching that of a more complex neural network method.

Unsupervised blind source separation methods do not require a training phase and thus cannot suffer from a train-test mismatch, which is a common concern in neural-network-based source separation. The unsupervised techniques can be categorized into two classes: those building upon the sparsity of speech in the short-time Fourier transform (STFT) domain and those exploiting non-Gaussianity or non-stationarity of the source signals. In this contribution, spatial mixture models, which fall into the first category, and independent vector analysis (IVA), as a representative of the second category, are compared with respect to their separation performance and the performance of a downstream speech recognizer on a reverberant dataset of reasonable size. Furthermore, we introduce a serial concatenation of the two, where the result of the mixture model serves as the initialization of IVA. This combination achieves significantly better WER performance than either algorithm individually and even approaches the performance of a much more complex neural-network-based technique.
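The serial concatenation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the mixture-model result enters IVA, so everything specific here is an assumption: the function name `auxiva_ip`, the parameter `init_masks`, the Laplacian source model, the auxiliary-function IVA variant with iterative-projection updates, and the choice to use the mixture-model posterior masks as the weights of the first covariance estimate. The sketch also assumes a determined setting, with as many sources as microphones.

```python
import numpy as np

def auxiva_ip(X, n_iter=20, init_masks=None, eps=1e-10):
    """Auxiliary-function IVA with iterative projection (illustrative sketch).

    X:          complex STFT of the mixture, shape (F, M, T)
    init_masks: optional non-negative weights of shape (M, F, T), e.g. class
                posteriors of a spatial mixture model (hypothetical input);
                they replace the IVA weights in the first iteration only.
    Returns the separated STFT (F, M, T) and demixing matrices (F, M, M).
    """
    F, M, T = X.shape
    # Start from identity demixing matrices in every frequency bin.
    W = np.tile(np.eye(M, dtype=complex), (F, 1, 1))

    for it in range(n_iter):
        if it == 0 and init_masks is not None:
            # Assumed initialization: masks act as time-frequency weights.
            weights = init_masks
        else:
            Y = W @ X                                     # (F, M, T)
            # Frequency-coupled source activity, one value per source/frame.
            r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0))   # (M, T)
            inv_r = 1.0 / np.maximum(r, eps)
            weights = np.broadcast_to(inv_r[:, None, :], (M, F, T))

        for k in range(M):
            # Weighted spatial covariance of the mixture for source k.
            V = np.einsum('ft,fmt,fnt->fmn', weights[k], X, X.conj()) / T
            V = V + eps * np.eye(M)                       # regularization
            # Iterative projection: w_k = (W V_k)^{-1} e_k, then normalize.
            e_k = np.zeros((F, M, 1), dtype=complex)
            e_k[:, k, 0] = 1.0
            w = np.linalg.solve(W @ V, e_k)               # (F, M, 1)
            scale = np.sqrt(np.real(w.conj().transpose(0, 2, 1) @ V @ w))
            w = w / np.maximum(scale, eps)
            # Row k of W holds w_k^H, so that y_k = w_k^H x.
            W[:, k, :] = w[:, :, 0].conj()

    return W @ X, W
```

Calling `auxiva_ip(X)` runs plain IVA from an identity initialization, while `auxiva_ip(X, init_masks=masks)` mimics the serial concatenation by letting the masks determine the first demixing update. The sketch omits the steps a full pipeline would need, such as fitting the mixture model itself and rescaling the outputs after separation.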

