SD AS Jan 22, 2019

Speech Separation Using Gain-Adapted Factorial Hidden Markov Models

Martin H. Radfar, Richard M. Dansereau, Willy Wong

arXiv:1901.07604v1

Originality: Incremental advance

AI Analysis

This work addresses a practical limitation in speech separation for applications like hearing aids or audio processing, but it is incremental as it extends an existing model to handle gain variations.

The paper tackles single-channel speech separation when training and test data have different loudness levels. It introduces a gain-adapted factorial hidden Markov model (GFHMM) that handles unknown gain factors. Experimental results on 180 mixtures show that it significantly outperforms existing methods such as FHMM and VQ-based SCSS.

We present a new probabilistic graphical model which generalizes factorial hidden Markov models (FHMM) for the problem of single-channel speech separation (SCSS), in which we wish to separate the two speech signals $X(t)$ and $V(t)$ from a single recording of their mixture $Y(t)=X(t)+V(t)$ using the trained models of the speakers' speech signals. Current techniques assume the data used in the training and test phases of the separation model have the same loudness. In this paper, we introduce GFHMM, gain-adapted FHMM, to extend SCSS to the general case in which $Y(t)=g_xX(t)+g_vV(t)$, where $g_x$ and $g_v$ are unknown gain factors. GFHMM consists of two independent-state HMMs and a hidden node which model spectral patterns and gain difference, respectively. A novel inference method is presented using the Viterbi algorithm and quadratic optimization with minimal computational overhead. Experimental results, conducted on 180 mixtures with gain differences from 0 to 15 dB, show that the proposed technique significantly outperforms FHMM and its memoryless counterpart, i.e., vector quantization (VQ)-based SCSS.
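To make the mixing model $Y(t)=g_xX(t)+g_vV(t)$ concrete, the following is a minimal sketch of how a gain-mismatched test mixture might be built. It is not the paper's code. The function names `gains_from_db` and `mix` are hypothetical, the choice $g_v=1$ is an arbitrary normalization, and the abstract does not say how the gain difference in dB is defined, so an amplitude ratio ($20\log_{10}(g_x/g_v)$) is assumed here. The toy sinusoids stand in for speech signals.

```python
import math


def gains_from_db(gain_diff_db):
    """Return (g_x, g_v) such that 20*log10(g_x / g_v) == gain_diff_db.

    Assumption: the gain difference is an amplitude ratio in dB,
    and g_v is fixed to 1 as an arbitrary normalization.
    """
    g_v = 1.0
    g_x = 10.0 ** (gain_diff_db / 20.0)
    return g_x, g_v


def mix(x, v, gain_diff_db):
    """Form y[n] = g_x * x[n] + g_v * v[n] for two equal-length signals."""
    g_x, g_v = gains_from_db(gain_diff_db)
    return [g_x * xs + g_v * vs for xs, vs in zip(x, v)]


# Toy stand-ins for the two speakers' signals (not real speech).
x = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
v = [math.sin(2 * math.pi * 13 * n / 100) for n in range(100)]

y0 = mix(x, v, 0.0)    # equal gains: the case standard FHMM assumes
y15 = mix(x, v, 15.0)  # 15 dB gain difference: upper end of the tested range
```

At 0 dB the mixture reduces to $Y(t)=X(t)+V(t)$, the matched-loudness case. Any nonzero gain difference gives the mismatched case that GFHMM is designed to handle.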
