LG · AI · SD · Mar 9, 2015

Modeling State-Conditional Observation Distribution using Weighted Stereo Samples for Factorial Speech Processing Models

arXiv:1503.02578v2 · 2 citations
AI Analysis

This work addresses noise robustness in speech recognition, offering an incremental improvement for handling non-stationary noise on benchmarks like Aurora 2.

The paper tackles noise-robust automatic speech recognition by proposing a method to model state-conditional observation distributions using weighted stereo samples for factorial models, achieving up to 4% absolute improvement in word recognition accuracy in low signal-to-noise conditions.

This paper investigates the effectiveness of factorial speech processing models in noise-robust automatic speech recognition. To this end, it proposes an idealized approach for modeling the state-conditional observation distribution of factorial models from weighted stereo samples. The approach extends single-pass retraining, an earlier technique for ideal model compensation, to support multiple audio sources. Non-stationary noise can be treated as one of these audio sources, with multiple states of its own. Experiments on set A of the Aurora 2 dataset show that treating noise this way improves recognition performance. The improvement is significant at low signal-to-noise ratios, up to 4% absolute word recognition accuracy. Besides representing the state-conditional observation distribution accurately, the proposed method has an important advantage over previous methods: the feature spaces for the source features and for the corrupted features can be selected independently. This opens the way to seeking feature spaces better suited to noisy speech, independent of the clean-speech features.
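The weighted-stereo-sample idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation, only one plausible reading under stated assumptions: each frame's weight for a (speech state, noise state) pair is the product of two per-source state posteriors computed on the source signals, each pair gets a single diagonal Gaussian fitted to the corrupted features, and the names `weighted_gaussian` and `fit_state_conditional` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, diagonal variance)


def weighted_gaussian(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    var_floor: float = 1e-6,
) -> Gaussian:
    """Weighted mean and diagonal variance of corrupted-feature frames."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("state pair received no weight")
    dim = len(frames[0])
    mean = [
        sum(w * f[d] for f, w in zip(frames, weights)) / total
        for d in range(dim)
    ]
    var = [
        max(
            sum(w * (f[d] - mean[d]) ** 2 for f, w in zip(frames, weights)) / total,
            var_floor,  # keep variances strictly positive
        )
        for d in range(dim)
    ]
    return mean, var


def fit_state_conditional(
    noisy_frames: Sequence[Sequence[float]],
    speech_post: Sequence[Sequence[float]],
    noise_post: Sequence[Sequence[float]],
) -> Dict[Tuple[int, int], Gaussian]:
    """Fit one Gaussian per (speech state i, noise state j) pair.

    speech_post[t][i] and noise_post[t][j] are per-frame state posteriors
    obtained from the source signals; the statistics are accumulated on the
    time-aligned corrupted frames (the "stereo" counterpart).
    """
    models: Dict[Tuple[int, int], Gaussian] = {}
    for i in range(len(speech_post[0])):
        for j in range(len(noise_post[0])):
            # Assumption: the joint weight is the product of the two posteriors.
            w = [sp[i] * nz[j] for sp, nz in zip(speech_post, noise_post)]
            models[(i, j)] = weighted_gaussian(noisy_frames, w)
    return models


# Toy data: four 1-D corrupted frames, two speech states, one noise state.
noisy = [[1.0], [3.0], [5.0], [7.0]]
speech_post = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
noise_post = [[1.0], [1.0], [1.0], [1.0]]
models = fit_state_conditional(noisy, speech_post, noise_post)
```

Because the posteriors come from the source signals while the statistics come from the corrupted frames, the two sides could in principle use different feature representations, which is the independence the abstract highlights.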

