SD LG ML
Nov 1, 2016

Enhanced Factored Three-Way Restricted Boltzmann Machines for Speech Detection

arXiv:1611.00326v37.46 citations

Originality: Incremental advance

AI Analysis

This work addresses speech detection for applications in noisy environments, representing an incremental improvement over prior methods.

The authors propose an enhanced factored three-way restricted Boltzmann machine (EFTW-RBM) for speech detection in noisy environments. The model combines conditional feature learning with a factored low-rank approximation of the three-way interaction tensor. Measured by AUC and SDR, it outperforms several existing 1D and 2D speech detection algorithms.

In this letter, we propose enhanced factored three-way restricted Boltzmann machines (EFTW-RBMs) for speech detection. The proposed model incorporates conditional feature learning by multiplying in the dynamical state of the third unit, which modulates the visible-hidden node pairs. Instead of stacking previous frames of speech as the third unit in a recursive manner, correlation-related weighting coefficients are assigned to the contextual neighboring frames. Specifically, a threshold function is designed to capture the long-term features and blend in the globally stored speech structure. A factored low-rank approximation is introduced to reduce the number of parameters of the three-dimensional interaction tensor, on which a non-negative constraint is imposed to address the sparsity characteristic. Validation using the area under the ROC curve (AUC) and the signal distortion ratio (SDR) shows that our approach outperforms several existing 1D and 2D (i.e., time and time-frequency domain) speech detection algorithms in various noisy environments.
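The factored low-rank approximation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard factored three-way form, in which the interaction tensor is written as W[i, j, k] = sum over f of Wv[i, f] * Wh[j, f] * Wz[k, f]; the abstract does not give the paper's exact parameterization. All names (`Wv`, `Wh`, `Wz`, `b_h`, `factored_hidden_probs`), the layer sizes, and the use of non-negative factors to mimic the non-negativity constraint are hypothetical choices for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def factored_hidden_probs(v, z, Wv, Wh, Wz, b_h):
    """P(h_j = 1 | v, z) using three factor matrices instead of a full tensor."""
    fv = v @ Wv                    # project visible units onto F factors
    fz = z @ Wz                    # project third (context) units onto F factors
    return sigmoid(Wh @ (fv * fz) + b_h)

def full_tensor(Wv, Wh, Wz):
    """Rebuild the explicit V x H x Z tensor (only to check the factorization)."""
    return np.einsum("if,jf,kf->ijk", Wv, Wh, Wz)

rng = np.random.default_rng(0)
V, H, Z, F = 6, 4, 5, 3            # hypothetical sizes: visible, hidden, context, factors

# Assumption: non-negative factors stand in for the non-negativity constraint.
Wv = np.abs(rng.normal(size=(V, F)))
Wh = np.abs(rng.normal(size=(H, F)))
Wz = np.abs(rng.normal(size=(Z, F)))
b_h = np.zeros(H)

v = rng.random(V)                  # current frame features
z = rng.random(Z)                  # contextual (third-unit) features

p_fact = factored_hidden_probs(v, z, Wv, Wh, Wz, b_h)

# The same quantity computed from the explicit three-way tensor.
W = full_tensor(Wv, Wh, Wz)
p_full = sigmoid(np.einsum("ijk,i,k->j", W, v, z) + b_h)

n_full = V * H * Z                 # parameters in the full tensor
n_fact = F * (V + H + Z)           # parameters in the factored form
```

With these sizes, the full tensor has 120 parameters while the factored form has 45, and both routes give the same hidden activation probabilities. The saving grows quickly as the layer sizes increase relative to the number of factors.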
