LG SD ML · Mar 4, 2013

Denoising Deep Neural Networks Based Voice Activity Detection

arXiv:1303.0663v1 · 55 citations

Originality: Incremental advance

AI Analysis

This is an incremental improvement for voice activity detection in speech processing.

The paper tackles the problem that deep layers in DBN-based voice activity detection (VAD) do not show clear superiority over shallower layers by proposing a denoising deep neural network (DDNN) based VAD, which outperforms DBN-based VAD and shows performance improvement in deep layers.

Recently, deep-belief-network (DBN) based voice activity detection (VAD) has been proposed. It is powerful in fusing the advantages of multiple features, and achieves state-of-the-art performance. However, the deep layers of the DBN-based VAD do not show an apparent superiority over the shallower layers. In this paper, we propose a denoising-deep-neural-network (DDNN) based VAD to address this problem. Specifically, we pre-train a deep neural network in a special unsupervised denoising greedy layer-wise mode, and then fine-tune the whole network in a supervised way with the common back-propagation algorithm. In the pre-training phase, we take the noisy speech signals as the visible layer and try to extract a new feature that minimizes the reconstruction cross-entropy loss between the noisy speech signals and their corresponding clean speech signals. Experimental results show that the proposed DDNN-based VAD not only outperforms the DBN-based VAD but also shows an apparent performance improvement of the deep layers over the shallower layers.
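The pre-training idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy data, the layer sizes, the untied decoder weights, the full-batch gradient descent, and the function names `pretrain_layer` and `pretrain_stack` are all assumptions made for illustration. In particular, the abstract does not say what deeper layers reconstruct, so the sketch assumes each deeper layer maps the encoded noisy input toward the encoded clean input. The supervised back-propagation fine-tuning step is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(target, recon, eps=1e-9):
    # Mean (over frames) of the summed per-dimension cross-entropy.
    recon = np.clip(recon, eps, 1.0 - eps)
    per_dim = target * np.log(recon) + (1.0 - target) * np.log(1.0 - recon)
    return -np.mean(np.sum(per_dim, axis=1))

def pretrain_layer(noisy, clean, n_hidden, epochs=200, lr=0.2, seed=0):
    """Train one denoising layer: encode the noisy input, reconstruct the clean one."""
    rng = np.random.default_rng(seed)
    n, d = noisy.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))   # decoder weights (untied: an assumption)
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = sigmoid(noisy @ W1 + b1)           # new feature from noisy input
        r = sigmoid(h @ W2 + b2)               # reconstruction of the clean target
        losses.append(cross_entropy(clean, r))
        d_out = (r - clean) / n                # gradient w.r.t. output pre-activation
        d_hid = (d_out @ W2.T) * h * (1.0 - h) # back-propagate through the sigmoid
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (noisy.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1), losses

def pretrain_stack(noisy, clean, layer_sizes, **kw):
    """Greedy layer-wise pre-training; only the encoders are kept."""
    params, history = [], []
    x_noisy, x_clean = noisy, clean
    for n_hidden in layer_sizes:
        (W, b), losses = pretrain_layer(x_noisy, x_clean, n_hidden, **kw)
        params.append((W, b))
        history.append(losses)
        # Assumption: the next layer sees encoded noisy input and encoded clean target.
        x_noisy = sigmoid(x_noisy @ W + b)
        x_clean = sigmoid(x_clean @ W + b)
    return params, history

# Toy stand-in for speech features scaled to [0, 1] (hypothetical data).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(200, 1))            # 1 = speech, 0 = non-speech
clean = np.where(labels == 1, 0.9, 0.1) * np.ones((200, 8))
noisy = np.clip(clean + rng.normal(0.0, 0.2, clean.shape), 0.0, 1.0)

params, history = pretrain_stack(noisy, clean, layer_sizes=[6, 4])
```

In a full pipeline, the encoder weights in `params` would initialize a classifier whose output layer is then trained on speech/non-speech labels with back-propagation, as the abstract describes.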
