SD, LG, AS, ML | Feb 5, 2019

An Ensemble SVM-based Approach for Voice Activity Detection

arXiv:1902.01544v1 | 9 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for low-complexity, high-accuracy VAD for speech processing applications, but it is incremental as it builds on existing SVM methods.

The paper tackles the problem of voice activity detection (VAD) by proposing an ensemble SVM-based approach to reduce training complexity and improve accuracy, achieving 88.74% accuracy compared to 86.28% for a neural network and 57.05% for a stand-alone SVM.

Voice activity detection (VAD), used as the front end of speech enhancement, speech recognition, and speaker recognition algorithms, determines the overall accuracy and efficiency of those algorithms. Therefore, a VAD with low complexity and high accuracy is highly desirable for speech processing applications. In this paper, we propose a novel training method on large datasets for a supervised learning-based VAD system using a support vector machine (SVM). Despite the high classification accuracy of SVMs, a plain SVM is not suitable for classifying the large datasets needed for a good VAD system because of its high training complexity. To overcome this problem, a novel ensemble-based approach using SVMs is proposed in this paper. The performance of the proposed ensemble structure has been compared with a feedforward neural network (NN). Although the NN performs better than a single SVM-based VAD trained on a small portion of the training data, the ensemble SVM gives accuracy comparable to the neural network-based VAD. The ensemble SVM and the NN give 88.74% and 86.28% accuracy respectively, whereas the stand-alone SVM shows 57.05% accuracy on average on the test dataset.
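The general idea of an SVM ensemble can be illustrated with a small sketch. The abstract does not say how the paper partitions the data or combines the member outputs, so everything below is an assumption made for illustration: disjoint strided chunks, majority voting, a linear SVM trained with Pegasos-style stochastic subgradient descent (chosen only to keep the example dependency-free), and synthetic two-dimensional "frames" in place of real speech features. All function names are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent.

    Stand-in for whatever SVM solver the paper uses (assumption).
    The bias is handled as an extra constant feature and is regularized too.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias weight
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            # Regularization shrink, then hinge-loss step if the margin is violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def predict_one(w, x):
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


def train_ensemble(X, y, n_members=5):
    """Train one SVM per disjoint strided chunk (partition scheme is an assumption)."""
    members = []
    for k in range(n_members):
        idx = range(k, len(X), n_members)
        members.append(
            train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=k)
        )
    return members


def predict_ensemble(members, x):
    """Majority vote over members (combination rule is an assumption)."""
    votes = sum(predict_one(w, x) for w in members)
    return 1 if votes >= 0 else -1


def make_toy_frames(n, seed):
    """Synthetic 2-D features: label +1 stands for speech, -1 for non-speech."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)])
        y.append(label)
    return X, y


X_train, y_train = make_toy_frames(1000, seed=1)
X_test, y_test = make_toy_frames(300, seed=2)
members = train_ensemble(X_train, y_train)
correct = sum(predict_ensemble(members, x) == t for x, t in zip(X_test, y_test))
accuracy = correct / len(y_test)
print(f"ensemble accuracy on toy data: {accuracy:.3f}")
```

Each member sees only one fifth of the training set, which is the property that matters for kernel SVMs, whose training cost grows faster than linearly with the number of samples. The toy data is far easier than real VAD features, so the accuracy printed here says nothing about the figures reported in the abstract.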

