Keyword spotting -- Detecting commands in speech using deep learning
This work addresses the problem of detecting commands in speech for applications like voice assistants, but it is incremental as it applies existing methods to this task.
The paper tackled keyword spotting in speech by experimenting with a Hidden Markov Model baseline and several deep learning models, achieving a best accuracy of 93.9% using an RNN with BiLSTM and Attention.
Speech recognition has become an important task in machine learning and artificial intelligence. In this study, we explore keyword spotting using machine learning and deep learning techniques for speech recognition. For feature engineering, we convert raw waveforms to Mel Frequency Cepstral Coefficients (MFCCs), which we use as inputs to our models. We experiment with several algorithms: a Hidden Markov Model with Gaussian mixtures, Convolutional Neural Networks, and variants of Recurrent Neural Networks, including Long Short-Term Memory networks and the attention mechanism. In our experiments, an RNN with BiLSTM and attention achieves the best performance, with an accuracy of 93.9%.
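
The waveform-to-MFCC step described above can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the 16 kHz sample rate, 25 ms frames (400 samples), 10 ms hop (160 samples), 40 mel filters and 13 coefficients are common defaults assumed here, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    # One common mel-scale formula (an assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II basis, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc(waveform, sample_rate=16000, n_fft=400, hop=160, n_mels=40, n_mfcc=13):
    """Return an array of shape (n_frames, n_mfcc). All defaults are assumed values."""
    # Slice the waveform into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(waveform) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = waveform[idx] * np.hamming(n_fft)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Mel-band energies, log compression (small floor avoids log(0)), then DCT
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)
    return log_mel @ dct_matrix(n_mfcc, n_mels).T

# Usage: one second of a synthetic 440 Hz tone stands in for a spoken command
wave = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
features = mfcc(wave)
print(features.shape)
```

Each row of the result is one frame's coefficient vector, so a clip becomes a (frames x coefficients) matrix that a CNN can treat as an image and an RNN can read as a sequence. In practice an audio library would usually replace this hand-written version.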