SDAIAS Apr 26, 2025

Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning

arXiv:2504.19030v1 · 5 citations · h-index: 9 · ICTIS
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement for speech command detection in smart applications.

This work tackled the problem of improving speech command recognition by adapting the pretrained YAMNet model using transfer learning, achieving a recognition accuracy of 95.28% on the Speech Commands dataset.

This work addresses the need for enhanced accuracy and efficiency in speech command recognition systems, a critical component for improving user interaction in various smart applications. Leveraging the robust pretrained YAMNet model and transfer learning, this study develops a method that significantly improves speech command recognition. We adapt and train a YAMNet deep learning model to effectively detect and interpret speech commands from audio signals. Using the extensively annotated Speech Commands dataset (speech_commands_v0.01), our approach demonstrates the practical application of transfer learning to accurately recognize a predefined set of speech commands. The dataset is meticulously augmented, and features are strategically extracted to boost model performance. As a result, the final model achieved a recognition accuracy of 95.28%, underscoring the impact of advanced machine learning techniques on speech command recognition. This achievement marks substantial progress in audio processing technologies and establishes a new benchmark for future research in the field.
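The transfer-learning idea described above (keep a pretrained feature extractor fixed and train only a small classification head on its embeddings) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the frozen "backbone" below is a random projection that merely stands in for YAMNet, the data are synthetic, and every name and size (`embed`, `train_head`, `INPUT_DIM`, `EMBED_DIM`, `NUM_CLASSES`) is a hypothetical choice made for illustration. YAMNet's actual embeddings are 1024-dimensional; the toy dimensions here only keep the example fast.

```python
import math
import random

rng = random.Random(0)

INPUT_DIM = 16    # stand-in for a raw audio feature vector (assumption)
EMBED_DIM = 8     # YAMNet's real embeddings are 1024-dimensional
NUM_CLASSES = 3   # hypothetical number of target commands

# Frozen "pretrained" backbone: a fixed random projection followed by ReLU.
# It only stands in for YAMNet and is never updated during training.
BACKBONE = [[rng.gauss(0, 1) for _ in range(INPUT_DIM)] for _ in range(EMBED_DIM)]

def embed(x):
    """Map an input to a unit-norm embedding using the frozen backbone."""
    h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in BACKBONE]
    norm = math.sqrt(sum(v * v for v in h)) or 1.0
    return [v / norm for v in h]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict_proba(W, b, e):
    logits = [sum(w * v for w, v in zip(W[k], e)) + b[k] for k in range(NUM_CLASSES)]
    return softmax(logits)

def train_head(embs, labels, steps=200, lr=0.5):
    """Full-batch gradient descent on a softmax head; the backbone is untouched."""
    W = [[0.0] * EMBED_DIM for _ in range(NUM_CLASSES)]
    b = [0.0] * NUM_CLASSES
    n = len(embs)
    for _ in range(steps):
        gW = [[0.0] * EMBED_DIM for _ in range(NUM_CLASSES)]
        gb = [0.0] * NUM_CLASSES
        for e, y in zip(embs, labels):
            p = predict_proba(W, b, e)
            for k in range(NUM_CLASSES):
                d = (p[k] - (1.0 if k == y else 0.0)) / n  # cross-entropy gradient
                gb[k] += d
                for j in range(EMBED_DIM):
                    gW[k][j] += d * e[j]
        for k in range(NUM_CLASSES):
            b[k] -= lr * gb[k]
            for j in range(EMBED_DIM):
                W[k][j] -= lr * gW[k][j]
    return W, b

def mean_loss(W, b, embs, labels):
    total = sum(math.log(predict_proba(W, b, e)[y]) for e, y in zip(embs, labels))
    return -total / len(embs)

def accuracy(W, b, embs, labels):
    hits = 0
    for e, y in zip(embs, labels):
        p = predict_proba(W, b, e)
        hits += int(p.index(max(p)) == y)
    return hits / len(embs)

# Synthetic data: one random prototype per class plus small noise.
prototypes = [[rng.gauss(0, 1) for _ in range(INPUT_DIM)] for _ in range(NUM_CLASSES)]
inputs, labels = [], []
for y, proto in enumerate(prototypes):
    for _ in range(20):
        inputs.append([v + rng.gauss(0, 0.1) for v in proto])
        labels.append(y)

backbone_before = [row[:] for row in BACKBONE]
embs = [embed(x) for x in inputs]  # computed once, since the backbone is frozen
W, b = train_head(embs, labels)
final_loss = mean_loss(W, b, embs, labels)
train_acc = accuracy(W, b, embs, labels)
```

Because the backbone never changes, the embeddings are computed once and reused at every training step, which is a large part of why this style of transfer learning is cheap. In a real pipeline the `embed` function would be replaced by the pretrained model's embedding output for each audio clip, and the head would be evaluated on a held-out split rather than on the training data as done here.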
