CL | Oct 28, 2017

Deep Residual Learning for Small-Footprint Keyword Spotting

arXiv:1710.10361v2 | 263 citations | Has Code
Originality: Incremental advance
AI Analysis

This work aims to improve the accuracy and efficiency of keyword spotting for speech-based interfaces. It is an incremental advance that adapts existing techniques to a new domain.

The paper applies deep residual learning and dilated convolutions to keyword spotting on the Google Speech Commands Dataset. Its best residual network is significantly more accurate than Google's previous convolutional neural networks, and varying model depth and width yields compact models that also outperform previous small-footprint variants.

We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently-released Google Speech Commands Dataset as our benchmark. Our best residual network (ResNet) implementation significantly outperforms Google's previous convolutional neural networks in terms of accuracy. By varying model depth and width, we can achieve compact models that also outperform previous small-footprint variants. To our knowledge, we are the first to examine these approaches for keyword spotting, and our results establish an open-source state-of-the-art reference to support the development of future speech-based interfaces.
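The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses 1-D signals in pure Python for readability, and the function names (`dilated_conv1d`, `residual_block`, `receptive_field`), kernel sizes, and dilation values are all assumptions chosen for illustration. It shows how a dilated convolution spaces out its kernel taps, how a residual block adds its input back to the output of its convolutions, and how dilation widens the receptive field of a stack of layers without adding parameters.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Zero-padded 1-D convolution (cross-correlation) with spaced-out taps.

    Assumes an odd-length kernel so the output has the same length as x.
    """
    k = len(kernel)
    half = (k // 2) * dilation  # zero padding needed on each side
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x, kernel_a, kernel_b, dilation=1):
    """Two dilated convolutions plus an identity shortcut: relu(F(x) + x)."""
    h = relu(dilated_conv1d(x, kernel_a, dilation))
    h = dilated_conv1d(h, kernel_b, dilation)
    return relu([a + b for a, b in zip(h, x)])


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 convolutions, one per dilation."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    # With dilation 2, each output sums the inputs two steps to either side.
    print(dilated_conv1d(signal, [1.0, 0.0, 1.0], dilation=2))
    # Three 3-tap layers: growing the dilation widens the receptive field.
    print(receptive_field(3, [1, 1, 1]), receptive_field(3, [1, 2, 4]))
```

In this sketch, three 3-tap layers with dilations 1, 2, 4 cover 15 input positions, against 7 for the same stack without dilation, at the same parameter count. The identity shortcut means a block whose convolutions output zero simply passes its (rectified) input through, which is the property that makes deep stacks easier to train.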

Code Implementations: 4 repos
