SDAS · Jan 15, 2022

ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting

arXiv:2201.05863v1 · 61 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient keyword spotting deployment in noisy real-world user environments. The contribution appears incremental: it builds on existing lightweight architectures and adds specific enhancements.

The paper tackles the challenge of building a lightweight keyword spotting model that stays robust under noisy far-field conditions, reporting 98.2% top-1 accuracy on Google Speech Command V2-12 with only 100K parameters.

Building efficient architectures in neural speech processing is paramount to success in keyword spotting deployment. However, it is very challenging for lightweight models to achieve noise robustness with concise neural operations. In real-world applications, the user environment is typically noisy and may also contain reverberation. We propose a novel feature interactive convolutional model with merely 100K parameters to tackle this under the noisy far-field condition. An interactive unit is proposed in place of the attention module; it promotes the flow of information with more efficient computation. Moreover, curriculum-based multi-condition training is adopted to attain better noise robustness. Our model achieves 98.2% top-1 accuracy on Google Speech Command V2-12 and is competitive against large transformer models under the designed noise condition.
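The abstract does not spell out how the curriculum-based multi-condition training is scheduled, so the following is only a minimal sketch of the general idea: train on clean audio first, then progressively add noisier conditions. The function names, the SNR levels, and the one-level-per-stage schedule are all assumptions made for illustration, not the authors' recipe.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db` (in dB), then add it."""
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0.0:
        return list(clean)  # silent noise: nothing to mix in
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [c + scale * n for c, n in zip(clean, noise)]


def curriculum_snr_pool(stage, snr_levels_db=(20, 10, 0, -5)):
    """Hypothetical schedule: stage 0 is clean only (None); each later stage adds one harder SNR.

    Earlier conditions stay in the pool, which is what makes the training multi-condition.
    """
    return [None] + list(snr_levels_db[:stage])


def sample_training_example(clean, noise, stage, rng=random):
    """Draw one condition from the current stage's pool and return (waveform, snr_db or None)."""
    snr_db = rng.choice(curriculum_snr_pool(stage))
    if snr_db is None:
        return list(clean), None
    return mix_at_snr(clean, noise, snr_db), snr_db
```

In a training loop, `stage` would typically be advanced once validation accuracy plateaus at the current difficulty; that trigger is likewise an assumption rather than something stated in the abstract.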

Code Implementations: 2 repos
