AS CL LG | Jun 23, 2022

QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer

arXiv:2206.13231v1 | 11 citations | h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses personalized smart-device interaction by enabling keyword recognition without pre-defined keywords. The contribution is incremental: it adapts an existing MLP architecture (MLPMixer) to a specific task.

The paper tackles open-vocabulary keyword spotting with a pure MLP network built on MLPMixer. It reports better performance than state-of-the-art RNN and CNN models in challenging 10 dB and 6 dB environments, on both the public Hey-Snips dataset and an internal dataset with 400 speakers, while using fewer parameters and MACs.

Current keyword spotting systems are typically trained with a large amount of pre-defined keywords. Recognizing keywords in an open-vocabulary setting is essential for personalizing smart device interaction. Towards this goal, we propose a pure MLP-based neural network that is based on MLPMixer - an MLP model architecture that effectively replaces the attention mechanism in Vision Transformers. We investigate different ways of adapting the MLPMixer architecture to the QbyE open-vocabulary keyword spotting task. Comparisons with the state-of-the-art RNN and CNN models show that our method achieves better performance in challenging situations (10dB and 6dB environments) on both the publicly available Hey-Snips dataset and a larger scale internal dataset with 400 speakers. Our proposed model also has a smaller number of parameters and MACs compared to the baseline models.
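
To make the architecture in the abstract more concrete, the following is a minimal NumPy sketch of a generic Mixer block followed by a query-by-example comparison. It is not the paper's adapted model: the abstract does not say which adaptations were used, so the function names (`mixer_block`, `embed`, `cosine_similarity`), the input shape (time frames by feature channels), the hidden sizes, the random weights, the mean pooling, and the cosine-similarity matching are all assumptions chosen for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row over its last axis (no learned scale or shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def init_mlp(rng, dim, hidden):
    # Hypothetical random weights for a two-layer MLP: dim -> hidden -> dim.
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))

def mlp(x, w1, b1, w2, b2):
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_block(x, token_params, channel_params):
    # x has shape (tokens, channels); here assumed (time frames, features).
    # Token mixing: transpose so the MLP mixes across tokens per channel.
    # This is the step that stands in for attention.
    x = x + mlp(layer_norm(x).T, *token_params).T
    # Channel mixing: the MLP mixes across channels for each token.
    return x + mlp(layer_norm(x), *channel_params)

def embed(x, token_params, channel_params):
    # Assumed pooling: average over time to get one fixed-size vector.
    return mixer_block(x, token_params, channel_params).mean(axis=0)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Usage: compare an enrolled query against a test utterance (random stand-ins).
rng = np.random.default_rng(0)
tokens, channels = 50, 40
token_params = init_mlp(rng, tokens, 64)
channel_params = init_mlp(rng, channels, 128)
query = rng.normal(size=(tokens, channels))
test_utterance = rng.normal(size=(tokens, channels))
score = cosine_similarity(embed(query, token_params, channel_params),
                          embed(test_utterance, token_params, channel_params))
```

In a query-by-example setup of this kind, a detection would be declared when `score` exceeds a tuned threshold; the threshold and the untrained weights here are placeholders, so the score itself carries no meaning.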

