CL · LG · Feb 14, 2021

Query-by-Example Keyword Spotting system using Multi-head Attention and Softtriple Loss

arXiv:2102.07061v2 · 55 citations
Originality: Incremental advance
AI Analysis

This work addresses keyword spotting for user-defined queries, but its contribution is incremental: it combines existing methods rather than introducing new ones.

The paper tackles query-by-example keyword spotting with a neural network that uses multi-head attention and softtriple loss, and reports solid performance relative to a baseline on internal datasets and the public Hey-Snips dataset.

This paper proposes a neural network architecture for tackling the query-by-example user-defined keyword spotting task. A multi-head attention module is added on top of a multi-layered GRU for effective feature extraction, and a normalized multi-head attention module is proposed for feature aggregation. We also adopt the softtriple loss - a combination of triplet loss and softmax loss - and showcase its effectiveness. We demonstrate the performance of our model on internal datasets with different languages and the public Hey-Snips dataset. We compare the performance of our model to a baseline system and conduct an ablation study to show the benefit of each component in our architecture. The proposed work shows solid performance while preserving simplicity.
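The abstract does not spell out the softtriple loss, so the following is only a minimal sketch, assuming the formulation from Qian et al., "SoftTriple Loss: Deep Metric Learning Without Triplet Sampling" (2019): each class keeps several centers, the similarity to a class is a softmax-weighted average over its centers, and a margin-adjusted softmax cross-entropy is applied on top. The function name `softtriple_loss`, the parameter names, and the default values for `lam`, `gamma`, and `delta` are illustrative assumptions, not taken from this paper, and the center regularizer from the original SoftTriple formulation is omitted.

```python
import math


def softtriple_loss(embedding, centers, label, lam=20.0, gamma=0.1, delta=0.01):
    """SoftTriple-style loss for a single L2-normalized embedding (sketch).

    centers[c][k] is the k-th L2-normalized center of class c.
    lam, gamma, delta are illustrative defaults, not values from the paper.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Relaxed class similarity: softmax-weighted average over each class's centers.
    class_sims = []
    for class_centers in centers:
        sims = [dot(embedding, w) for w in class_centers]
        m = max(sims)  # subtract the max for numerical stability
        weights = [math.exp((s - m) / gamma) for s in sims]
        total = sum(weights)
        class_sims.append(sum(w * s for w, s in zip(weights, sims)) / total)

    # Scaled logits, with a margin subtracted from the true class only.
    logits = [lam * (s - delta) if c == label else lam * s
              for c, s in enumerate(class_sims)]

    # Softmax cross-entropy: log-sum-exp of all logits minus the true-class logit.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Toy usage: two classes, two centers each, in a 2-D embedding space.
toy_centers = [
    [[1.0, 0.0], [0.0, 1.0]],    # class 0
    [[-1.0, 0.0], [0.0, -1.0]],  # class 1
]
query = [1.0, 0.0]
loss_correct = softtriple_loss(query, toy_centers, label=0)
loss_wrong = softtriple_loss(query, toy_centers, label=1)
```

In this toy setup the query coincides with a class-0 center, so the loss with `label=0` is close to zero and the loss with `label=1` is much larger. This is the behavior that pulls embeddings toward centers of their own class without explicit triplet sampling.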

