AS · LG · SD · Nov 11, 2022

Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting

arXiv:2211.06478v1 · 4 citations · h-index: 20
Originality: Incremental advance
AI Analysis

This paper offers an incremental improvement for keyword spotting in speech recognition, mainly by making training more flexible.

The paper tackles keyword spotting by adapting a sequence-to-sequence Transformer-Transducer ASR system to detect keywords using a special token and a decision function, achieving similar performance to conventional systems while offering more flexibility.

In this paper, we present a novel approach to adapt a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. We achieve this by replacing the keyword in the text transcription with a special token <kw> and training the system to detect the <kw> token in an audio stream. At inference time, we create a decision function inspired by conventional KWS approaches to make our approach more suitable for the KWS task. Furthermore, we introduce a specific keyword spotting loss by adapting the sequence-discriminative Minimum Bayes-Risk training technique. We find that our approach significantly outperforms ASR-based KWS systems. When compared with a conventional keyword spotting system, our proposal has similar performance while bringing the advantages and flexibility of sequence-to-sequence training. Additionally, when combined with the conventional KWS system, our approach can improve performance at any operating point.
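
The two steps the abstract describes, rewriting transcripts so that the keyword becomes a <kw> token and then making a detection decision from the model's <kw> scores, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the example keyword, and in particular the decision rule (a moving average of per-frame <kw> posteriors compared against a threshold), which is a common pattern in conventional KWS and not necessarily the decision function used in the paper.

```python
import re
from typing import List, Sequence

KW_TOKEN = "<kw>"  # special token named in the abstract


def replace_keyword(transcript: str, keyword: str, token: str = KW_TOKEN) -> str:
    """Replace each whole-phrase, case-insensitive occurrence of `keyword` with `token`.

    Hypothetical helper: the paper's actual text preprocessing is not described here.
    """
    pattern = re.compile(
        r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", flags=re.IGNORECASE
    )
    return pattern.sub(token, transcript)


def detect_keyword(
    kw_posteriors: Sequence[float], threshold: float = 0.5, window: int = 3
) -> List[int]:
    """Return frame indices where the smoothed <kw> score first rises above `threshold`.

    Assumed decision rule: average the last `window` per-frame <kw> posteriors
    and fire once per upward crossing of the threshold.
    """
    hits: List[int] = []
    above = False
    for t in range(len(kw_posteriors)):
        chunk = kw_posteriors[max(0, t - window + 1) : t + 1]
        score = sum(chunk) / len(chunk)
        if score >= threshold and not above:
            hits.append(t)  # report only the onset of a detection
        above = score >= threshold
    return hits


if __name__ == "__main__":
    print(replace_keyword("hey assistant play some music", "hey assistant"))
    print(detect_keyword([0.0, 0.1, 0.9, 0.9, 0.9, 0.1, 0.0, 0.0]))
```

Sweeping `threshold` in such a rule trades false alarms against missed detections, which is what an operating point refers to in the abstract's last sentence.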

