SP | LG | SD | AS | Apr 6, 2023

To Wake-up or Not to Wake-up: Reducing Keyword False Alarm by Successive Refinement

arXiv:2304.03416v1 | 2 citations | h-index: 36
AI Analysis

This paper addresses the challenge of false alarms in keyword spotting for applications such as voice assistants, though it is an incremental improvement over existing deep learning methods.

The paper tackles the problem of reducing false alarms in keyword spotting systems by proposing a successive refinement technique that classifies audio as speech, then keyword-like, then the specific keyword, reducing false alarms by up to a factor of 8 on in-domain data and 7 on out-of-domain data.

Keyword spotting systems continuously process audio streams to detect keywords. One of the most challenging tasks in designing such systems is reducing False Alarms (FA), which occur when the system registers a keyword even though no keyword was uttered. In this paper, we propose a simple yet elegant solution to this problem that follows from the law of total probability. We show that existing deep keyword spotting mechanisms can be improved by Successive Refinement, where the system first classifies whether the input audio is speech or not, then whether the input is keyword-like or not, and finally which keyword was uttered. We show that, across multiple models ranging in size from 13K parameters to 2.41M parameters, the successive refinement technique reduces FA by up to a factor of 8 on in-domain held-out FA data, and by up to a factor of 7 on out-of-domain (OOD) FA data. Further, our proposed approach is "plug-and-play" and can be applied to any deep keyword spotting model.
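The three-stage decision described in the abstract can be illustrated with a minimal sketch. It assumes the stages are combined by multiplying their probabilities (speech, then keyword-like given speech, then the specific keyword given keyword-like) and that a detection fires only when the product clears a threshold. The function names, the 0.5 threshold, and the way the stage outputs are combined are all assumptions for illustration; the abstract does not give the paper's exact formulation.

```python
from typing import List, Optional, Sequence


def refine_scores(p_speech: float,
                  p_keyword_like: float,
                  p_keywords: Sequence[float]) -> List[float]:
    """Combine three stage outputs into per-keyword scores (hypothetical helper).

    p_speech:       P(speech | audio)
    p_keyword_like: P(keyword-like | speech, audio)
    p_keywords:     P(keyword k | keyword-like, audio), one entry per keyword
    """
    # The first two stages act as a shared gate on every keyword score.
    gate = p_speech * p_keyword_like
    return [gate * p for p in p_keywords]


def detect(p_speech: float,
           p_keyword_like: float,
           p_keywords: Sequence[float],
           threshold: float = 0.5) -> Optional[int]:
    """Return the index of the detected keyword, or None if nothing fires.

    The threshold value is an assumption chosen for this sketch.
    """
    if not p_keywords:
        return None
    scores = refine_scores(p_speech, p_keyword_like, p_keywords)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    # Clear speech that sounds like a keyword: the system wakes up.
    print(detect(0.99, 0.95, [0.05, 0.90, 0.05]))
    # Same final-stage output, but the audio is unlikely to be speech:
    # the gate suppresses what would otherwise be a false alarm.
    print(detect(0.10, 0.95, [0.05, 0.90, 0.05]))
```

In this sketch a confident final-stage classifier cannot trigger a wake-up on its own: non-speech or non-keyword-like audio lowers every keyword score through the shared gate, which is the intuition behind reducing false alarms stage by stage.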

