SD | CL | AS | May 20, 2025

GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples

arXiv:2505.14814v2 | 2 citations | h-index: 4 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This incremental improvement addresses a specific bottleneck in keyword spotting models for audio processing applications.

The paper tackles the scarcity of boundary examples in spoken keyword spotting by generating adversarial examples through insertion, deletion, and substitution edits on the keyword's graphemes. On held-out data for one popular keyword, this improves AUC on synthetic hard negatives by 61% while maintaining quality on positives and ambient negative audio.

Spoken Keyword Spotting (KWS) is the task of distinguishing between the presence and absence of a keyword in audio. The accuracy of a KWS model hinges on its ability to correctly classify examples close to the keyword and non-keyword boundary. These boundary examples are often scarce in training data, limiting model performance. In this paper, we propose a method to systematically generate adversarial examples close to the decision boundary by making insertion/deletion/substitution edits on the keyword's graphemes. We evaluate this technique on held-out data for a popular keyword and show that the technique improves AUC on a dataset of synthetic hard negatives by 61% while maintaining quality on positives and ambient negative audio data.
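The grapheme-edit step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each character of the keyword counts as one grapheme, uses the lowercase ASCII alphabet as the edit vocabulary, and enumerates every text variant at edit distance 1. The function name `grapheme_edits` and the example keyword are hypothetical. Turning the resulting text into synthesized audio is not shown.

```python
import string


def grapheme_edits(keyword: str, alphabet: str = string.ascii_lowercase) -> list[str]:
    """Return all distinct strings one insertion, deletion, or substitution
    away from `keyword`.

    Assumption: one character == one grapheme. The paper's actual grapheme
    units and edit vocabulary may differ.
    """
    variants = set()
    n = len(keyword)

    for i in range(n):
        # Deletion: drop the grapheme at position i.
        variants.add(keyword[:i] + keyword[i + 1:])
        # Substitution: replace the grapheme at position i with a different one.
        for c in alphabet:
            if c != keyword[i]:
                variants.add(keyword[:i] + c + keyword[i + 1:])

    # Insertion: add one grapheme at any of the n + 1 gaps.
    for i in range(n + 1):
        for c in alphabet:
            variants.add(keyword[:i] + c + keyword[i:])

    # Guard against degenerate outputs (empty string for 1-letter keywords).
    variants.discard("")
    variants.discard(keyword)
    return sorted(variants)


if __name__ == "__main__":
    hard_negatives = grapheme_edits("cat")  # hypothetical keyword
    print(len(hard_negatives), hard_negatives[:5])
```

Each variant is a candidate hard-negative text that differs from the keyword by a single grapheme. A practical pipeline would likely filter variants that sound identical to the keyword before using them as negatives, a step this sketch leaves out.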

