GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples
This incremental improvement addresses a specific bottleneck in keyword spotting models for audio processing applications.
The paper tackles the scarcity of boundary examples in spoken keyword spotting by generating adversarial examples through edits to the keyword's graphemes. This yields a 61% improvement in AUC on synthetic hard negatives while maintaining quality on positives and ambient negative audio.
Spoken Keyword Spotting (KWS) is the task of distinguishing between the presence and absence of a keyword in audio. The accuracy of a KWS model hinges on its ability to correctly classify examples close to the boundary between keyword and non-keyword audio. These boundary examples are often scarce in training data, limiting model performance. In this paper, we propose a method to systematically generate adversarial examples close to the decision boundary by making insertion, deletion, and substitution edits to the keyword's graphemes. We evaluate this technique on held-out data for a popular keyword and show that it improves AUC on a dataset of synthetic hard negatives by 61% while maintaining quality on positives and ambient negative audio data.
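To make the grapheme-edit idea concrete, here is a minimal sketch that enumerates every string exactly one insertion, deletion, or substitution away from a keyword. It rests on a few assumptions that are not taken from the paper. Graphemes are approximated as lowercase ASCII letters. The function name `grapheme_edits` is hypothetical. The keyword "hey" is only a stand-in, because the paper does not name its keyword here. The sketch also covers only the text side: turning these strings into audio (for instance with a text-to-speech system) and any filtering the authors apply are outside its scope.

```python
import string


def grapheme_edits(keyword: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """Return all strings one insertion, deletion, or substitution away from keyword.

    Hypothetical helper: graphemes are approximated by characters of `alphabet`.
    """
    keyword = keyword.lower()
    # Every way to cut the keyword into a (left, right) pair, including both ends.
    splits = [(keyword[:i], keyword[i:]) for i in range(len(keyword) + 1)]
    # Deletion: drop the first grapheme of the right half.
    deletions = {left + right[1:] for left, right in splits if right}
    # Substitution: replace the first grapheme of the right half with a different one.
    substitutions = {
        left + c + right[1:]
        for left, right in splits
        if right
        for c in alphabet
        if c != right[0]
    }
    # Insertion: add one grapheme at every cut point.
    insertions = {left + c + right for left, right in splits for c in alphabet}
    # The three sets differ in length from each other and from the keyword,
    # so their union never contains the original keyword.
    return deletions | substitutions | insertions


if __name__ == "__main__":
    variants = grapheme_edits("hey")  # "hey" is a hypothetical stand-in keyword
    print(len(variants))  # 179
    print(sorted(variants)[:10])
```

Enumerating the full one-edit neighborhood keeps every candidate a single grapheme away from the keyword, which matches the intuition of examples "close to the decision boundary". The set grows linearly with keyword length and alphabet size, so in practice one might sample from it rather than synthesize audio for every variant. That sampling step is an assumption here, not something the abstract specifies.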