Conditional Online Learning for Keyword Spotting
This work addresses model degradation in deployed commercial keyword spotting systems. The contribution is incremental, since it focuses on learning the same task with conditional updates.
The paper tackles the problem of keyword spotting models underperforming in changing data regimes by proposing an online continual learning method that updates models on-device via SGD, improving a pre-trained model's performance by 34% in dynamic audio streams.
Modern approaches to keyword spotting (KWS) rely on training deep neural networks on large static datasets with i.i.d. distributions. However, the resulting models tend to underperform when presented with changing data regimes in real-life applications. This work investigates a simple but effective online continual learning method that updates a keyword spotter on-device via stochastic gradient descent (SGD) as new data becomes available. Contrary to previous research, this work focuses on learning the same KWS task, which covers most commercial applications. In experiments with dynamic audio streams in different scenarios, the method improves the performance of a pre-trained small-footprint model by 34%. Moreover, the experiments demonstrate that, compared to a naive online learning implementation, making model updates conditional on the model's performance on a small hold-out set drawn from the training distribution mitigates catastrophic forgetting.
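The conditional update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic-regression model, the function names, the accuracy metric, and the acceptance rule (keep the update only if hold-out accuracy does not drop by more than `tolerance`) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math

def predict_proba(weights, x):
    """Logistic score for one feature vector; the last weight is the bias."""
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, x, y, lr):
    """One SGD step on the log loss; returns a new weight list."""
    err = predict_proba(weights, x) - y
    new = [w - lr * err * xi for w, xi in zip(weights, x)]
    new.append(weights[-1] - lr * err)  # bias update
    return new

def accuracy(weights, data):
    """Fraction of (x, y) pairs classified correctly at threshold 0.5."""
    correct = sum((predict_proba(weights, x) >= 0.5) == bool(y) for x, y in data)
    return correct / len(data)

def conditional_update(weights, batch, holdout, lr=0.1, tolerance=0.0):
    """Run SGD on the new batch, then keep the result only if hold-out
    accuracy has not dropped by more than `tolerance` (assumed rule)."""
    candidate = weights
    for x, y in batch:
        candidate = sgd_step(candidate, x, y, lr)
    if accuracy(candidate, holdout) >= accuracy(weights, holdout) - tolerance:
        return candidate, True   # update accepted
    return weights, False        # update rejected, previous model kept

# Toy hold-out set standing in for samples from the training distribution.
holdout = [([1.0], 1), ([-1.0], 0)]

# A batch consistent with the hold-out data is accepted.
good_weights, good_accepted = conditional_update(
    [0.0, 0.0], [([1.0], 1), ([-1.0], 0)], holdout)

# A batch of contradictory labels would hurt hold-out accuracy and is rejected.
kept_weights, bad_accepted = conditional_update(
    [2.0, 0.0], [([1.0], 0)] * 10, holdout, lr=1.0)
```

A naive online learner would correspond to always returning `candidate`. The hold-out check is what lets the model discard updates that would erase what it learned from the original training distribution.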