Noisy student-teacher training for robust keyword spotting
This work addresses robustness in keyword spotting for applications like voice assistants, but it is incremental as it builds on existing self-training and data augmentation techniques.
The paper tackled the problem of improving keyword spotting accuracy under challenging conditions by proposing a self-training method with noisy student-teacher training that utilizes large-scale unlabeled data and aggressive data augmentation, resulting in accuracy improvements of up to 60% on difficult-conditioned test sets.
We propose self-training with noisy student-teacher approach for streaming keyword spotting, that can utilize large-scale unlabeled data and aggressive data augmentation. The proposed method applies aggressive data augmentation (spectral augmentation) on the input of both student and teacher and utilize unlabeled data at scale, which significantly boosts the accuracy of student against challenging conditions. Such aggressive augmentation usually degrades model performance when used with supervised training with hard-labeled data. Experiments show that aggressive spec augmentation on baseline supervised training method degrades accuracy, while the proposed self-training with noisy student-teacher training improves accuracy of some difficult-conditioned test sets by as much as 60%.