SD AI AS

Aug 23, 2024

Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting

Zhenyu Wang, Li Wan, Biqiao Zhang, Yiteng Huang, Shang-Wen Li, Ming Sun, Xin Lei, Zhaojun Yang

Amazon, Meta AI, MIT

arXiv:2408.13355v1

4.91 citations

h-index: 31

Originality: Incremental advance

AI Analysis

This work addresses robustness in keyword spotting for on-device applications, representing an incremental improvement over existing methods.

The paper targets robust, small-footprint keyword spotting for on-device use. It proposes datasource-aware disentangled learning with adversarial examples, which reduces the mismatch between original and adversarial data as well as the mismatch across the original training datasources. The method improves false reject rate by 40.31% at 1% false accept rate on an internal dataset, compared to the strongest baseline trained without adversarial examples, and reaches 98.06% accuracy on the Google Speech Commands V1 dataset.
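The headline metric, false reject rate (FRR) at a fixed false accept rate (FAR), can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the "accept when score > threshold" convention, and the toy scores are assumptions made for illustration. The listing also does not say whether the 40.31% figure is a relative or absolute change, so the relative-reduction helper reflects one possible reading.

```python
import math


def frr_at_far(pos_scores, neg_scores, target_far=0.01):
    """Return (threshold, frr) for the operating point with FAR <= target_far.

    Assumed convention: a clip is accepted as the keyword when its
    score is strictly greater than the threshold.
    """
    neg_sorted = sorted(neg_scores, reverse=True)
    n_neg = len(neg_sorted)
    # Largest number of negatives we may accept without exceeding the target FAR.
    # The small epsilon guards against float error such as 0.29 * 100 = 28.999...
    max_false_accepts = math.floor(target_far * n_neg + 1e-9)
    if max_false_accepts < n_neg:
        # At most `max_false_accepts` negatives score strictly above this value.
        threshold = neg_sorted[max_false_accepts]
    else:
        threshold = float("-inf")
    false_rejects = sum(1 for s in pos_scores if s <= threshold)
    return threshold, false_rejects / len(pos_scores)


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - improved) / baseline


# Toy data (hypothetical): 100 non-keyword scores and 4 keyword scores.
neg = [i / 100 for i in range(100)]
pos = [0.5, 0.985, 0.99, 1.0]
thr, frr = frr_at_far(pos, neg, target_far=0.01)
```

Fixing FAR first and then reading off FRR keeps systems comparable at the same false-alarm budget, which is the usual reason this operating point is reported for always-on keyword spotting.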

A keyword spotting (KWS) engine that is continuously running on device is exposed to various speech signals that are usually unseen before. It is a challenging problem to build a small-footprint and high-performing KWS model with robustness under different acoustic environments. In this paper, we explore how to effectively apply adversarial examples to improve KWS robustness. We propose datasource-aware disentangled learning with adversarial examples to reduce the mismatch between the original and adversarial data as well as the mismatch across original training datasources. The KWS model architecture is based on depth-wise separable convolution and a simple attention module. Experimental results demonstrate that the proposed learning strategy improves false reject rate by $40.31\%$ at $1\%$ false accept rate on the internal dataset, compared to the strongest baseline without using adversarial examples. Our best-performing system achieves $98.06\%$ accuracy on the Google Speech Commands V1 dataset.
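To see why depth-wise separable convolution suits a small-footprint model, the following sketch compares weight counts for a standard convolution and its depth-wise separable counterpart. The layer sizes (64 input channels, 64 output channels, 9 kernel elements) are hypothetical and are not taken from the paper; biases are ignored.

```python
def standard_conv_params(kernel_elems, c_in, c_out):
    """Weights in a standard convolution: every output channel
    has its own kernel over every input channel."""
    return kernel_elems * c_in * c_out


def depthwise_separable_params(kernel_elems, c_in, c_out):
    """Weights in a depth-wise separable convolution:
    one spatial kernel per input channel (depth-wise step),
    followed by a 1x1 convolution that mixes channels (point-wise step)."""
    depthwise = kernel_elems * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
standard = standard_conv_params(kernel_elems=9, c_in=64, c_out=64)
separable = depthwise_separable_params(kernel_elems=9, c_in=64, c_out=64)
ratio = standard / separable
```

For these assumed sizes the separable layer needs roughly one eighth of the weights of the standard layer, and the saving grows with kernel size and output channel count.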
