AS | LG | Jul 6, 2022

Low-resource Low-footprint Wake-word Detection using Knowledge Distillation

arXiv:2207.03331v1 | 6 citations | h-index: 16 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the need for low-resource, low-footprint wake-word detection across diverse virtual assistants, offering an incremental improvement over existing methods.

The paper tackles the problem of training wake-word detectors without costly wake-word-specific datasets. It leverages acoustic modeling data through transfer learning and knowledge distillation, improving accuracy and reducing latency on both the open-source "Hey Snips" dataset and an in-house far-field dataset.

As virtual assistants have become more diverse and specialized, so has the demand for application or brand-specific wake words. However, the wake-word-specific datasets typically used to train wake-word detectors are costly to create. In this paper, we explore two techniques to leverage acoustic modeling data for large-vocabulary speech recognition to improve a purpose-built wake-word detector: transfer learning and knowledge distillation. We also explore how these techniques interact with time-synchronous training targets to improve detection latency. Experiments are presented on the open-source "Hey Snips" dataset and a more challenging in-house far-field dataset. Using phone-synchronous targets and knowledge distillation from a large acoustic model, we are able to improve accuracy across dataset sizes for both datasets while reducing latency.
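To make the knowledge distillation idea concrete, the following is a minimal sketch of a generic frame-level distillation loss, in which a small student model is trained to match the softened output distribution of a large teacher acoustic model while also fitting per-frame hard targets. It is not the paper's exact formulation: the function names, the `alpha` mixing weight, the `temperature` value, and the toy logits are all assumptions made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one frame's logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, hard_targets,
                      alpha=0.5, temperature=2.0):
    """Mix soft-target cross-entropy (teacher) with hard-target cross-entropy.

    student_logits, teacher_logits: one list of class logits per frame.
    hard_targets: one class index per frame (hypothetical frame-level labels).
    alpha and temperature are illustrative hyperparameters, not paper values.
    """
    soft_total = 0.0
    hard_total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, hard_targets):
        teacher_logp = log_softmax(t, temperature)
        student_logp_soft = log_softmax(s, temperature)
        # Cross-entropy between the softened teacher and student distributions.
        soft_total -= sum(math.exp(tp) * sp
                          for tp, sp in zip(teacher_logp, student_logp_soft))
        # Standard cross-entropy against the hard label for this frame.
        hard_total -= log_softmax(s)[y]
    n = len(hard_targets)
    # The temperature**2 factor keeps the soft-term gradients on a comparable
    # scale to the hard term, a common convention in distillation.
    return alpha * temperature ** 2 * soft_total / n + (1 - alpha) * hard_total / n


if __name__ == "__main__":
    # Toy example: two frames, three output classes.
    teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
    student = [[1.0, 0.2, -0.5], [0.0, 0.8, 0.4]]
    targets = [0, 1]
    print(distillation_loss(student, teacher, targets))
```

With `alpha=1.0` the student learns only from the teacher's soft outputs, and with `alpha=0.0` the loss reduces to ordinary cross-entropy on the hard labels. In practice the two terms are usually weighted by tuning on held-out data.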

