SD AI AS
Aug 4, 2025

Adaptive Knowledge Distillation for Device-Directed Speech Detection

Hyung Gun Chi, Florian Pesce, Wonil Chang, Oggi Rudovic, Arturo Argueta, Stefan Braun, Vineet Garg, Ahmed Hussen Abdelaziz

arXiv:2508.02801v1 | 3 citations | h-index: 6 | INTERSPEECH

Originality: Incremental advance

AI Analysis

This work addresses device-directed speech detection for voice assistants, offering incremental improvements in accuracy.

The paper tackles the problem of distinguishing user queries from background speech in voice assistants by proposing an adaptive knowledge distillation method. Compared with the same student model trained without distillation, the method improves Equal Error Rate by 26% for keyword invocations and by 19% for keyword-free (follow-up) invocations.
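Since the results are reported as Equal Error Rate (EER), a minimal sketch of how that metric can be computed for a binary detector may help. It is not taken from the paper: the function name, the threshold sweep over observed scores, and the toy scores are assumptions made for illustration. The EER is the operating point at which the false-accept rate (background speech accepted as device-directed) equals the false-reject rate (device-directed speech rejected).

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: detector outputs, higher means "more likely device-directed".
    labels: 1 for device-directed speech, 0 for background speech.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    best_gap, best_eer = None, None
    for threshold in sorted(set(scores)):
        # False-accept rate: background speech scored at or above the threshold.
        far = sum(s >= threshold for s in negatives) / len(negatives)
        # False-reject rate: device-directed speech scored below the threshold.
        frr = sum(s < threshold for s in positives) / len(positives)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint where the two error rates are closest.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy data, invented for illustration only (not results from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

On the toy data the two error rates meet at a threshold of 0.6, where one of four background clips is accepted and one of four device-directed clips is rejected. A lower EER is better, so an "improvement" in EER means the value goes down.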

Device-directed speech detection (DDSD) is a binary classification task that separates the user's queries to a voice assistant (VA) from background speech or side conversations. This is important for achieving a naturalistic user experience. To this end, we propose knowledge distillation (KD) to enhance DDSD accuracy while ensuring efficient deployment. Specifically, we introduce a novel adaptive KD method that transfers knowledge from the general representations of a large pre-trained ASR acoustic encoder (teacher). We apply task-specific adapters on top of the (frozen) teacher encoder and train them jointly with the student model on DDSD. We demonstrate that the proposed adaptive KD outperforms the student model without distillation on keyword and keyword-free (follow-up) invocations, with improvements of +26% and +19% in terms of Equal Error Rate, respectively. We also show that this approach generalizes across transformer- and conformer-based model architectures.
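The abstract does not give the training objective, so the following is only a sketch of one plausible form of the joint setup it describes: a frozen teacher produces features, a trainable adapter maps them into a task-specific space, and the student is trained on DDSD while matching the adapted features. The linear adapter, the mean-squared-error distillation term, the `distill_weight` parameter, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math


def bce_with_logit(logit, label):
    """Numerically stable binary cross-entropy on a raw logit (label is 0 or 1)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def linear_adapter(teacher_features, weights, bias):
    """Hypothetical task-specific adapter: a single linear layer.

    The teacher features come from the frozen encoder; only `weights` and
    `bias` (and the student) would be updated during training.
    """
    return [
        sum(w * x for w, x in zip(row, teacher_features)) + b
        for row, b in zip(weights, bias)
    ]


def adaptive_kd_loss(student_features, student_logit,
                     adapted_features, adapter_logit,
                     label, distill_weight=1.0):
    """Assumed joint objective: DDSD loss on both branches plus feature matching."""
    task_student = bce_with_logit(student_logit, label)   # student learns DDSD
    task_adapter = bce_with_logit(adapter_logit, label)   # adapter learns DDSD
    distill = mse(student_features, adapted_features)     # student mimics adapted teacher
    return task_student + task_adapter + distill_weight * distill


# Toy values, invented for illustration only.
teacher_out = [1.0, 2.0]                                   # frozen teacher features
adapted = linear_adapter(teacher_out, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
loss = adaptive_kd_loss([0.0, 0.0], 0.0, adapted, 0.0, label=1)
```

In a real training loop the same scalar would be computed with an autodiff framework, with gradients flowing to the student and the adapter but not to the teacher encoder. Training the adapter on the DDSD label is what makes the distillation target task-specific rather than a generic ASR representation.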
