CL, Apr 14, 2018

Developing Far-Field Speaker System Via Teacher-Student Learning

Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong

arXiv:1804.05166v14.056 citations

Originality: Synthesis-oriented

AI Analysis

This work improves far-field speaker systems for voice-activated devices. Its contribution is incremental: it applies known teacher-student learning techniques to specific components.

The study uses teacher-student learning to adapt an acoustic model to far-field speech recognition and to compress a keyword spotting model, reporting a 72.60% relative word error rate reduction on play-back data and a 27-fold reduction in model size without accuracy loss.

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components of a far-field speaker system. Specifically, we use teacher-student (T/S) learning to adapt a well-trained close-talk production AM to far-field conditions using parallel close-talk and simulated far-field data. We also use T/S learning to compress a large KWS model into a small one that fits the device's computational budget. Because it does not require transcriptions, T/S learning can exploit untranscribed data to boost model performance in both AM adaptation and KWS model compression. We further optimize the models with sequence discriminative training and live data to reach the best system performance. Relative to the baseline, the adapted AM achieved 72.60% and 57.16% relative word error rate reductions on play-back and live test data, respectively. The final KWS model was 27 times smaller than the large KWS model, with no loss in accuracy.
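
The T/S idea in the abstract can be illustrated with a minimal sketch. It assumes a common formulation of teacher-student learning, in which the student is trained to match the teacher's per-frame output posteriors on parallel data (the teacher sees the close-talk frame, the student sees the corresponding far-field frame), so no transcription is needed. The abstract does not spell out the loss, so the function names, the logit values, and the choice of cross-entropy against soft targets are all assumptions made for illustration, not the authors' implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def ts_frame_loss(teacher_logits, student_logits):
    """Cross-entropy of the student's posteriors against the teacher's
    soft targets for one frame. Minimizing this over the student is
    equivalent to minimizing KL(teacher || student), because the
    teacher's entropy does not depend on the student."""
    teacher_post = [math.exp(lp) for lp in log_softmax(teacher_logits)]
    student_logp = log_softmax(student_logits)
    return -sum(pt * lps for pt, lps in zip(teacher_post, student_logp))


def ts_loss(teacher_frames, student_frames):
    """Average frame loss over parallel data: teacher_frames[i] comes from
    the close-talk signal, student_frames[i] from the matching far-field
    signal (hypothetical inputs; no transcription labels are used)."""
    if not teacher_frames or len(teacher_frames) != len(student_frames):
        raise ValueError("parallel data must be non-empty and frame-aligned")
    pairs = zip(teacher_frames, student_frames)
    return sum(ts_frame_loss(t, s) for t, s in pairs) / len(teacher_frames)


if __name__ == "__main__":
    # Hypothetical logits over three output classes for two frames.
    teacher = [[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]]
    matched = [[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]]
    mismatched = [[-1.0, 0.0, 2.0], [1.5, 0.0, 0.5]]
    print("matched student loss:   ", ts_loss(teacher, matched))
    print("mismatched student loss:", ts_loss(teacher, mismatched))
```

The same objective applies to the model-compression use mentioned in the abstract: there the teacher would be the large KWS model and the student the small one, both fed the same input.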

