CL, HC, Aug 5, 2021

Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification

arXiv:2108.02598v1, 31 citations
AI Analysis

This work addresses acoustic variation in spoken language understanding for applications such as voice assistants. The contribution is incremental: it adapts existing distillation methods to a specific domain.

The paper tackles the problem of limited speech resources for end-to-end intent classification by distilling knowledge from a BERT language model into a speech transformer model, achieving intent classification accuracies of 99.10% on the Fluent speech corpus and 88.79% on the ATIS database.

End-to-end intent classification from speech has numerous advantages over the conventional pipeline approach, in which automatic speech recognition (ASR) is followed by natural language processing modules. It attempts to predict intent from speech without using an intermediate ASR module. However, such an end-to-end framework suffers from the unavailability of large speech resources with high acoustic variation in spoken language understanding. In this work, we exploit the scope of the transformer distillation method, which is specifically designed for knowledge distillation from a transformer-based language model to a transformer-based speech model. In this regard, we leverage the reliable and widely used bidirectional encoder representations from transformers (BERT) model as a language model and transfer its knowledge to build an acoustic model for intent classification from speech. In particular, a multilevel transformer-based teacher-student model is designed, and knowledge distillation is performed across the attention and hidden sub-layers of different transformer layers of the student and teacher models. We achieve intent classification accuracies of 99.10% and 88.79% on the Fluent speech corpus and the ATIS database, respectively. Further, the proposed method demonstrates better performance and robustness in acoustically degraded conditions compared to the baseline method.
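The layer-wise distillation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform student-to-teacher layer mapping, the learned projection `weight`, and the assumption that the speech and text sequences have already been brought to the same length are all illustrative choices, in the style of generic transformer distillation. The loss sums a mean squared error over attention matrices and over (projected) hidden states for each mapped layer pair.

```python
# Hypothetical sketch of a layer-wise transformer distillation loss.
# Assumptions (not from the paper): uniform layer mapping, a single linear
# projection shared across layers, and teacher/student sequences of equal length.

def mse(a, b):
    """Mean squared error between two equally shaped 2-D lists."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def project(hidden, weight):
    """Map student hidden states (T x d_s) to teacher width using weight (d_s x d_t)."""
    columns = list(zip(*weight))  # each column has d_s entries
    return [[sum(h * w for h, w in zip(row, col)) for col in columns]
            for row in hidden]


def layer_map(num_student_layers, num_teacher_layers):
    """Uniform mapping: each student layer is paired with one evenly spaced teacher layer."""
    step = num_teacher_layers // num_student_layers
    return [(m + 1) * step - 1 for m in range(num_student_layers)]


def distillation_loss(student_attn, teacher_attn, student_hid, teacher_hid, weight):
    """Sum attention and hidden-state MSE over all mapped layer pairs."""
    mapping = layer_map(len(student_attn), len(teacher_attn))
    loss = 0.0
    for m, n in enumerate(mapping):
        loss += mse(student_attn[m], teacher_attn[n])
        loss += mse(project(student_hid[m], weight), teacher_hid[n])
    return loss
```

In practice this term would be combined with an intent classification loss on the student's output, and the inputs would be tensors from a deep learning framework rather than nested lists; the plain-Python form is used here only to keep the sketch self-contained.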
