CL | Apr 8, 2019

Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization

arXiv:1904.04163v1 | 26 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in language modeling for applications like speech recognition, though it is incremental as it builds on existing knowledge distillation techniques.

The paper tackles the large memory and computational requirements of Recurrent Neural Network language models (RNNLMs) by applying knowledge distillation with trust regularization. On Penn Treebank, the distilled model has one-third the parameters of the previous best model while maintaining state-of-the-art perplexity. In an N-best rescoring task on Wall Street Journal data, it cuts model size to 18.5% of the baseline with no degradation in word error rate.

Recurrent Neural Networks (RNNs) have dominated language modeling because of their superior performance over traditional N-gram based models. In many applications, a large Recurrent Neural Network language model (RNNLM) or an ensemble of several RNNLMs is used. These models have large memory footprints and require heavy computation. In this paper, we examine the effect of applying knowledge distillation in reducing the model size for RNNLMs. In addition, we propose a trust regularization method to improve the knowledge distillation training for RNNLMs. Using knowledge distillation with trust regularization, we reduce the parameter size to a third of that of the previously published best model while maintaining the state-of-the-art perplexity result on Penn Treebank data. In a speech recognition N-best rescoring task, we reduce the RNNLM model size to 18.5% of the baseline system, with no degradation in word error rate (WER) performance on the Wall Street Journal data set.
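To make the idea of knowledge distillation concrete, here is a minimal sketch of a generic distillation loss for next-word prediction, in which a small student model is trained against both the teacher's softened output distribution and the true next word. This is the commonly used formulation, not the paper's trust regularization, whose exact form is not given in this summary. The function names and the `alpha` and `temperature` parameters are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of vocabulary logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy of predicted_probs measured against target_probs."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, true_index,
                      alpha=0.5, temperature=2.0):
    """Generic distillation loss for a single next-word prediction.

    alpha weights the soft-target (teacher) term against the hard-label
    term; both alpha and temperature are illustrative hyperparameters,
    not values taken from the paper.
    """
    # Soft-target term: the student matches the teacher's softened distribution.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = cross_entropy(teacher_soft, student_soft)

    # Hard-label term: standard negative log-likelihood of the true word.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[true_index] + 1e-12)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # toy logits over a 3-word vocabulary
    student = [1.5, 0.7, -0.8]
    print(distillation_loss(student, teacher, true_index=0))
```

In this sketch, a student whose logits agree with the teacher's receives a lower loss than one that disagrees, which is the signal that lets a smaller model absorb the behavior of a larger one.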

