Distilling Word Embeddings: An Encoding Approach
This work addresses the need for efficient neural networks in resource-restricted systems, but it is incremental: it applies existing distillation concepts to the specific case of word embeddings.
The paper tackles the problem of distilling word embeddings for NLP tasks to create lightweight models, proposing an encoding approach that reduces model complexity while retaining high accuracy, with experiments showing it outperforms direct training with small embeddings.
Distilling knowledge from a well-trained, cumbersome network into a small one has recently become a research topic, as lightweight neural networks with high performance are particularly needed in resource-restricted systems. This paper addresses the problem of distilling word embeddings for NLP tasks. We propose an encoding approach that distills task-specific knowledge from a set of high-dimensional embeddings; it reduces model complexity by a large margin while retaining high accuracy, offering a good compromise between efficiency and performance. Experiments on two tasks show that distilling knowledge from cumbersome embeddings is better than directly training neural networks with small embeddings.
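As a rough illustration of the encoding idea, the sketch below maps a table of high-dimensional embeddings to a much smaller table through a single nonlinear layer. The abstract does not specify the form of the encoder, so the tanh projection, the dimensions (300 to 50), the vocabulary size, and the names `encode_embeddings`, `W`, and `b` are all assumptions made for illustration. In practice the encoder weights would be trained together with the task network; here they are random placeholders.

```python
import numpy as np

def encode_embeddings(big_emb, W, b):
    """Project high-dimensional embeddings into a smaller space.

    Assumed form: one affine map followed by tanh. The actual encoder
    used in the paper may differ.
    """
    return np.tanh(big_emb @ W + b)

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
vocab_size, big_dim, small_dim = 1000, 300, 50

# Stand-in for well-trained, cumbersome embeddings (random here).
big_emb = rng.normal(size=(vocab_size, big_dim))

# Encoder parameters; these would be learned on the task, not sampled.
W = rng.normal(scale=0.1, size=(big_dim, small_dim))
b = np.zeros(small_dim)

# The encoded vectors can serve as the small model's embedding table.
small_emb = encode_embeddings(big_emb, W, b)

# Ratio of embedding parameters before and after encoding.
compression = big_emb.size / small_emb.size
```

Under these assumed sizes the embedding table shrinks by a factor of `big_dim / small_dim`, which is where most of the complexity reduction would come from when the embedding layer dominates the parameter count.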