CL, LG · Oct 2, 2019

Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition

arXiv:1910.06720v21003 citations
Originality: Incremental advance
AI Analysis

This work addresses memory constraints for edge deployment of NLP models with a simple and effective compression technique. It is an incremental advance, building on existing decomposition and distillation approaches.

The paper tackles memory-intensive word embeddings in NLP models by proposing Distilled Embedding, a compression method that combines low-rank matrix decomposition with knowledge distillation. It reports higher BLEU scores on translation and lower perplexity on language modeling than state-of-the-art methods.

Word embeddings are vital components of Natural Language Processing (NLP) models and have been extensively explored. However, they consume a lot of memory, which poses a challenge for edge deployment. Embedding matrices typically contain most of the parameters for language models and about a third for machine translation systems. In this paper, we propose Distilled Embedding, an (input/output) embedding compression method based on low-rank matrix decomposition and knowledge distillation. First, we initialize the weights of our decomposed matrices by learning to reconstruct the full pre-trained word embedding, and then we fine-tune end-to-end, employing knowledge distillation on the factorized embedding. We conduct extensive experiments with various compression rates on machine translation and language modeling, using different datasets with a shared word-embedding matrix for both embedding and vocabulary projection matrices. We show that the proposed technique is simple to replicate, with one fixed parameter controlling compression size, and that it achieves a higher BLEU score on translation and lower perplexity on language modeling than complex, difficult-to-tune state-of-the-art methods.
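The abstract mentions a single fixed parameter that controls compression size. As a rough illustration of how such a parameter could relate to the compression rate, the sketch below assumes a plain two-factor decomposition of a vocab_size x emb_dim embedding matrix into a vocab_size x rank matrix and a rank x emb_dim matrix, with no biases or extra layers. The function names and this parameter accounting are assumptions made for illustration; the paper's nonlinear decomposition may count parameters differently.

```python
import math


def full_params(vocab_size, emb_dim):
    """Parameter count of an uncompressed embedding matrix."""
    return vocab_size * emb_dim


def factorized_params(vocab_size, emb_dim, rank):
    """Parameter count of an assumed two-factor decomposition:
    (vocab_size x rank) followed by (rank x emb_dim)."""
    return vocab_size * rank + rank * emb_dim


def compression_ratio(vocab_size, emb_dim, rank):
    """How many times smaller the factorized embedding is than the full one."""
    return full_params(vocab_size, emb_dim) / factorized_params(
        vocab_size, emb_dim, rank
    )


def max_rank_for_ratio(vocab_size, emb_dim, target_ratio):
    """Largest rank whose compression ratio is still at least target_ratio.

    Falls back to rank 1 when even rank 1 cannot reach the target.
    """
    rank = math.floor(
        full_params(vocab_size, emb_dim)
        / (target_ratio * (vocab_size + emb_dim))
    )
    return max(1, rank)


# Hypothetical sizes, chosen only to exercise the functions.
vocab_size, emb_dim = 10_000, 512
rank = max_rank_for_ratio(vocab_size, emb_dim, target_ratio=8.0)
print(rank, factorized_params(vocab_size, emb_dim, rank),
      compression_ratio(vocab_size, emb_dim, rank))
```

Under this accounting, the rank is the only knob: a smaller rank shrinks both factors at once, so the compression ratio grows roughly in inverse proportion to it.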

