LG, AI, CL, ML | Aug 26, 2019

Differentiable Product Quantization for End-to-End Embedding Compression

arXiv:1908.09756v3 | 81 citations
AI Analysis

DPQ addresses the memory constraints faced by users of embedding layers in NLP, offering a drop-in compression method with minimal performance impact.

The paper tackles the memory and storage challenge of embedding layers by proposing differentiable product quantization (DPQ), a generic end-to-end learnable compression framework, achieving compression ratios of 14-238x with negligible performance loss on 10 datasets across three language tasks.

Embedding layers are commonly used to map discrete symbols into continuous embedding vectors that reflect their semantic meanings. Despite their effectiveness, the number of parameters in an embedding layer increases linearly with the number of symbols and poses a critical challenge on memory and storage constraints. In this work, we propose a generic and end-to-end learnable compression framework termed differentiable product quantization (DPQ). We present two instantiations of DPQ that leverage different approximation techniques to enable differentiability in end-to-end learning. Our method can readily serve as a drop-in alternative for any existing embedding layer. Empirically, DPQ offers significant compression ratios (14-238$\times$) at negligible or no performance cost on 10 datasets across three different language tasks.
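To make the storage saving concrete, the following is a minimal sketch of how a product-quantized embedding table is stored and read back at inference time. It is not the authors' implementation and it omits the differentiable training step that the abstract describes. The codes and codebooks here are random placeholders rather than learned values, and every name (`build_pq_embedding`, `lookup`, `compression_ratio`, `num_groups`, `num_codes`) is a hypothetical choice made for illustration.

```python
import random


def build_pq_embedding(vocab_size, dim, num_groups, num_codes, seed=0):
    """Create a randomly initialised product-quantized table (illustrative only).

    In a trained model the codes and codebooks would be learned; here they
    are random placeholders so the storage layout can be inspected.
    """
    assert dim % num_groups == 0, "dim must split evenly into groups"
    rng = random.Random(seed)
    sub_dim = dim // num_groups
    # codebooks[g][k] is the k-th centroid (length sub_dim) of subspace g.
    codebooks = [
        [[rng.gauss(0.0, 1.0) for _ in range(sub_dim)] for _ in range(num_codes)]
        for _ in range(num_groups)
    ]
    # codes[i][g] is the centroid index chosen for symbol i in subspace g.
    codes = [
        [rng.randrange(num_codes) for _ in range(num_groups)]
        for _ in range(vocab_size)
    ]
    return codebooks, codes


def lookup(codebooks, codes, symbol_id):
    """Rebuild one embedding by concatenating the selected centroids."""
    vector = []
    for group, code in enumerate(codes[symbol_id]):
        vector.extend(codebooks[group][code])
    return vector


def compression_ratio(vocab_size, dim, num_groups, num_codes, float_bits=32):
    """Ratio of full-table bits to (integer codes + codebooks) bits."""
    bits_per_code = (num_codes - 1).bit_length()  # bits needed to index a centroid
    full_bits = vocab_size * dim * float_bits
    code_bits = vocab_size * num_groups * bits_per_code
    codebook_bits = num_groups * num_codes * (dim // num_groups) * float_bits
    return full_bits / (code_bits + codebook_bits)


if __name__ == "__main__":
    codebooks, codes = build_pq_embedding(vocab_size=100, dim=16, num_groups=4, num_codes=8)
    print(len(lookup(codebooks, codes, symbol_id=3)))
    print(compression_ratio(vocab_size=10000, dim=128, num_groups=16, num_codes=256))
```

Under these assumptions the per-symbol cost falls from `dim` floats to `num_groups` small integers, while the codebooks add a fixed overhead that does not grow with the vocabulary. This is why the achievable ratio tends to improve as the number of symbols increases.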

