LG, AI, ML | Jun 21, 2018

Learning K-way D-dimensional Discrete Codes for Compact Embedding Representations

arXiv:1806.09464v1 | 72 citations
Originality: Highly original
AI Analysis

This work addresses parameter efficiency for researchers and practitioners in areas such as natural language processing and graph convolutional networks, substantially reducing embedding-layer size while keeping results competitive.

The paper tackles the problem of large parameter sizes and overfitting in conventional embedding methods by proposing a compact K-way D-dimensional discrete encoding scheme, which reduces embedding layer size by up to 98% while maintaining or improving performance across various applications.

Conventional embedding methods directly associate each symbol with a continuous embedding vector, which is equivalent to applying a linear transformation to a "one-hot" encoding of the discrete symbols. Despite its simplicity, such an approach yields a number of parameters that grows linearly with the vocabulary size and can lead to overfitting. In this work, we propose a much more compact K-way D-dimensional discrete encoding scheme to replace the "one-hot" encoding. In the proposed "KD encoding", each symbol is represented by a $D$-dimensional code with a cardinality of $K$, and the final symbol embedding vector is generated by composing the code embedding vectors. To learn semantically meaningful codes end to end, we derive a relaxed discrete optimization approach based on stochastic gradient descent, which can be applied generally to any differentiable computational graph with an embedding layer. In our experiments with various applications, from natural language processing to graph convolutional networks, the total size of the embedding layer can be reduced by up to 98% while achieving similar or better performance.
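To make the encoding concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that the code embedding vectors are composed by simple summation (the abstract only says they are "composed"), and it uses hand-picked codes rather than learned ones. The function names, the random initialization, and the sizes in the demo are all hypothetical choices made for illustration.

```python
import random


def make_code_tables(K, D, dim, seed=0):
    """Build D tables, each holding K code embedding vectors of length dim.

    Random initialization is an assumption for illustration only.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(K)]
        for _ in range(D)
    ]


def compose_embedding(code, tables):
    """Compose a symbol embedding from its D-dimensional code.

    Assumption: composition is a plain sum of the selected code
    embedding vectors, one per code position.
    """
    dim = len(tables[0][0])
    out = [0.0] * dim
    for position, value in enumerate(code):
        vec = tables[position][value]
        for i in range(dim):
            out[i] += vec[i]
    return out


def one_hot_params(vocab_size, dim):
    """Parameters of a conventional embedding table: one vector per symbol."""
    return vocab_size * dim


def kd_params(K, D, dim):
    """Parameters of the code embedding tables (the codes are stored separately)."""
    return K * D * dim


def code_capacity(K, D):
    """Number of distinct symbols that K-way D-dimensional codes can address."""
    return K ** D


if __name__ == "__main__":
    K, D, dim, vocab_size = 16, 8, 64, 10000  # hypothetical sizes
    tables = make_code_tables(K, D, dim)
    code = [3, 0, 15, 7, 2, 9, 1, 4]  # hand-picked code, not learned
    embedding = compose_embedding(code, tables)
    print(len(embedding), one_hot_params(vocab_size, dim), kd_params(K, D, dim))
```

In this sketch the conventional table grows with the vocabulary size, whereas the code embedding tables depend only on $K$, $D$, and the embedding dimension. Each symbol additionally needs its own $D$-dimensional code stored, and learning those codes is the part that the relaxed discrete optimization in the paper addresses; it is not shown here.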

Code Implementations: 1 repo