LG CL ML | Mar 30, 2023

Learning distributed representations with efficient SoftMax normalization

arXiv:2303.17475v4 | 1 citation | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses a computational bottleneck in embedding learning for machine learning practitioners, offering an incremental improvement in efficiency.

The paper tackles the quadratic cost of SoftMax normalization in learning distributed representations by proposing a linear-time heuristic approximation of the normalization constants for bounded-norm embeddings. On pre-trained embedding datasets, the estimate is as accurate as or more accurate than competing methods, and the embedding algorithm built on it reaches similar or higher performance in less computational time than similar "2Vec" algorithms.

Learning distributed representations, or embeddings, that encode the relational similarity patterns among objects is a relevant task in machine learning. A popular method to learn the embedding matrices $X, Y$ is to optimize a loss function of the term ${\rm SoftMax}(XY^T)$. The cost of computing this term, however, grows quadratically with the problem size, making it a computationally heavy solution. In this article, we propose a linear-time heuristic approximation to compute the normalization constants of ${\rm SoftMax}(XY^T)$ for embedding vectors with bounded norms. We show on some pre-trained embedding datasets that the proposed estimation method achieves accuracy higher than or comparable to that of competing methods. From this result, we design an efficient and task-agnostic algorithm that learns the embeddings by optimizing the cross entropy between the softmax and a set of probability distributions given as inputs. The proposed algorithm is interpretable and easily adapted to arbitrary embedding problems. We consider a few use cases and observe similar or higher performance and a lower computational time than similar ``2Vec'' algorithms.
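To make the bottleneck in the abstract concrete, here is a minimal NumPy sketch of the exact baseline, not the paper's linear-time approximation. It computes the log normalization constants of ${\rm SoftMax}(XY^T)$ row by row, together with the cross entropy against a given set of target distributions. The function names, the target matrix `P`, and the unit-norm scaling in the usage part are illustrative assumptions and are not taken from the paper or its repository.

```python
import numpy as np

def log_normalization_constants(X, Y):
    """Exact log Z_i, where Z_i = sum_j exp(x_i . y_j).

    Building the full (n, m) logit matrix costs O(n * m * d) time,
    which is the quadratic scaling the abstract refers to.
    """
    logits = X @ Y.T                                  # shape (n, m)
    row_max = logits.max(axis=1, keepdims=True)       # log-sum-exp trick for stability
    return row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))

def cross_entropy(P, X, Y):
    """Mean cross entropy between the rows of P and the rows of SoftMax(X Y^T).

    P is assumed (for illustration) to be an (n, m) matrix whose rows
    are probability distributions.
    """
    log_softmax = X @ Y.T - log_normalization_constants(X, Y)[:, None]
    return -(P * log_softmax).sum(axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, d = 200, 300, 16
    X = rng.normal(size=(n, d))
    Y = rng.normal(size=(m, d))
    # Assumption: rescale rows to unit norm as one simple way to bound the norms.
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    P = rng.dirichlet(np.ones(m), size=n)             # random target distributions
    print("loss:", cross_entropy(P, X, Y))
```

Every evaluation of this loss touches all $n \times m$ pairs of embedding vectors, so any method that estimates the constants $Z_i$ in linear time removes the dominant cost.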

Code Implementations: 1 repo
