CL · AI · LG · Feb 12, 2025

Better Embeddings with Coupled Adam

arXiv:2502.08441v3 · 5 citations · h-index: 1 · ACL
Originality: Incremental advance
AI Analysis

The paper addresses a specific issue in LLM training that is relevant to researchers and practitioners, but the contribution is incremental because it modifies an existing optimizer.

The paper tackles anisotropic embeddings in LLMs: it identifies the second moment in Adam as a cause and proposes Coupled Adam to mitigate it. In the reported experiments, Coupled Adam significantly improves embedding quality and yields better upstream and downstream performance on large enough datasets.

Despite their remarkable capabilities, LLMs learn word representations that exhibit the undesirable yet poorly understood feature of anisotropy. In this paper, we argue that the second moment in Adam is a cause of anisotropic embeddings, and suggest a modified optimizer called Coupled Adam to mitigate the problem. Our experiments demonstrate that Coupled Adam significantly improves the quality of embeddings, while also leading to better upstream and downstream performance on large enough datasets.
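The abstract does not spell out how Coupled Adam changes the update, so the following is only a minimal NumPy sketch under a stated assumption: that "coupling" means every embedding vector shares one second-moment estimate, obtained here by averaging across the vocabulary axis. The function name `adam_step` and the `couple` flag are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, couple=False):
    """One Adam update for an embedding matrix of shape (vocab, dim).

    With couple=True the bias-corrected second moment is averaged over
    the vocabulary axis, so all rows share the same effective learning
    rate. This averaging is an ASSUMPTION made for illustration.
    """
    # Standard exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2

    # Standard bias correction (t is the 1-based step count).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    if couple:
        # Assumed coupling: one second-moment value per embedding dimension,
        # shared by every vocabulary item.
        v_hat = v_hat.mean(axis=0, keepdims=True)

    new_param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return new_param, m, v
```

Under this assumption, gradients that sum to zero over the vocabulary produce updates that also sum to zero, so the mean embedding vector does not drift. Standard Adam rescales each row separately and loses that property, which is one way to read the abstract's claim about the second moment.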

Code Implementations: 1 repo
