LG
Apr 22, 2025

Semantics at an Angle: When Cosine Similarity Works Until It Doesn't

arXiv:2504.16318v2
15 citations
h-index: 1
Originality: Synthesis-oriented
AI Analysis

This is an incremental analysis for researchers and practitioners using embeddings in machine learning, highlighting conceptual and practical insights.

The paper examines the limitations of cosine similarity for comparing embeddings, particularly when embedding norms contain semantic information, and discusses emerging alternatives to address these issues.

Cosine similarity has become a standard metric for comparing embeddings in modern machine learning. Its scale-invariance and alignment with model training objectives have contributed to its widespread adoption. However, recent studies have revealed important limitations, particularly when embedding norms carry meaningful semantic information. This informal article offers a reflective and selective examination of the evolution, strengths, and limitations of cosine similarity. We highlight why it performs well in many settings, where it tends to break down, and how emerging alternatives are beginning to address its blind spots. We hope to offer a mix of conceptual clarity and practical perspective, especially for quantitative scientists who think about embeddings not just as vectors, but as geometric and philosophical objects.
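The scale-invariance mentioned in the abstract can be illustrated with a minimal sketch. The two vectors below are hypothetical toy embeddings, not taken from the paper: they point in the same direction but differ in norm, so cosine similarity treats them as identical while Euclidean distance does not. The function names are illustrative and assume nothing beyond the Python standard library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v; sensitive to vector norms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy embeddings: same direction, different norms (3 vs. 12).
weak = [1.0, 2.0, 2.0]
strong = [4.0, 8.0, 8.0]

# Cosine similarity sees only the angle, so the norm difference vanishes.
print(cosine_similarity(weak, strong))

# Euclidean distance still registers the difference in magnitude.
print(euclidean_distance(weak, strong))
```

If the norm encodes something meaningful (for example, confidence or frequency, as an assumption for this illustration), the first number discards that signal entirely, which is the kind of blind spot the abstract refers to.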

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.