LG · AG · Aug 30, 2024

Geometry of Lightning Self-Attention: Identifiability and Dimension

arXiv:2408.17221v2 · 15 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work develops a foundational mathematical understanding of self-attention, aimed at researchers in machine learning theory. It is rated an incremental advance because it builds on existing tools from algebraic geometry.

The paper theoretically analyzes the geometry of self-attention networks without normalization, focusing on identifiability and dimension of the function space, and provides results for deep and single-layer models.

We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
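To make the notions of "polynomial network" and "identifiability" concrete, here is a minimal sketch in plain Python. It assumes a single unnormalized attention layer of the form X -> (V X) (K X)^T (Q X), with one token per column of X. That exact convention, the function name `lightning_attention`, and the matrix sizes are assumptions made for illustration and may differ from the paper's definitions. The sketch checks numerically that the layer is homogeneous of degree 3 in X, and that two elementary reparametrizations of (Q, K, V) leave the function unchanged. It does not reproduce the paper's description of the generic fibers for deep networks.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def lightning_attention(X, Q, K, V):
    """Assumed layer (illustrative): X -> (V X) (K X)^T (Q X), no softmax."""
    scores = matmul(transpose(matmul(K, X)), matmul(Q, X))  # t x t attention scores
    return matmul(matmul(V, X), scores)                      # d x t output

rng = random.Random(0)
d, a, t = 3, 2, 4                 # embedding dim, attention dim, sequence length
X = rand_matrix(d, t, rng)
Q = rand_matrix(a, d, rng)
K = rand_matrix(a, d, rng)
V = rand_matrix(d, d, rng)
Y = lightning_attention(X, Q, K, V)

# 1) Polynomial structure: every output entry is a cubic form in X,
#    so scaling the input by c scales the output by c**3.
c = 2.0
homogeneity_gap = max_abs_diff(
    lightning_attention(scale(c, X), Q, K, V), scale(c ** 3, Y)
)

# 2) The layer depends on Q and K only through K^T Q, so replacing
#    (Q, K) by (C Q, C^{-T} K) for an invertible C changes nothing.
C = [[2.0, 1.0], [1.0, 1.0]]            # det = 1
C_inv_T = [[1.0, -1.0], [-1.0, 2.0]]    # transpose of the inverse of C
gl_gap = max_abs_diff(
    lightning_attention(X, matmul(C, Q), matmul(C_inv_T, K), V), Y
)

# 3) Rescaling V by c and Q by 1/c also leaves the function unchanged.
scaling_gap = max_abs_diff(
    lightning_attention(X, scale(1.0 / c, Q), K, scale(c, V)), Y
)

print(homogeneity_gap, gl_gap, scaling_gap)
```

All three printed gaps should be at the level of floating-point round-off. Distinct parameter values that define the same function are exactly what makes the parametrization non-injective. Under the assumed form, these two symmetries are simple examples of directions inside a fiber, which is why the dimension of the function space is smaller than the raw parameter count.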

Code Implementations: 1 repo
