Geometry of Lightning Self-Attention: Identifiability and Dimension
This work contributes to the mathematical foundations of self-attention and is aimed at researchers in machine learning theory; the contribution is incremental in that it relies on existing tools from algebraic geometry.
The paper theoretically analyzes the geometry of self-attention networks without normalization, focusing on the identifiability and the dimension of their function space, and provides results for both deep and single-layer models.
We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
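To make the polynomial structure and the identifiability question concrete, the following is a minimal NumPy sketch of a single un-normalized self-attention layer. It is an illustration under assumptions, not the paper's construction: the layer convention `V X (X^T K^T Q X)`, the function name `lightning_attention`, and the dimensions `d`, `a`, `d_out`, `t` are all chosen here for the example. The sketch checks that the layer is homogeneous of degree 3 in the input, and that two families of parameter changes leave the function unchanged, which is the kind of redundancy that the fibers of the parametrization describe.

```python
import numpy as np

def lightning_attention(X, Q, K, V):
    """Un-normalized self-attention (assumed convention): V X (X^T K^T Q X).

    X: (d, t) input sequence, Q and K: (a, d), V: (d_out, d).
    """
    scores = X.T @ K.T @ Q @ X   # (t, t) attention weights, no softmax
    return V @ X @ scores        # (d_out, t), cubic in the entries of X

rng = np.random.default_rng(0)
d, a, d_out, t = 4, 3, 5, 6      # hypothetical sizes for illustration
X = rng.standard_normal((d, t))
Q = rng.standard_normal((a, d))
K = rng.standard_normal((a, d))
V = rng.standard_normal((d_out, d))

Y = lightning_attention(X, Q, K, V)

# 1) Polynomial structure: scaling the input by c scales the output by c^3.
c = 2.0
cubic_ok = np.allclose(lightning_attention(c * X, Q, K, V), c**3 * Y)

# 2) Symmetry: Q -> G Q, K -> G^{-T} K leaves K^T Q, hence the function, fixed.
G = rng.standard_normal((a, a)) + a * np.eye(a)  # shifted to be well conditioned
Q_g = G @ Q
K_g = np.linalg.inv(G).T @ K
gl_ok = np.allclose(lightning_attention(X, Q_g, K_g, V), Y)

# 3) Symmetry: rescaling V by s and Q by 1/s cancels out.
s = 3.0
scale_ok = np.allclose(lightning_attention(X, Q / s, K, s * V), Y)

print(cubic_ok, gl_ok, scale_ok)
```

In this convention the function depends on `Q` and `K` only through the product `K.T @ Q`, so distinct parameter choices can define the same function; the sketch exhibits such redundancies numerically but does not show that they exhaust the generic fiber.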