LG · May 7, 2024

Geometry and Dynamics of LayerNorm

arXiv:2405.04134v1 · 3 citations · h-index: 14
Originality: Synthesis-oriented
AI Analysis

This paper provides deeper geometric intuition for LayerNorm, a common component of neural networks, but it is incremental: it focuses on theoretical understanding and offers no new applications.

The paper analyzes the LayerNorm function in deep neural networks. It shows that LayerNorm's outputs lie within an (N-1)-dimensional hyperellipsoid, that typical inputs are mapped near the surface of this hyperellipsoid, and that its principal axes can be found via the eigen-decomposition of a simply constructed matrix.
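The containment claim can be checked numerically. The sketch below is a minimal NumPy illustration, assuming the standard LayerNorm formula y = g ⊙ (x − mean(x)) / sqrt(var(x) + ε) + b with all gains nonzero. Under that assumption, z = G⁻¹(y − b) (with G = diag(g)) should have components summing to zero (the hyperplane) and squared norm strictly below N (the ellipsoid interior), close to N for inputs whose spread is large compared with √ε. The dimension, gain, bias and input scales are illustrative choices, not values from the paper.

```python
import numpy as np

def layer_norm(x, g, b, eps=1e-5):
    """Standard LayerNorm over the last axis: center, divide by the RMS deviation, then gain and bias."""
    mu = x.mean(axis=-1, keepdims=True)
    var = ((x - mu) ** 2).mean(axis=-1, keepdims=True)
    return g * (x - mu) / np.sqrt(var + eps) + b

rng = np.random.default_rng(0)
N = 8
g = rng.uniform(0.5, 2.0, size=N)          # illustrative gain, kept nonzero so G is invertible
b = rng.normal(size=N)                     # illustrative bias
x = rng.normal(scale=3.0, size=(1000, N))  # "typical" inputs: spread much larger than sqrt(eps)

y = layer_norm(x, g, b)
z = (y - b) / g  # undo the affine step: z = G^{-1} (y - b)

# Hyperplane: the components of z sum to zero (up to round-off).
plane_residual = np.abs(z.sum(axis=-1)).max()
# Ellipsoid interior: ||G^{-1}(y - b)||^2 < N, and close to N for typical inputs.
sq_norm = (z ** 2).sum(axis=-1)

print(plane_residual)        # close to 0 (floating-point round-off)
print(sq_norm.max() < N)     # True
print(sq_norm.min() / N)     # just below 1

# An input with very small spread (comparable to sqrt(eps)) lands deep inside instead.
x_small = 1e-3 * rng.normal(size=N)
z_small = (layer_norm(x_small, g, b) - b) / g
print((z_small ** 2).sum() / N)  # well below 1
```

The ε term is what keeps outputs strictly inside: the squared norm equals N·‖x̃‖² / (‖x̃‖² + Nε), where x̃ is the mean-centered input, so it approaches N only when ‖x̃‖² is much larger than Nε.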

A technical note aiming to offer deeper intuition for the LayerNorm function common in deep neural networks. LayerNorm is defined relative to a distinguished 'neural' basis, but it does more than just normalize the corresponding vector elements. Rather, it implements a composition -- of linear projection, nonlinear scaling, and then affine transformation -- on input activation vectors. We develop both a new mathematical expression and geometric intuition, to make the net effect more transparent. We emphasize that, when LayerNorm acts on an N-dimensional vector space, all outcomes of LayerNorm lie within the intersection of an (N-1)-dimensional hyperplane and the interior of an N-dimensional hyperellipsoid. This intersection is the interior of an (N-1)-dimensional hyperellipsoid, and typical inputs are mapped near its surface. We find the direction and length of the principal axes of this (N-1)-dimensional hyperellipsoid via the eigen-decomposition of a simply constructed matrix.
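The three-stage composition described above, and one way to obtain the principal axes, can be sketched as follows. This is an illustration under stated assumptions, not the paper's own code. The projection is P = I − (1/N)·11ᵀ, the scaling divides by sqrt(‖Px‖²/N + ε), and the affine step applies G = diag(g) and adds b. For the axes we use the matrix M = G P G, which is our own derivation and may not coincide with the matrix the paper constructs: its N−1 nonzero eigenvalues λᵢ have eigenvectors along the principal axes, and the semi-axis lengths are √(N·λᵢ). The remaining zero eigenvalue belongs to the direction normal to the output hyperplane.

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps = 6, 1e-5
g = rng.uniform(0.5, 2.0, size=N)     # illustrative gain vector (nonzero entries)
b = rng.normal(size=N)                # illustrative bias vector
G = np.diag(g)
P = np.eye(N) - np.ones((N, N)) / N   # projection onto the hyperplane orthogonal to (1, ..., 1)

def layer_norm_composed(x):
    x_proj = P @ x                                                  # 1) linear projection (mean removal)
    z = np.sqrt(N) * x_proj / np.sqrt(x_proj @ x_proj + N * eps)    # 2) nonlinear scaling
    return G @ z + b                                                # 3) affine transformation

def layer_norm_direct(x):
    mu = x.mean()
    return g * (x - mu) / np.sqrt(((x - mu) ** 2).mean() + eps) + b

x = rng.normal(scale=2.0, size=N)
same = np.allclose(layer_norm_composed(x), layer_norm_direct(x))
print(same)  # True: the composition reproduces the usual formula

# Principal axes of {G z + b : z orthogonal to 1, ||z|| <= sqrt(N)} via M = G P G (our construction).
eigvals, eigvecs = np.linalg.eigh(G @ P @ G)
keep = eigvals > 1e-12                     # drop the single zero eigenvalue (hyperplane normal)
axes = eigvecs[:, keep]                    # principal-axis directions, one per column
semi_axes = np.sqrt(N * eigvals[keep])     # principal semi-axis lengths

# In the axis basis, an output satisfies sum((c_i / a_i)^2) = ||z||^2 / N <= 1.
c = axes.T @ (layer_norm_direct(x) - b)
ratio = np.sum((c / semi_axes) ** 2)
print(ratio)  # just below 1 for a typical input, i.e. near the surface
```

Using `numpy.linalg.eigh` is appropriate here because M is symmetric; it returns real eigenvalues in ascending order with orthonormal eigenvectors, so the kept columns already form an orthonormal basis of the hyperplane containing y − b.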
