Composing Linear Layers from Irreducibles
This work offers an algebraic perspective on geometric primitives in deep models, though the contribution appears incremental: the proposed layers match rather than surpass existing baselines.
The paper studies compositional structure in linear layers by asking whether linear transformations can be built from a minimal set of geometric primitives. Using Clifford algebra, it shows that linear layers can be expressed as compositions of bivectors, which requires O(log^2 d) parameters instead of the O(d^2) of a dense matrix. Applied to LLM attention layers, the resulting rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations.
Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors -- geometric objects encoding oriented planes -- and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only O(log^2 d) parameters, versus O(d^2) required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations. Our findings provide an algebraic perspective on how these geometric primitives can compose into higher-level functions within deep models.
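To make the parameter count and the notion of a rotor more concrete, the following is a minimal Python sketch and not the paper's differentiable algorithm. It rests on two assumptions that the abstract does not spell out: that a d-dimensional input is identified with the 2^n-dimensional multivector space of a Clifford algebra on n generators (so d = 2^n and a bivector has n(n-1)/2 independent components), and that the action of a rotor generated by a simple bivector on ordinary vectors can be written as a rotation matrix in the plane that bivector encodes. The function names bivector_param_count and plane_rotation are hypothetical and chosen for illustration only.

```python
import numpy as np


def bivector_param_count(d: int) -> int:
    """Independent bivector components, ASSUMING d = 2**n for n generators."""
    n = d.bit_length() - 1
    if d < 1 or (1 << n) != d:
        raise ValueError("this sketch assumes d is a power of two")
    return n * (n - 1) // 2  # n choose 2, i.e. O(log^2 d)


def plane_rotation(a, b, theta: float) -> np.ndarray:
    """Matrix rotating by theta in the oriented plane span(a, b), from a toward b.

    This is the vector-space action of the rotor generated by the simple
    bivector a ^ b; vectors orthogonal to the plane are left unchanged.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Orthonormalise the pair so it is a proper basis of the plane.
    a = a / np.linalg.norm(a)
    b = b - (b @ a) * a
    norm_b = np.linalg.norm(b)
    if norm_b < 1e-12:
        raise ValueError("a and b must be linearly independent")
    b = b / norm_b
    skew = np.outer(b, a) - np.outer(a, b)   # generator of the rotation
    proj = np.outer(a, a) + np.outer(b, b)   # projector onto the plane
    return np.eye(a.size) + np.sin(theta) * skew + (np.cos(theta) - 1.0) * proj


if __name__ == "__main__":
    d = 4096
    print(bivector_param_count(d), d * d)  # 66 16777216

    e1, e2, e3 = np.eye(3)
    # Composing two plane rotations yields another orthogonal map.
    M = plane_rotation(e1, e2, 0.3) @ plane_rotation(e2, e3, 0.5)
    print(np.allclose(M.T @ M, np.eye(3)))  # True
```

Under these assumptions, d = 4096 corresponds to 66 bivector components per rotor, against 16,777,216 entries in a dense d x d matrix. The composition at the end illustrates, on a toy scale, how products of plane rotations build up richer orthogonal maps.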