Toward Manifest Relationality in Transformers via Symmetry Reduction
For AI researchers, this work offers a geometric approach to redundancy in transformers that could lead to more efficient models. Its contribution may be incremental, since it is positioned as a complement to existing symmetry-breaking methods rather than a replacement for them.
The paper addresses internal redundancy in transformer models, which arises from coordinate-dependent representations and continuous symmetries. It proposes a symmetry reduction framework that reformulates representations, attention, and optimization in terms of invariant relational quantities. Eliminating the redundant degrees of freedom in this way reduces parameter redundancy and supports a geometric analysis of optimization.
Transformer models contain substantial internal redundancy arising from coordinate-dependent representations and continuous symmetries, in model space and in head space, respectively. While recent approaches address this by explicitly breaking symmetry, we propose a complementary framework based on symmetry reduction. We reformulate representations, attention mechanisms, and optimization dynamics in terms of invariant relational quantities, eliminating redundant degrees of freedom by construction. This perspective yields architectures that operate directly on relational structures, providing a principled geometric framework for reducing parameter redundancy and analyzing optimization.
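To make the head-space symmetry concrete, the following is a minimal sketch and not the paper's construction. It relies on a standard fact about dot-product attention: the logits depend on the query and key projections only through the product W_Q W_K^T, so replacing W_Q with W_Q A and W_K with W_K A^{-T} for any invertible A leaves the logits unchanged. The toy matrices, the helper functions (`matmul`, `transpose`, `inverse_2x2`, `attention_logits`), and the choice of a 2-dimensional head are all assumptions made for illustration.

```python
# Illustrative sketch only: shows that attention logits are invariant under an
# invertible change of basis in head space. All values below are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def inverse_2x2(m):
    """Invert a 2x2 matrix using the closed-form adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def attention_logits(x, w_q, w_k):
    """Unnormalized attention scores Q K^T with Q = X W_Q and K = X W_K."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    return matmul(q, transpose(k))

# Toy setup (assumed): 3 tokens, model dimension 2, head dimension 2.
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
W_Q = [[1.0, 0.0], [2.0, 1.0]]
W_K = [[0.0, 1.0], [1.0, 1.0]]

# Any invertible matrix acting on head space; this one has determinant 1.
A = [[2.0, 1.0], [1.0, 1.0]]

# Transformed parameters: W_Q -> W_Q A, W_K -> W_K A^{-T}.
W_Q2 = matmul(W_Q, A)
W_K2 = matmul(W_K, transpose(inverse_2x2(A)))

logits = attention_logits(X, W_Q, W_K)
logits2 = attention_logits(X, W_Q2, W_K2)

# Largest entrywise difference between the two logit matrices.
max_diff = max(
    abs(u - v) for r1, r2 in zip(logits, logits2) for u, v in zip(r1, r2)
)

# The invariant combination that the logits actually depend on.
invariant = matmul(W_Q, transpose(W_K))
invariant2 = matmul(W_Q2, transpose(W_K2))
```

Here `W_Q2` and `W_K2` differ from the original parameters, yet `logits2` matches `logits` and `invariant2` matches `invariant`. The family of matrices A therefore parameterizes directions in parameter space that do not affect the model's output, which is the kind of redundant degree of freedom that a symmetry reduction is meant to remove.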