LG · CL · Feb 20

Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures

arXiv:2602.18417v1
Originality: Incremental advance
AI Analysis

This work addresses the design of sequence models with explicit group-theoretic structure and is aimed at machine-learning researchers. It appears incremental, since it builds on existing subgroup-based approaches.

The paper presents a framework for sequence models whose hidden states live on closed subgroups of U(d). Recurrent and transformer architectures are derived from a shared skeleton in which the choice of subgroup determines the state space, the tangent projection, and the update map. The authors specialize to O(d), evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings, and report that a linear-mixing extension in tangent space improves finite-budget performance in these O(d) experiments.
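To make the shared skeleton concrete, here is a minimal pure-Python sketch of one recurrent step. It is not the authors' implementation; every name in it (`project_so`, `project_u`, `expm`, `rnn_step`) is hypothetical, and the concrete choices are assumptions: the state is a d×d matrix Q in the subgroup, the input enters through a hypothetical linear map Σₖ xₖ Wₖ, the tangent projection takes the skew-symmetric part (for O(d)) or skew-Hermitian part (for U(d)), and the update map is Q ← Q·exp(Ω). Swapping the `project` argument is what "subgroup choice as a drop-in replacement" looks like in this sketch.

```python
import math
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def conj_transpose(a):
    """A^H; for real entries this is just A^T (float.conjugate() is a no-op)."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def project_so(a):
    """Assumed tangent projection for O(d): skew-symmetric part (A - A^T) / 2, in so(d)."""
    d = len(a)
    return [[0.5 * (a[i][j] - a[j][i]) for j in range(d)] for i in range(d)]


def project_u(a):
    """Assumed tangent projection for U(d): skew-Hermitian part (A - A^H) / 2, in u(d)."""
    ah = conj_transpose(a)
    return [[0.5 * (a[i][j] - ah[i][j]) for j in range(len(a))] for i in range(len(a))]


def expm(a, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    d = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)  # max absolute row sum
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in a]  # now has norm <= 1/2
    result, term = identity(d), identity(d)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
        result = matmul(result, result)
    return result


def rnn_step(q, x, weights, project=project_so):
    """One hypothetical recurrent step: Q_{t+1} = Q_t exp(P(sum_k x_k W_k)).

    q:       current d x d state, assumed to lie in the subgroup.
    x:       input vector of length n.
    weights: n matrices W_k of shape d x d (hypothetical input-to-tangent map).
    project: the subgroup's tangent projection (project_so for O(d), project_u for U(d)).
    """
    d = len(q)
    raw = [[sum(xk * wk[i][j] for xk, wk in zip(x, weights)) for j in range(d)]
           for i in range(d)]
    omega = project(raw)           # tangent vector in the Lie algebra
    return matmul(q, expm(omega))  # exp of a Lie-algebra element is a group element


if __name__ == "__main__":
    random.seed(0)
    d, n = 3, 2
    weights = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(n)]
    q = identity(d)
    for _ in range(20):
        q = rnn_step(q, [random.gauss(0, 1) for _ in range(n)], weights)
    gram = matmul(conj_transpose(q), q)
    # Largest deviation of Q^T Q from the identity after 20 steps.
    print(max(abs(gram[i][j] - (1.0 if i == j else 0.0)) for i in range(d) for j in range(d)))
```

Because each step multiplies by the exponential of a Lie-algebra element, the state stays in O(d) (or U(d)) up to floating-point error, with no re-orthogonalization step. A real model would use a tensor library's matrix exponential instead of these list-based helpers.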

This paper presents a direct framework for sequence models with hidden states on closed subgroups of U(d). We use a minimal axiomatic setup and derive recurrent and transformer templates from a shared skeleton in which subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map. We then specialize to O(d) and evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings. We also report a general linear-mixing extension in tangent space, which applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.
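The abstract does not spell out the linear-mixing extension, so the sketch below is one plausible reading, offered as an assumption rather than the paper's definition: flatten the tangent matrix Ω, apply a learnable linear map, reshape, and re-apply the subgroup's tangent projection so the result is again in the Lie algebra. Since only the projection is subgroup-specific, this form works for any subgroup choice. The names `mix_tangent`, `project_so`, and the d²×d² matrix `mix` are hypothetical.

```python
import random


def project_so(a):
    """Assumed tangent projection for O(d): skew-symmetric part (A - A^T) / 2."""
    d = len(a)
    return [[0.5 * (a[i][j] - a[j][i]) for j in range(d)] for i in range(d)]


def mix_tangent(omega, mix, project=project_so):
    """Hypothetical linear mixing in tangent space.

    Flattens the d x d tangent matrix to a length-d*d vector, applies the
    (d*d) x (d*d) matrix `mix`, reshapes, and re-projects so the result lies
    in the Lie algebra of whichever subgroup `project` corresponds to.
    """
    d = len(omega)
    flat = [omega[i][j] for i in range(d) for j in range(d)]
    mixed = [sum(m * v for m, v in zip(row, flat)) for row in mix]
    return project([mixed[i * d:(i + 1) * d] for i in range(d)])


if __name__ == "__main__":
    random.seed(0)
    d = 3
    omega = project_so([[random.gauss(0, 1) for _ in range(d)] for _ in range(d)])
    mix = [[random.gauss(0, 0.1) for _ in range(d * d)] for _ in range(d * d)]
    out = mix_tangent(omega, mix)
    # Skew-symmetry residual of the mixed tangent vector.
    print(max(abs(out[i][j] + out[j][i]) for i in range(d) for j in range(d)))
```

With `mix` set to the identity, the output equals the input tangent vector, so the unmixed update is recovered as a special case. A dense d²×d² map costs d⁴ parameters, so a practical version might restrict it to a structured or low-rank form; that too is an assumption, not something the abstract states.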

