LG | Sep 8, 2025

Learning words in groups: fusion algebras, tensor ranks and grokking

arXiv:2509.06931v1 | 2 citations | h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses how neural networks learn algebraic structures, with potential implications for interpretability in AI. Its advance is incremental: it explores specific group-theoretic mechanisms.

The paper demonstrates that a two-layer neural network can learn arbitrary word operations in finite groups and exhibits grokking while doing so. It explains the mechanism by reframing the task as learning a low-rank 3-tensor and by using the group's fusion structure. For the simple multiplication word, the network implements efficient matrix multiplication in the sense of Strassen.
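To make the "low-rank 3-tensor" reframing concrete, here is a minimal sketch under stated assumptions. It assumes the word tensor is the indicator T[a, b, c] = 1 exactly when w(a, b) = c, and it uses the cyclic group Z_n with the multiplication word w(a, b) = ab as a toy case; the paper's exact definition and choice of groups may differ. The names `word_tensor` and `fourier_factors` are hypothetical. For Z_n, the characters give an exact decomposition into n rank-one terms, even though the tensor has n^2 nonzero entries.

```python
import numpy as np

def word_tensor(n):
    """Indicator tensor of the word w(a, b) = ab in Z_n (assumed definition)."""
    T = np.zeros((n, n, n))
    for a in range(n):
        for b in range(n):
            T[a, b, (a + b) % n] = 1.0
    return T

def fourier_factors(n):
    """Factor matrices of a rank-n decomposition built from the characters of Z_n."""
    k = np.arange(n)
    omega = np.exp(2j * np.pi / n)
    F = omega ** np.outer(k, k)   # F[k, a] = omega^(k * a)
    return F / n, F, F.conj()     # factors for the a, b and c indices

n = 5
T = word_tensor(n)
A, B, C = fourier_factors(n)

# Sum of n rank-one terms: (1/n) * sum_k omega^(k * (a + b - c))
T_lowrank = np.einsum("ka,kb,kc->abc", A, B, C)
max_err = np.abs(T - T_lowrank).max()
print("nonzero entries:", int(T.sum()), "rank-one terms used:", n)
print("max reconstruction error:", max_err)
```

For non-abelian groups the characters are replaced by higher-dimensional representations, which is where the fusion structure mentioned above would enter; this sketch does not cover that case.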

In this work, we demonstrate that a simple two-layer neural network with standard activation functions can learn an arbitrary word operation in any finite group, provided sufficient width is available, and that it exhibits grokking while doing so. To explain the mechanism by which this is achieved, we reframe the problem as that of learning a particular $3$-tensor, which we show is typically of low rank. A key insight is that low-rank implementations of this tensor can be obtained by decomposing it along triplets of basic self-conjugate representations of the group and leveraging the fusion structure to rule out many components. Focusing on a phenomenologically similar but more tractable surrogate model, we show that the network is able to find such low-rank implementations (or approximations thereof), thereby using limited width to approximate the word-tensor in a generalizable way. In the case of the simple multiplication word, we further elucidate the form of these low-rank implementations, showing that the network effectively implements efficient matrix multiplication in the sense of Strassen. Our work also sheds light on the mechanism by which a network reaches such a solution under gradient descent.
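As an illustration of "efficient matrix multiplication in the sense of Strassen", the sketch below writes the classical Strassen scheme for 2x2 matrices as a rank-7 decomposition of the matrix multiplication tensor, in place of the 8 products of the naive method. This is the textbook scheme, not necessarily the specific form the network finds in the paper. The factor names `U`, `V`, `W` and the function `strassen_2x2` are chosen here for illustration.

```python
import numpy as np

# Rows index the 7 Strassen products M1..M7.
# Columns of U and V index the entries (11, 12, 21, 22) of A and B;
# columns of W index the entries (11, 12, 21, 22) of C = A @ B.
U = np.array([[ 1, 0, 0,  1],   # M1 uses A11 + A22
              [ 0, 0, 1,  1],   # M2 uses A21 + A22
              [ 1, 0, 0,  0],   # M3 uses A11
              [ 0, 0, 0,  1],   # M4 uses A22
              [ 1, 1, 0,  0],   # M5 uses A11 + A12
              [-1, 0, 1,  0],   # M6 uses A21 - A11
              [ 0, 1, 0, -1]])  # M7 uses A12 - A22
V = np.array([[ 1, 0, 0,  1],   # M1 uses B11 + B22
              [ 1, 0, 0,  0],   # M2 uses B11
              [ 0, 1, 0, -1],   # M3 uses B12 - B22
              [-1, 0, 1,  0],   # M4 uses B21 - B11
              [ 0, 0, 0,  1],   # M5 uses B22
              [ 1, 1, 0,  0],   # M6 uses B11 + B12
              [ 0, 0, 1,  1]])  # M7 uses B21 + B22
W = np.array([[ 1, 0, 0,  1],   # M1 contributes to C11 and C22
              [ 0, 0, 1, -1],   # M2 contributes to C21 and -C22
              [ 0, 1, 0,  1],   # M3 contributes to C12 and C22
              [ 1, 0, 1,  0],   # M4 contributes to C11 and C21
              [-1, 1, 0,  0],   # M5 contributes to -C11 and C12
              [ 0, 0, 0,  1],   # M6 contributes to C22
              [ 1, 0, 0,  0]])  # M7 contributes to C11

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 scalar products."""
    m = (U @ A.reshape(4)) * (V @ B.reshape(4))  # the 7 products M1..M7
    return (W.T @ m).reshape(2, 2)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
print(np.allclose(strassen_2x2(A, B), A @ B))
```

The elementwise product of `U @ vec(A)` and `V @ vec(B)` is structurally similar to a hidden layer of width 7 followed by a linear readout `W`, which is one way to read the claim that limited width suffices when the target tensor has low rank.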

