LG, AI, RT | Dec 11, 2023

Grokking Group Multiplication with Cosets

arXiv:2312.06581v2 | 20 citations | h-index: 4 | ICML
Originality: Incremental advance
AI Analysis

This work targets the interpretability of deep neural networks, a prerequisite for their safe use in high-stakes applications, by reverse engineering in detail models trained on an algorithmic task. The advance is incremental, since it builds on previous work.

The researchers completely reverse engineered fully connected one-hidden-layer neural networks that had "grokked" the arithmetic of the permutation groups S5 and S6. They found that the models discovered the true subgroup structure of the full group and converged on neural circuits that decompose the group arithmetic using the group's subgroups.

The complex and unpredictable nature of deep neural networks prevents their safe use in many high-stakes applications. Many techniques have been developed to interpret deep neural networks, but all have substantial limitations. Algorithmic tasks have proven to be a fruitful testing ground for interpreting a neural network end-to-end. Building on previous work, we completely reverse engineer fully connected one-hidden-layer networks that have "grokked" the arithmetic of the permutation groups $S_5$ and $S_6$. The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group's subgroups. We relate how we reverse engineered the model's mechanisms and confirmed that our theory was a faithful description of the circuit's functionality. We also draw attention to current challenges in conducting interpretability research by comparing our work to Chughtai et al. [4], which claims to find a different algorithm for this same problem.
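To make the task in the abstract concrete, here is a minimal sketch in Python of group multiplication in $S_5$ and of the cosets of one subgroup. It is an illustration only, not the authors' code: the subgroup `H` (permutations fixing the last point), the helper names `compose` and `left_coset`, and the integer encoding of the dataset are all assumptions chosen for this example.

```python
from itertools import permutations

N = 5
# Each element of S_5 is a tuple p, where p[i] is the image of point i.
S5 = list(permutations(range(N)))

def compose(a, b):
    """Return the product a∘b: apply b first, then a."""
    return tuple(a[b[i]] for i in range(N))

# Assumed example subgroup: permutations that fix the last point.
# This is a copy of S_4 inside S_5 with 24 elements.
H = [p for p in S5 if p[N - 1] == N - 1]

def left_coset(g, subgroup):
    """Return the left coset gH as a frozenset of permutations."""
    return frozenset(compose(g, h) for h in subgroup)

# The distinct left cosets of H partition S_5 into 120 / 24 = 5 blocks.
cosets = {left_coset(g, H) for g in S5}

# The supervised task: map a pair of element indices (a, b) to the index of a∘b.
index = {p: i for i, p in enumerate(S5)}
dataset = [(index[a], index[b], index[compose(a, b)]) for a in S5 for b in S5]

print(len(S5), len(H), len(cosets), len(dataset))
```

One fact this sketch makes easy to check is that the left coset of a product $ab$ depends only on $a$ and on the left coset of $b$, since $(ab)H = a(bH)$. Coset structure therefore gives a coarse-grained view of the multiplication table, which is the kind of structure the abstract says the trained networks exploit.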

Code Implementations: 1 repo
