LGDIS-NNAIOct 15, 2024

The Persian Rug: solving toy models of superposition using large-scale symmetries

arXiv:2410.12101v21 citationsh-index: 5Has Code
Originality Incremental advance
AI Analysis

This work advances neural network interpretability by introducing techniques to understand autoencoder structures, though it is incremental as it builds on prior models.

The authors provided a complete mechanistic description of a minimal non-linear sparse data autoencoder in high dimensions, showing it learns an algorithm sensitive only to large-scale weight statistics and is near-optimal among recent architectures, with performance improvements limited to constant factors.

We present a complete mechanistic description of the algorithm learned by a minimal non-linear sparse data autoencoder in the limit of large input dimension. The model, originally presented in arXiv:2209.10652, compresses sparse data vectors through a linear layer and decompresses using another linear layer followed by a ReLU activation. We notice that when the data is permutation symmetric (no input feature is privileged) large models reliably learn an algorithm that is sensitive to individual weights only through their large-scale statistics. For these models, the loss function becomes analytically tractable. Using this understanding, we give the explicit scalings of the loss at high sparsity, and show that the model is near-optimal among recently proposed architectures. In particular, changing or adding to the activation function any elementwise or filtering operation can at best improve the model's performance by a constant factor. Finally, we forward-engineer a model with the requisite symmetries and show that its loss precisely matches that of the trained models. Unlike the trained model weights, the low randomness in the artificial weights results in miraculous fractal structures resembling a Persian rug, to which the algorithm is oblivious. Our work contributes to neural network interpretability by introducing techniques for understanding the structure of autoencoders. Code to reproduce our results can be found at https://github.com/KfirD/PersianRug .

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes