LG | DIS-NN | AI | Apr 28, 2025

Representation Learning on a Random Lattice

arXiv:2504.20197v1 | 1 citation | h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work aims to make AI systems safer and more reliable through a better understanding of learned features. It is incremental, building on prior geometric and interpretability approaches.

The paper interprets deep neural network representations by treating learned features as a coordinate system over a data distribution modeled as a random lattice. It analyzes the lattice with percolation theory and finds qualitative consistency with existing interpretability research.

Decomposing a deep neural network's learned representations into interpretable features could greatly enhance its safety and reliability. To better understand features, we adopt a geometric perspective, viewing them as a learned coordinate system for mapping an embedded data distribution. We motivate a model of a generic data distribution as a random lattice and analyze its properties using percolation theory. Learned features are categorized into context, component, and surface features. The model is qualitatively consistent with recent findings in mechanistic interpretability and suggests directions for future research.
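
As a rough illustration of the percolation idea mentioned in the abstract, the sketch below runs bond percolation on a random point set. It is not the paper's model: the random geometric graph standing in for the "random lattice", the function names `random_lattice` and `largest_cluster_fraction`, and the parameters `radius` and `p` are all assumptions chosen for illustration. It only shows how the largest connected cluster grows as the bond probability increases.

```python
import math
import random


def random_lattice(n, radius, seed=0):
    """Hypothetical stand-in for a random lattice: n uniform points in the
    unit square, with an edge between any two points within `radius`."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(pts[i], pts[j]) <= radius
    ]
    return pts, edges


def largest_cluster_fraction(n, edges, p, seed=0):
    """Keep each edge independently with probability p (bond percolation)
    and return the share of points in the largest connected cluster."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        if rng.random() < p:  # this bond is "open"
            ri, rj = find(i), find(j)
            if ri != rj:
                if size[ri] < size[rj]:
                    ri, rj = rj, ri
                parent[rj] = ri
                size[ri] += size[rj]

    return max(size[find(i)] for i in range(n)) / n


if __name__ == "__main__":
    pts, edges = random_lattice(n=400, radius=0.1)
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(p, largest_cluster_fraction(len(pts), edges, p))
```

Because the same seed assigns each edge the same random draw, the set of open bonds only grows with `p`, so the reported fraction is non-decreasing in `p`. In a sketch like this, a sharp rise over a narrow range of `p` is the usual signature of a percolation transition.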

