PRISM: Deriving the Transformer as a Signal-Denoising Operator via Maximum Coding Rate Reduction

arXiv:2601.15540v1

Originality: Incremental advance

AI Analysis

This paper addresses the interpretability problem in deep learning, offering researchers and practitioners a principled geometric approach that aims to unify interpretability and performance. It appears incremental, since it builds on the existing Transformer architecture and the MCR² principle.

The authors tackle the lack of interpretability in Transformers by proposing Prism, a white-box attention-based architecture derived from maximizing coding rate reduction. On TinyStories, Prism's attention heads spontaneously specialized into spectrally distinct regimes for signal and noise.

Deep learning models, particularly Transformers, are often criticized as uninterpretable "black boxes". We propose Prism, a white-box attention-based architecture derived from the principles of Maximizing Coding Rate Reduction ($\text{MCR}^2$). By modeling the attention mechanism as a gradient ascent process on a distinct signal-noise manifold, we introduce two physical constraints: an overcomplete dictionary to expand the representational phase space, and an irrational frequency separation ($\pi$-RoPE) to enforce incoherence between signal and noise subspaces. We demonstrate that these geometric inductive biases, which can be viewed as physical constraints, are sufficient on their own to induce unsupervised functional disentanglement. Using TinyStories as a controlled testbed for verifying spectral dynamics, we observe that Prism spontaneously specializes its attention heads into spectrally distinct regimes: low-frequency heads capturing long-range causal dependencies (signal) and high-frequency heads handling local syntactic constraints (noise). Our results suggest that interpretability and performance are not a trade-off, but can be unified through principled geometric construction.
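
For readers unfamiliar with the objective the abstract builds on, the following is a minimal sketch of the standard coding rate reduction quantity from the MCR² literature, not Prism's own implementation. It assumes features stored as columns of a matrix `Z`, a hard partition given by integer `labels`, and a distortion parameter `eps`; the function names and the toy data are illustrative choices, and the abstract does not specify how Prism partitions signal and noise or how it applies the gradient ascent.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z Z^T) for Z of shape (d, n)."""
    d, n = Z.shape
    gram = np.eye(d) + (d / (n * eps**2)) * (Z @ Z.T)
    # slogdet is numerically safer than log(det(...)) for larger matrices.
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(Z) - sum_k (n_k / n) * R(Z_k) over the groups in `labels`."""
    _, n = Z.shape
    whole = coding_rate(Z, eps)
    parts = 0.0
    for k in np.unique(labels):
        Zk = Z[:, labels == k]           # columns belonging to group k
        parts += (Zk.shape[1] / n) * coding_rate(Zk, eps)
    return whole - parts

# Toy data (hypothetical): two groups living in orthogonal 2-D subspaces of R^4.
rng = np.random.default_rng(0)
Z_a = np.vstack([rng.normal(size=(2, 20)), np.zeros((2, 20))])
Z_b = np.vstack([np.zeros((2, 20)), rng.normal(size=(2, 20))])
Z = np.hstack([Z_a, Z_b])
labels = np.array([0] * 20 + [1] * 20)

delta = rate_reduction(Z, labels)
print(f"rate reduction for separated groups: {delta:.4f}")
```

The quantity is large when the whole set of features spans many directions while each group stays compact, which is the sense in which incoherent signal and noise subspaces are rewarded. With a single group the two terms cancel and the reduction is zero.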
