CL · Apr 29, 2025

From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models

arXiv:2505.00033v1 · 2 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work targets the scalability of language modeling by proposing a more efficient alternative to transformers. It appears incremental, since it builds on existing spectral and dictionary learning concepts.

The authors tackle the computational cost of transformer architectures by replacing self-attention with a spectral generative modeling framework that learns a global Fourier dictionary and per-token mixing coefficients. The approach reaches competitive perplexity and generation quality on the WikiText2 and Penn Treebank benchmarks while reducing computation from quadratic to linear in sequence length.
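To make the quadratic-versus-linear contrast concrete, the sketch below counts only the dominant multiply-adds for each approach. The function names, the symbols (`seq_len`, `dim`, `num_atoms`), and the assumed encode-plus-decode form of the dictionary cost are illustrative assumptions, not figures from the paper.

```python
def attention_cost(seq_len: int, dim: int) -> int:
    """Dominant multiply-adds for one self-attention layer.

    Counts the score matrix (Q @ K^T) and the weighted sum (weights @ V),
    each costing seq_len * seq_len * dim.
    """
    return 2 * seq_len * seq_len * dim


def dictionary_cost(seq_len: int, dim: int, num_atoms: int) -> int:
    """Dominant multiply-adds for mixing a fixed number of atoms per token.

    Assumed form: one pass to produce per-token coefficients and one pass to
    combine the atoms, each costing seq_len * num_atoms * dim.
    """
    return 2 * seq_len * num_atoms * dim


if __name__ == "__main__":
    # Doubling the sequence length quadruples the attention count
    # but only doubles the dictionary count.
    for seq_len in (512, 1024, 2048):
        print(seq_len, attention_cost(seq_len, 64), dictionary_cost(seq_len, 64, 32))
```

With the number of atoms held fixed, the dictionary cost grows in proportion to sequence length, which is the sense in which the method is described as linear.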

We propose a novel spectral generative modeling framework for natural language processing that jointly learns a global time-varying Fourier dictionary and per-token mixing coefficients, replacing the ubiquitous self-attention mechanism in transformer architectures. By enforcing reconstruction losses in both the time domain (embedding reconstruction) and the frequency domain (via Short-Time Fourier Transform magnitude matching) alongside a standard language modeling objective, and fitting a Gaussian Mixture Model (GMM) prior over the learned mixing vectors, our approach achieves competitive perplexity and generation quality on standard benchmarks such as WikiText2 and Penn Treebank. In contrast to the quadratic computational complexity of self-attention, our method operates with linear complexity, delivering substantial efficiency gains. We demonstrate that spectral dictionary models can achieve competitive performance compared to transformer baselines while significantly reducing inference latency and memory footprint, offering a compelling alternative for scalable language modeling.
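The reconstruction idea in the abstract can be sketched roughly as follows in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the sinusoidal atom parameterization (`amp`, `freq`, `phase`), the frame length and hop of the STFT, and every function name are hypothetical. The language modeling objective and the GMM prior are omitted, and the parameters are random rather than learned.

```python
import numpy as np


def fourier_atoms(seq_len, amp, freq, phase):
    """Evaluate K sinusoidal atoms at every position (assumed parameterization).

    amp, freq, phase have shape (K, d); the result has shape (L, K, d).
    """
    t = np.arange(seq_len)[:, None, None] / seq_len  # normalized positions, (L, 1, 1)
    return amp[None] * np.sin(2.0 * np.pi * freq[None] * t + phase[None])


def reconstruct(coeffs, atoms):
    """Mix atoms per token: x_hat[t] = sum_k coeffs[t, k] * atoms[t, k, :].

    Cost is O(L * K * d), linear in sequence length L for fixed K.
    """
    return np.einsum("lk,lkd->ld", coeffs, atoms)


def stft_magnitude(x, frame_len=8, hop=4):
    """Magnitude STFT along the sequence axis with a Hann window.

    x has shape (L, d); the result has shape (num_frames, frame_len // 2 + 1, d).
    """
    window = np.hanning(frame_len)[:, None]
    starts = range(0, x.shape[0] - frame_len + 1, hop)
    frames = np.stack([x[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))


def reconstruction_losses(x, x_hat):
    """Return (time-domain MSE, STFT-magnitude MSE) between x and x_hat."""
    time_loss = np.mean((x - x_hat) ** 2)
    freq_loss = np.mean((stft_magnitude(x) - stft_magnitude(x_hat)) ** 2)
    return time_loss, freq_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, K, d = 32, 4, 6  # sequence length, number of atoms, embedding width
    amp = rng.normal(size=(K, d))
    freq = rng.uniform(1.0, 8.0, size=(K, d))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(K, d))
    atoms = fourier_atoms(L, amp, freq, phase)
    coeffs = rng.normal(size=(L, K))  # stand-in for learned per-token mixing coefficients
    target = rng.normal(size=(L, d))  # stand-in for token embeddings
    print(reconstruction_losses(target, reconstruct(coeffs, atoms)))
```

In a trained model both loss terms would be minimized jointly with the language modeling loss. Here they only show how the time-domain and frequency-domain terms are computed from the same reconstruction.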

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
