CV, LG · Nov 20, 2025

Sparse Autoencoders are Topic Models

arXiv:2511.16309v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work provides a new perspective on SAEs for researchers in machine learning and NLP, positioning them as tools for large-scale thematic analysis across modalities, though it is incremental in extending existing topic modeling concepts.

The paper tackles the problem of interpreting sparse autoencoders (SAEs) by proposing that they can be understood as topic models, and introduces SAE-TM, a framework that yields more coherent topics than baselines on text and image datasets. It then applies the framework to analyze thematic structure in image datasets and to trace topic changes over time in Japanese woodblock prints.

Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally understood as topic models. We extend Latent Dirichlet Allocation to embedding spaces and derive the SAE objective as a maximum a posteriori estimator under this model. This view implies SAE features are thematic components rather than steerable directions. Based on this, we introduce SAE-TM, a topic modeling framework that: (1) trains an SAE to learn reusable topic atoms, (2) interprets them as word distributions on downstream data, and (3) merges them into any number of topics without retraining. SAE-TM yields more coherent topics than strong baselines on text and image datasets while maintaining diversity. Finally, we analyze thematic structure in image datasets and trace topic changes over time in Japanese woodblock prints. Our work positions SAEs as effective tools for large-scale thematic analysis across modalities. Code and data will be released upon publication.
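The three SAE-TM steps listed in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ReLU encoder with an L1 penalty is one common SAE form assumed here for illustration, the activation-weighted word counts are one plausible way to read features as word distributions, and the feature-to-topic assignments are assumed to come from some external clustering step. All function names are hypothetical.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Step 1 (assumed form): ReLU encoder, linear decoder."""
    z = np.maximum(0.0, x @ W_enc + b_enc)  # non-negative, typically sparse codes
    x_hat = z @ W_dec + b_dec               # reconstruction of the embedding
    return z, x_hat


def sae_loss(x, z, x_hat, l1_coeff=1e-3):
    """Reconstruction error plus an L1 sparsity penalty (an assumed objective)."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_coeff * sparsity


def feature_word_distributions(z, counts, eps=1e-12):
    """Step 2 (assumed form): weight each document's word counts by its
    feature activations, then normalise each feature's row to sum to 1.

    z:      (n_docs, n_features) activations on downstream data
    counts: (n_docs, vocab_size) bag-of-words counts
    """
    weights = z.T @ counts
    return weights / (weights.sum(axis=1, keepdims=True) + eps)


def merge_features(word_dists, assignments, n_topics, eps=1e-12):
    """Step 3 (assumed form): pool feature-level distributions into topics.

    assignments[i] is the topic index of feature i, e.g. from any clustering
    of the features. Changing n_topics only changes this step, so the SAE
    itself is not retrained.
    """
    topics = np.zeros((n_topics, word_dists.shape[1]))
    np.add.at(topics, assignments, word_dists)  # sum rows per topic
    return topics / (topics.sum(axis=1, keepdims=True) + eps)


# Toy usage with hand-written activations and counts (illustrative only).
toy_z = np.array([[1.0, 0.0], [0.0, 2.0]])
toy_counts = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
toy_dists = feature_word_distributions(toy_z, toy_counts)
toy_topics = merge_features(toy_dists, np.array([0, 0]), n_topics=1)
```

Because merging operates only on the per-feature word distributions, the same trained features can be regrouped into a different number of topics by passing a different `assignments` array.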

