LG · Dec 17, 2025

The Deleuzian Representation Hypothesis

arXiv:2512.19734v1
Originality: Incremental advance
AI Analysis

The paper introduces a new unsupervised method for neural network interpretability, an incremental advance over existing sparse autoencoder approaches.

The authors extract interpretable concepts from neural networks by clustering differences in activations, as an alternative to sparse autoencoders (SAEs). Across five models and three modalities, the resulting concept quality surpasses prior unsupervised SAE variants and approaches supervised baselines.

We propose an alternative to sparse autoencoders (SAEs) as a simple and effective unsupervised method for extracting interpretable concepts from neural networks. The core idea is to cluster differences in activations, which we formally justify within a discriminant analysis framework. To enhance the diversity of extracted concepts, we refine the approach by weighting the clustering using the skewness of activations. The method aligns with Deleuze's modern view of concepts as differences. We evaluate the approach across five models and three modalities (vision, language, and audio), measuring concept quality, diversity, and consistency. Our results show that the proposed method achieves concept quality surpassing prior unsupervised SAE variants while approaching supervised baselines, and that the extracted concepts enable steering of a model's inner representations, demonstrating their causal influence on downstream behavior.
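The core idea in the abstract (cluster differences in activations, with a skewness-based weighting) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify how pairs are sampled, how skewness enters the weighting, or which clustering algorithm is used, so the choices here are assumptions for illustration only: random pairing, weighting each difference by the absolute skewness of the activations projected onto it, and plain weighted k-means. The function names `sample_differences`, `skewness_weights`, and `weighted_kmeans` are hypothetical, and random data stands in for real model activations.

```python
import numpy as np


def sample_differences(acts, n_pairs, rng):
    """Unit-normalized differences between randomly paired activation vectors.

    Random pairing is an assumption; the abstract does not say how pairs are chosen.
    """
    i = rng.integers(0, len(acts), size=n_pairs)
    j = rng.integers(0, len(acts), size=n_pairs)
    keep = i != j  # drop pairs of a sample with itself (zero difference)
    diffs = acts[i[keep]] - acts[j[keep]]
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    return diffs / np.maximum(norms, 1e-12)


def skewness_weights(acts, directions):
    """Assumed weighting: |skewness| of the activations projected on each direction."""
    proj = acts @ directions.T  # shape (n_samples, n_directions)
    centered = proj - proj.mean(axis=0)
    std = centered.std(axis=0) + 1e-12
    skew = (centered ** 3).mean(axis=0) / std ** 3
    return np.abs(skew) + 1e-6  # keep every weight strictly positive


def weighted_kmeans(x, weights, k, n_iter, rng):
    """Plain Lloyd iterations where each center is a weighted mean of its members."""
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            mask = labels == c
            if mask.any():  # leave a center unchanged if its cluster is empty
                centers[c] = np.average(x[mask], axis=0, weights=weights[mask])
    return centers, labels


# Synthetic stand-in for a layer's activations: 500 samples, 16 dimensions.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 16))

diffs = sample_differences(acts, n_pairs=2000, rng=rng)
weights = skewness_weights(acts, diffs)
concepts, labels = weighted_kmeans(diffs, weights, k=8, n_iter=10, rng=rng)

# Treat each normalized cluster center as one candidate concept direction.
concepts = concepts / np.maximum(np.linalg.norm(concepts, axis=1, keepdims=True), 1e-12)
print(concepts.shape)
```

Under these assumptions, each row of `concepts` is a unit vector in activation space. One common way to probe the steering claim with such a direction would be to add a scaled copy of it to a hidden state (for example `h + alpha * concepts[0]`, with `alpha` a hypothetical strength parameter), though the abstract does not state the exact steering procedure used.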

