LG, AI, CV · Dec 17, 2025

SALVE: Sparse Autoencoder-Latent Vector Editing for Mechanistic Control of Neural Networks

arXiv:2512.15938v1 · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the need for transparent and controllable AI systems, offering a principled methodology for model editing, though it appears incremental as it builds on existing interpretability and editing techniques.

The authors tackled the problem of interpreting and controlling deep neural networks by developing SALVE, a framework that discovers sparse features, validates them with saliency mapping, and enables precise weight-space edits, achieving consistent control across ResNet-18 and ViT-B/16 models.

Deep neural networks achieve impressive performance but remain difficult to interpret and control. We present SALVE (Sparse Autoencoder-Latent Vector Editing), a unified "discover, validate, and control" framework that bridges mechanistic interpretability and model editing. Using an $\ell_1$-regularized autoencoder, we learn a sparse, model-native feature basis without supervision. We validate these features with Grad-FAM, a feature-level saliency mapping method that visually grounds latent features in input data. Leveraging the autoencoder's structure, we perform precise and permanent weight-space interventions, enabling continuous modulation of both class-defining and cross-class features. We further derive a critical suppression threshold, $\alpha_{\mathrm{crit}}$, quantifying each class's reliance on its dominant feature, supporting fine-grained robustness diagnostics. Our approach is validated on both convolutional (ResNet-18) and transformer-based (ViT-B/16) models, demonstrating consistent, interpretable control over their behavior. This work contributes a principled methodology for turning feature discovery into actionable model edits, advancing the development of transparent and controllable AI systems.
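
To make the "discover" and "control" steps in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single-layer ReLU autoencoder trained with a squared reconstruction error plus an $\ell_1$ penalty on the latents, and it illustrates one plausible form of weight-space edit: scaling a latent's decoder row by $(1 - \alpha)$. The function names (`sae_forward`, `sae_loss`, `suppress_feature`), the shapes, and the exact form of the edit are all hypothetical; the abstract does not specify them, and `alpha` here is just a suppression strength, not the paper's derivation of $\alpha_{\mathrm{crit}}$.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activations x into non-negative latents z, then reconstruct."""
    z = np.maximum(0.0, x @ W_enc + b_enc)   # ReLU latents, shape (n, k)
    x_hat = z @ W_dec + b_dec                # reconstruction, shape (n, d)
    return z, x_hat

def sae_loss(x, z, x_hat, l1_coeff):
    """Squared reconstruction error plus an l1 sparsity penalty on the latents."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_coeff * sparsity

def suppress_feature(W_dec, feature_idx, alpha):
    """Assumed edit: scale one latent's decoder row by (1 - alpha).

    alpha = 0 leaves the decoder unchanged; alpha = 1 removes the
    feature's contribution to the reconstruction entirely.
    """
    W_edit = W_dec.copy()
    W_edit[feature_idx, :] *= (1.0 - alpha)
    return W_edit

# Toy data standing in for intermediate activations of a network.
rng = np.random.default_rng(0)
d, k, n = 8, 16, 4                           # activation dim, latent dim, batch
x = rng.normal(size=(n, d))
W_enc = rng.normal(scale=0.1, size=(d, k))
b_enc = np.zeros(k)
W_dec = rng.normal(scale=0.1, size=(k, d))
b_dec = np.zeros(d)

z, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, z, x_hat, l1_coeff=1e-3)

# Fully suppress (hypothetical) feature 3 and reconstruct again.
W_edit = suppress_feature(W_dec, feature_idx=3, alpha=1.0)
_, x_hat_edit = sae_forward(x, W_enc, b_enc, W_edit, b_dec)
```

Because the edit lives in the weights rather than in a runtime hook, the modified decoder can be saved and reused without further intervention, which is one way to read "permanent" in the abstract. Sweeping `alpha` between 0 and 1 would give the continuous modulation the abstract mentions, under the same assumptions.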
