LG, SD | Oct 27, 2025

Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders

arXiv:2510.23802v1 | 5 citations | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses interpretability in audio generation, a problem relevant to both researchers and practitioners. It is an incremental advance: it extends existing sparse autoencoder methods from language models to audio models.

The paper tackles the challenge of interpreting audio generative models by developing a framework that maps latent representations to human-interpretable acoustic concepts like pitch, amplitude, and timbre, enabling controllable manipulation and analysis of AI music generation.

While sparse autoencoders (SAEs) successfully extract interpretable features from language models, applying them to audio generation faces unique challenges: audio's dense nature requires compression that obscures semantic meaning, and automatic feature characterization remains limited. We propose a framework for interpreting audio generative models by mapping their latent representations to human-interpretable acoustic concepts. We train SAEs on audio autoencoder latents, then learn linear mappings from SAE features to discretized acoustic properties (pitch, amplitude, and timbre). This enables both controllable manipulation and analysis of the AI music generation process, revealing how acoustic properties emerge during synthesis. We validate our approach on continuous (DiffRhythm-VAE) and discrete (EnCodec, WavTokenizer) audio latent spaces, and analyze DiffRhythm, a state-of-the-art text-to-music model, to demonstrate how pitch, timbre, and loudness evolve throughout generation. Although this work covers only the audio modality, the framework can be extended to interpretable analysis of generative models that operate on visual latent spaces.
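The two-stage recipe in the abstract (train an SAE on autoencoder latents, then fit a linear map from SAE features to a discretized acoustic property) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The synthetic latents, the pitch range, the bin count, the L1 coefficient, the ridge least-squares probe, and the names `train_sae`, `fit_linear_probe`, and `probe_predict` are all assumptions made for illustration. A real pipeline would take its latents from a model such as DiffRhythm-VAE or EnCodec and its pitch labels from a pitch tracker.

```python
import numpy as np

def train_sae(latents, n_features=32, l1_coeff=1e-3, lr=1e-2, steps=300, seed=0):
    """Train a one-hidden-layer ReLU sparse autoencoder with full-batch gradient descent.

    Assumption: the decoder bias is fixed to the data mean to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    n, d = latents.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_features))
    b_enc = np.zeros(n_features)
    W_dec = rng.normal(0.0, 0.1, (n_features, d))
    b_dec = latents.mean(axis=0)
    x = latents - b_dec
    losses = []
    for _ in range(steps):
        pre = x @ W_enc + b_enc
        h = np.maximum(pre, 0.0)                  # sparse, non-negative features
        err = h @ W_dec - x                       # reconstruction error
        # Loss: mean squared reconstruction error plus L1 penalty on features.
        losses.append((err ** 2).sum(axis=1).mean() + l1_coeff * h.sum(axis=1).mean())
        g_out = 2.0 * err / n
        g_h = g_out @ W_dec.T + l1_coeff / n      # h >= 0, so the L1 gradient is constant
        g_pre = g_h * (pre > 0)                   # ReLU passes gradient only where active
        W_dec -= lr * (h.T @ g_out)
        W_enc -= lr * (x.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)

    def encode(z):
        return np.maximum((z - b_dec) @ W_enc + b_enc, 0.0)

    return encode, losses

def fit_linear_probe(features, labels, n_classes, ridge=1e-3):
    """Ridge least-squares map from SAE features (plus a bias column) to one-hot bins."""
    X = np.hstack([features, np.ones((len(features), 1))])
    Y = np.eye(n_classes)[labels]
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

def probe_predict(W, features):
    X = np.hstack([features, np.ones((len(features), 1))])
    return (X @ W).argmax(axis=1)

# Synthetic stand-in data (hypothetical): each pitch bin gets its own latent cluster.
rng = np.random.default_rng(0)
n, d, n_bins = 800, 16, 8
pitch_hz = rng.uniform(100.0, 800.0, n)
edges = np.geomspace(100.0, 800.0, n_bins + 1)[1:-1]   # interior log-spaced bin edges
labels = np.digitize(pitch_hz, edges)                   # discretized pitch, values 0..n_bins-1
centers = rng.normal(0.0, 1.0, (n_bins, d))
latents = centers[labels] + 0.1 * rng.normal(size=(n, d))

encode, losses = train_sae(latents)
feats = encode(latents)
W_probe = fit_linear_probe(feats, labels, n_bins)
accuracy = (probe_predict(W_probe, feats) == labels).mean()
print(f"SAE loss: {losses[0]:.3f} -> {losses[-1]:.3f}, probe accuracy: {accuracy:.2f}")
```

Because the probe is linear, each row of `W_probe` (apart from the bias row) indicates how strongly one SAE feature votes for each pitch bin. That is the kind of feature-to-concept reading the abstract describes. The accuracy here is measured on the training data only, so a real evaluation would need a held-out split.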
