AI

Nov 12, 2025

What We Don't C: Representations for scientific discovery beyond VAEs

Brian Rogers, Micah Bowles, Chris J. Lintott, Steve Croft

arXiv:2511.09433v1

h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses the challenge of analyzing and controlling latent representations in high-dimensional domains, a capability that matters for scientific exploration. It appears incremental because it builds on existing generative-model techniques.

The paper addresses how to access information in learned representations for scientific discovery. It introduces a method based on latent flow matching with classifier-free guidance that disentangles latent subspaces, and it reports that the method enables access to meaningful features across synthetic and real datasets.

Accessing information in learned representations is critical for scientific discovery in high-dimensional domains. We introduce a novel method based on latent flow matching with classifier-free guidance that disentangles latent subspaces by explicitly separating information included in conditioning from information that remains in the residual representation. Across three experiments -- a synthetic 2D Gaussian toy problem, colored MNIST, and the Galaxy10 astronomy dataset -- we show that our method enables access to meaningful features of high-dimensional data. Our results highlight a simple yet powerful mechanism for analyzing, controlling, and repurposing latent representations, providing a pathway toward using generative models for scientific exploration of what we don't capture, consider, or catalog.
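
To make the phrase "flow matching with classifier-free guidance" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy 2D setting in which the conditional and unconditional velocity fields are known in closed form (straight-line flows toward a class mean and toward an overall mean); in practice a learned network would presumably supply both fields. All names (`velocity`, `guided_velocity`, `sample`, `class_mean`, `overall_mean`, the guidance weight `w`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def velocity(x, t, target):
    """Velocity of the straight-line flow that carries x to `target` at t = 1.

    Assumption: a closed-form stand-in for a learned velocity network.
    """
    return (target - x) / (1.0 - t)

def guided_velocity(x, t, cond_target, uncond_target, w):
    """Classifier-free guidance: v_uncond + w * (v_cond - v_uncond)."""
    v_uncond = velocity(x, t, uncond_target)
    v_cond = velocity(x, t, cond_target)
    return v_uncond + w * (v_cond - v_uncond)

def sample(x0, cond_target, uncond_target, w, n_steps=50):
    """Integrate the guided flow from t = 0 toward t = 1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t never reaches 1, so the velocity stays finite
        x = x + dt * guided_velocity(x, t, cond_target, uncond_target, w)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))      # latent noise samples
class_mean = np.array([2.0, 0.0])     # hypothetical conditional target
overall_mean = np.array([0.0, 0.0])   # hypothetical unconditional target

x_uncond = sample(x0, class_mean, overall_mean, w=0.0)  # ignores the condition
x_cond = sample(x0, class_mean, overall_mean, w=1.0)    # purely conditional
x_strong = sample(x0, class_mean, overall_mean, w=2.0)  # extrapolates past the class mean
```

In this toy setting the guidance weight interpolates between ignoring the condition (`w=0`) and following it (`w=1`), and extrapolates beyond it for `w>1`. The difference `v_cond - v_uncond` is the part of the flow attributable to the conditioning, which loosely mirrors the separation between conditioned and residual information described in the abstract.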
