LG · Jul 9, 2025

Estimating Dataset Dimension via Singular Metrics under the Manifold Hypothesis: Application to Inverse Problems

arXiv:2507.07291v14.1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of leveraging low-dimensional structure in high-dimensional data for inverse problems, particularly in biomedical imaging, though it appears incremental, since it builds on existing VAE and Riemannian-geometry methods.

The authors tackle the problem of estimating the intrinsic dimension of datasets under the manifold hypothesis and apply the estimate to ill-posed inverse problems, improving reconstructions in biomedical imaging by constraining outputs to lie on a learned manifold.
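To make the idea of constraining reconstructions to a learned manifold concrete, here is a minimal sketch, not the authors' method. It assumes a hypothetical *linear* decoder `g(z) = W z + b` standing in for a trained (nonlinear, invertible) VAE decoder, a random underdetermined forward operator `A`, and noise-free measurements. All names and dimensions are illustrative. It compares the minimum-norm ambient-space solution with a solution searched only over latent codes, i.e. `min_z ||A g(z) - y||^2`.

```python
import numpy as np

rng = np.random.default_rng(1)
D, k, m = 50, 3, 10  # ambient dim, latent (manifold) dim, number of measurements

# Hypothetical "learned" decoder: a linear map whose image is a k-dimensional
# affine subspace of R^D, standing in for the learned manifold.
W = rng.normal(size=(D, k))
b = rng.normal(size=D)

def g(z: np.ndarray) -> np.ndarray:
    return W @ z + b

# Underdetermined forward operator (m < D), so the inverse problem is ill-posed.
A = rng.normal(size=(m, D))
z_true = rng.normal(size=k)
x_true = g(z_true)
y = A @ x_true  # noise-free measurements

# Baseline: minimum-norm solution in ambient space; ignores the manifold.
x_naive = np.linalg.pinv(A) @ y

# Manifold-constrained solution: optimize over latent codes only.
# With a linear decoder this is an ordinary least-squares problem in z.
z_hat, *_ = np.linalg.lstsq(A @ W, y - A @ b, rcond=None)
x_manifold = g(z_hat)

def relative_error(x: np.ndarray) -> float:
    return float(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

print(f"naive relative error:    {relative_error(x_naive):.3f}")
print(f"manifold relative error: {relative_error(x_manifold):.2e}")
```

Because `A @ W` has full column rank here, the latent search recovers `x_true` up to floating-point error, while the ambient solution only recovers the component of `x_true` in the row space of `A`. With a nonlinear decoder, the same objective would typically be minimized iteratively (for example by gradient descent or Gauss-Newton in latent space).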

High-dimensional datasets often exhibit low-dimensional geometric structure, as suggested by the manifold hypothesis, which posits that data lie on a smooth manifold embedded in a higher-dimensional ambient space. While this insight underpins many advances in machine learning and inverse problems, fully leveraging it requires addressing three key tasks: estimating the intrinsic dimension (ID) of the manifold, constructing appropriate local coordinates, and learning mappings between the ambient and manifold spaces. In this work, we propose a framework that addresses all three using a Mixture of Variational Autoencoders (VAEs) and tools from Riemannian geometry. We specifically focus on estimating the ID of datasets by analyzing the numerical rank of the pullback metric of the VAE decoder. The estimated ID guides the construction of an atlas of local charts using a mixture of invertible VAEs, enabling accurate manifold parameterization and efficient inference. We show how this approach enhances solutions to ill-posed inverse problems, particularly in biomedical imaging, by enforcing that reconstructions lie on the learned manifold. Lastly, we explore the impact of network pruning on manifold geometry and reconstruction quality, showing that the intrinsic dimension serves as an effective proxy for monitoring model capacity.
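The rank-based ID estimate can be illustrated with a minimal sketch. It assumes a hypothetical toy `decoder` from a 4-D latent space to a 10-D ambient space that depends on only two latent coordinates, so its image is a 2-D manifold. The pullback metric at a latent point `z` is `G(z) = J(z)^T J(z)`, where `J` is the decoder Jacobian. Its numerical rank is taken as the number of singular values above `rel_tol` times the largest one. The finite-difference Jacobian, the relative threshold, and the median over sampled latents are illustrative choices, not necessarily the paper's exact procedure.

```python
import numpy as np

def decoder(z: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained VAE decoder: R^4 -> R^10.

    Only z[0] and z[1] affect the output, so the image is a 2-D manifold
    even though the latent space is 4-D.
    """
    u, v = z[0], z[1]
    return np.array([
        np.cos(u), np.sin(u), np.cos(v), np.sin(v),
        u * v, u + v, u - v, u**2, v**2, np.sin(u + v),
    ])

def jacobian_fd(f, z: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Central finite-difference Jacobian of f at z, shape (D, d)."""
    cols = []
    for i in range(z.size):
        e = np.zeros(z.size)
        e[i] = eps
        cols.append((f(z + e) - f(z - e)) / (2 * eps))
    return np.stack(cols, axis=1)

def pullback_metric(f, z: np.ndarray) -> np.ndarray:
    """G(z) = J(z)^T J(z): the ambient Euclidean metric pulled back to latent space."""
    J = jacobian_fd(f, z)
    return J.T @ J

def numerical_rank(G: np.ndarray, rel_tol: float = 1e-6) -> int:
    """Count singular values larger than rel_tol times the largest one."""
    s = np.linalg.svd(G, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def estimate_id(f, latents: np.ndarray, rel_tol: float = 1e-6):
    """Median numerical rank of the pullback metric over sampled latent codes."""
    ranks = [numerical_rank(pullback_metric(f, z), rel_tol) for z in latents]
    return int(np.median(ranks)), ranks

rng = np.random.default_rng(0)
latents = rng.normal(size=(50, 4))  # sampled latent codes
id_hat, ranks = estimate_id(decoder, latents)
print(id_hat)  # 2 for this toy decoder
```

Taking the median over many latent points is one simple way to make the estimate robust to isolated points where the metric degenerates. In practice the Jacobian of a neural decoder would usually come from automatic differentiation rather than finite differences, and the threshold would need tuning to the network's numerical scale.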

