Exact Dimensionality Selection for Bayesian PCA
This work addresses dimensionality selection in data analysis, a step that is crucial for reducing complexity and improving interpretability in fields such as machine learning and statistics. The contribution appears somewhat incremental, since it builds on the existing probabilistic PCA framework.
The authors tackle the problem of estimating the intrinsic dimensionality of high-dimensional datasets by introducing a Bayesian model selection approach with a normal-gamma prior. On simulated data, the resulting method is competitive with state-of-the-art techniques.
We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we derive a closed-form expression for the marginal likelihood, from which an optimal number of components can be inferred. We also propose a heuristic, based on the expected shape of the marginal likelihood curve, for choosing the hyperparameters. In non-asymptotic settings, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods.
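The selection rule described above amounts to computing a model score for every candidate number of components d and keeping the maximizer. The closed-form normal-gamma marginal likelihood is not reproduced in this section, so the sketch below swaps in a simpler, well-known frequentist stand-in: the maximum-likelihood PPCA solution of Tipping and Bishop (1999), scored with BIC. It is not the authors' criterion; it only illustrates the "score every d, then take the argmax" structure. The function names `ppca_bic_curve` and `select_dimension` are hypothetical.

```python
import numpy as np


def ppca_bic_curve(X):
    """Return BIC(d) for d = 0, ..., p-1 under maximum-likelihood PPCA.

    Stand-in score (not the normal-gamma marginal likelihood): uses the
    closed-form ML solution of Tipping & Bishop (1999), where the top d
    sample-covariance eigenvalues are kept and the noise variance is the
    mean of the discarded ones.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # ML (1/n-normalized) sample covariance; eigenvalues sorted largest first.
    S = Xc.T @ Xc / n
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]
    bic = np.empty(p)
    for d in range(p):
        sigma2 = lam[d:].mean()  # ML noise variance
        loglik = -0.5 * n * (
            p * np.log(2 * np.pi)
            + np.sum(np.log(lam[:d]))
            + (p - d) * np.log(sigma2)
            + p
        )
        # Free parameters: mean (p), loadings up to rotation (pd - d(d-1)/2), noise (1).
        k = p + p * d - d * (d - 1) / 2 + 1
        bic[d] = loglik - 0.5 * k * np.log(n)
    return bic


def select_dimension(X):
    """Pick the number of components that maximizes the score curve."""
    return int(np.argmax(ppca_bic_curve(X)))


# Usage: synthetic data with 3 latent components plus small isotropic noise.
rng = np.random.default_rng(0)
n, p, d_true = 1000, 10, 3
W = rng.normal(size=(p, d_true))
Y = rng.normal(size=(n, d_true))
X = Y @ W.T + 0.1 * rng.normal(size=(n, p))
print(select_dimension(X))
```

Because the selection step only needs one score per candidate d, replacing `ppca_bic_curve` with an exact log marginal likelihood, such as the closed-form expression the abstract refers to, would leave `select_dimension` unchanged.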