LG AI CV ML · Apr 3, 2012

Validation of nonlinear PCA

arXiv:1204.0684v1 · 76 citations

Originality: Incremental advance

AI Analysis

This paper addresses a specific methodological issue in unsupervised learning that matters to researchers applying nonlinear PCA. The advance is incremental, since it builds on existing nonlinear PCA frameworks.

The paper tackles model selection for nonlinear PCA, where standard validation techniques fail because the method is unsupervised. It proposes using the error in missing data estimation as the criterion for choosing model complexity. This criterion correctly identifies the optimal model, whereas standard test set validation favors over-fitted ones.

Linear principal component analysis (PCA) can be extended to a nonlinear PCA by using artificial neural networks. But the benefit of curved components requires a careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and more generally the use of an independent test set, fail when applied to nonlinear PCA because of its inherent unsupervised characteristics. This paper presents a new approach for validating the complexity of nonlinear PCA models by using the error in missing data estimation as a criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test set validation usually favours over-fitted nonlinear PCA models, the proposed model validation approach correctly selects the optimal model complexity.
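The selection criterion described in the abstract can be sketched in a few lines: hide some known entries, let each candidate model estimate them, and prefer the complexity with the lowest estimation error on the hidden entries. The sketch below is an illustration under stated assumptions, not the paper's implementation. To stay short and runnable it swaps in a linear stand-in (iterative low-rank SVD imputation, with the rank as the complexity parameter) in place of a neural-network nonlinear PCA. The function names `impute_low_rank` and `missing_data_error` and the synthetic data are hypothetical.

```python
import numpy as np


def impute_low_rank(X_masked, rank, n_iter=50):
    """Stand-in model (an assumption): iterative SVD imputation.

    This is a linear low-rank model, NOT nonlinear PCA. It only serves to
    make the validation criterion runnable. NaN marks a missing entry.
    """
    missing = np.isnan(X_masked)
    col_means = np.nanmean(X_masked, axis=0)
    # Start by filling missing entries with the column means.
    X_filled = np.where(missing, col_means, X_masked)
    for _ in range(n_iter):
        mean = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mean, full_matrices=False)
        # Rank-limited reconstruction of the current filled matrix.
        X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean
        # Keep observed entries, update only the missing ones.
        X_filled = np.where(missing, X_hat, X_masked)
    return X_filled


def missing_data_error(X, impute, frac=0.1, seed=0):
    """Hide a random fraction of entries and score their estimates (MSE)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < frac  # True = artificially removed
    X_masked = X.copy()
    X_masked[mask] = np.nan
    X_imputed = impute(X_masked)
    return float(np.mean((X_imputed[mask] - X[mask]) ** 2))


# Synthetic data (an assumption): two latent factors in six dimensions.
rng = np.random.default_rng(42)
Z = rng.normal(size=(200, 2))
W = np.array([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]])
X = Z @ W + 0.1 * rng.normal(size=(200, 6))

# Score each candidate complexity by its missing data estimation error.
errors = {
    rank: missing_data_error(X, lambda Xm, r=rank: impute_low_rank(Xm, r))
    for rank in range(1, 6)
}
best_rank = min(errors, key=errors.get)
print(errors, best_rank)
```

For an actual nonlinear PCA model, the `impute` argument would be replaced by a routine that fits the network on the observed entries and returns its estimates for the missing ones. The scoring function itself would not need to change.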
