Determining Principal Component Cardinality through the Principle of Minimum Description Length
This work addresses a key model selection problem in dimensionality reduction for data analysis, but it appears incremental: it builds on existing MDL theory and does not claim broad state-of-the-art improvements.
The paper tackles the challenge of selecting the number of principal components in PCA by applying the Minimum Description Length (MDL) principle, specifically using the Normalized Maximum Likelihood (NML) criterion, and bounds the NML for PCA in terms of known linear regression NML terms.
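For orientation, the standard textbook form of the NML criterion can be written as follows. The notation is an assumption introduced here and is not taken from the paper: $x$ is the observed data, $\hat\theta(x)$ the maximum-likelihood parameter estimate within the model class, and $y$ ranges over all possible data sets of the same size.

```latex
\[
p_{\mathrm{NML}}(x) \;=\; \frac{p\bigl(x;\hat\theta(x)\bigr)}{\int p\bigl(y;\hat\theta(y)\bigr)\,dy},
\qquad
L_{\mathrm{NML}}(x) \;=\; -\log p\bigl(x;\hat\theta(x)\bigr) \;+\; \log \int p\bigl(y;\hat\theta(y)\bigr)\,dy .
\]
```

The second term of the code length, the log of the normalizer, is the model-complexity penalty. For continuous model classes this integral is often hard to evaluate and may diverge unless the data domain is restricted, which is one reason a closed form for PCA is difficult to obtain.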
PCA (Principal Component Analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. One significant challenge in using PCA is the choice of the number of principal components. The information-theoretic MDL (Minimum Description Length) principle gives objective compression-based criteria for model selection, but it is difficult to analytically apply its modern definition, NML (Normalized Maximum Likelihood), to the problem of PCA. This work shows a general reduction of NML problems to lower-dimension problems. Applying this reduction, it bounds the NML of PCA by terms of the NML of linear regression, which are known.
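To make the selection problem concrete, here is a minimal sketch of description-length-based selection of the number of components. It is not the NML bound derived in the paper: as a stand-in it uses a crude two-part code, namely the maximized probabilistic-PCA log-likelihood plus a BIC-style parameter cost. The function names, the synthetic data, and the parameter count are all assumptions made for illustration.

```python
import numpy as np

def ppca_description_length(eigvals, n, k):
    """Two-part code length (nats) of a probabilistic PCA model with k components.

    eigvals: eigenvalues of the sample covariance, sorted in descending order.
    This is a BIC-style approximation, not the NML criterion from the paper.
    """
    d = eigvals.size
    # ML noise variance: the mean of the d - k discarded eigenvalues.
    sigma2 = eigvals[k:].mean()
    # Negative maximized log-likelihood of probabilistic PCA.
    neg_loglik = 0.5 * n * (
        d * np.log(2 * np.pi)
        + np.log(eigvals[:k]).sum()
        + (d - k) * np.log(sigma2)
        + d
    )
    # Free parameters: loading subspace and scales, noise variance, mean vector.
    n_params = d * k - k * (k - 1) // 2 + 1 + d
    return neg_loglik + 0.5 * n_params * np.log(n)

def select_num_components(X):
    """Return the k in 0..d-1 with the shortest code length, plus all lengths."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    # eigvalsh returns ascending eigenvalues; reverse to descending order.
    eigvals = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]
    eigvals = np.maximum(eigvals, 1e-12)  # guard against log(0)
    lengths = [ppca_description_length(eigvals, n, k) for k in range(d)]
    return int(np.argmin(lengths)), lengths

# Synthetic data (hypothetical): 3 strong latent factors in 10 dimensions.
rng = np.random.default_rng(0)
n, d, true_k = 500, 10, 3
W = rng.normal(size=(d, true_k))
Z = rng.normal(size=(n, true_k)) * 3.0
X = Z @ W.T + 0.5 * rng.normal(size=(n, d))

chosen_k, lengths = select_num_components(X)
print("chosen number of components:", chosen_k)
```

The design mirrors the MDL idea in the abstract: each candidate number of components is scored by the cost of encoding the data given the fitted model plus the cost of the model itself, and the shortest total wins. An NML-based criterion would replace the BIC-style penalty with the log-normalizer of the model class.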