High Dimensional Semiparametric Scale-Invariant Principal Component Analysis
COCA provides a more robust and interpretable PCA method for high-dimensional data analysis, though the contribution appears incremental because it builds on existing PCA techniques.
The paper tackles high-dimensional principal component analysis by proposing Copula Component Analysis (COCA), a semiparametric method that is robust to modeling assumptions and to outliers, is scale-invariant, and outperforms sparse PCA on synthetic and real-world datasets.
We propose a new high-dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) it is robust to modeling assumptions; (ii) it is robust to outliers and data contamination; (iii) it is scale-invariant and yields more interpretable results. We prove that the COCA estimators attain fast estimation rates and are feature-selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world datasets.
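To make the scale-invariance claim concrete, here is a minimal sketch of rank-based PCA under the model described above. The abstract does not say which estimator COCA uses. The sketch assumes Spearman's rho combined with the identity r = 2 sin(πρ/6), which holds for Gaussian copulas. The function names `latent_correlation` and `leading_component` are hypothetical, and the sparsity step needed in truly high-dimensional settings is omitted.

```python
import numpy as np
from scipy.stats import rankdata


def latent_correlation(X):
    """Rank-based estimate of the latent Gaussian correlation matrix.

    X has shape (n_samples, n_features). Assumption: Spearman's rho with
    the Gaussian-copula identity r = 2 * sin(pi / 6 * rho); the abstract
    does not specify the estimator.
    """
    ranks = rankdata(X, axis=0)                 # column-wise ranks
    rho = np.corrcoef(ranks, rowvar=False)      # Spearman's rho matrix
    R = 2.0 * np.sin(np.pi / 6.0 * rho)         # map rho to latent correlation
    np.fill_diagonal(R, 1.0)                    # guard against rounding error
    return R


def leading_component(R):
    """Leading eigenvector of R, with its sign fixed for reproducibility."""
    _, eigvecs = np.linalg.eigh(R)              # eigenvalues in ascending order
    v = eigvecs[:, -1]
    return v if v[np.argmax(np.abs(v))] > 0 else -v


# Latent Gaussian data with a known correlation structure.
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
Z = rng.multivariate_normal(np.zeros(3), Sigma, size=500)

# Observed data: each column passes through a different strictly
# increasing transformation, including a large rescaling.
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3, 1000.0 * Z[:, 2]])

R_latent = latent_correlation(Z)
R_observed = latent_correlation(X)
v = leading_component(R_observed)
```

Because ranks are unchanged by strictly increasing transformations, `R_observed` matches `R_latent` even though the third column has been rescaled by a factor of 1000. Ordinary PCA on the covariance matrix of `X` would instead be dominated by that column.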