Robust Principal Component Analysis Using Statistical Estimators
This is an incremental improvement for robust dimensionality reduction in data analysis.
The paper tackles PCA's sensitivity to outliers by using robust statistical estimators like median, robust scaling, and Huber M-estimator to reestimate the covariance matrix. Results show the method handles outliers better than original PCA and matches Kernel PCA accuracy with lower computational cost in classification tasks.
Principal Component Analysis (PCA) finds a linear mapping that maximizes the variance of the projected data. Because variance is strongly affected by extreme values, PCA is sensitive to outliers, which can lead to incorrect eigendirections. In this paper, we propose techniques to address this problem: we apply data centering and re-estimate the covariance matrix using robust statistical techniques, namely the median, robust scaling (which reinforces data centering), and the Huber M-estimator, which measures the presence of outliers and reweights them with small values. Results on several real-world data sets show that the proposed method handles outliers and achieves better results than the original PCA, and that in classification tasks it provides the same accuracy as Kernel PCA with a polynomial kernel at lower computational cost.
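As a rough illustration of the pipeline described above (median centering, robust scaling, Huber reweighting, then an eigendecomposition of the re-estimated covariance), the following is a minimal sketch and not the paper's implementation. The abstract does not give formulas, so several choices here are assumptions: the scaling uses the median absolute deviation (MAD), each sample's outlyingness is taken as its Euclidean distance from the median divided by the median distance, and the Huber threshold `c = 1.345` is the conventional default. The function names `huber_weights` and `robust_pca` are hypothetical.

```python
import numpy as np


def huber_weights(dist, c=1.345):
    """Huber weights: 1 for dist <= c, c / dist beyond it (assumed threshold c)."""
    dist = np.asarray(dist, dtype=float)
    w = np.ones_like(dist)
    far = dist > c
    w[far] = c / dist[far]
    return w


def robust_pca(X, n_components=2, c=1.345):
    """Sketch of PCA on a Huber-reweighted covariance matrix.

    Returns (components, eigenvalues, weights); components are stored as rows.
    """
    X = np.asarray(X, dtype=float)

    # Center with the per-feature median instead of the mean.
    med = np.median(X, axis=0)

    # Robust scaling (assumption: MAD, rescaled to match the standard
    # deviation under normality).
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    mad[mad == 0] = 1.0  # avoid division by zero for constant features
    Z = (X - med) / mad

    # Outlyingness of each sample (assumption: distance from the robust
    # center, normalized by the median distance).
    d = np.linalg.norm(Z, axis=1)
    scale = np.median(d)
    if scale == 0:
        scale = 1.0
    w = huber_weights(d / scale, c)

    # Weighted covariance: samples far from the center contribute less.
    C = (Z * w[:, None]).T @ Z / w.sum()

    # Eigendecomposition; eigh returns eigenvalues in ascending order.
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order].T, evals[order], w
```

Calling `robust_pca(X)` on an `(n_samples, n_features)` array returns the leading eigendirections together with the per-sample weights, so heavily downweighted samples can be inspected as candidate outliers. Huber weights decay only linearly with distance, so very extreme points still retain some influence; a redescending weight function would suppress them more strongly, at the cost of departing from the estimator named in the abstract.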