Dimension Reduction for Data with Heterogeneous Missingness
This work addresses a practical problem for data analysts working with incomplete datasets, though the contribution is incremental because it builds on existing Gram matrix-based approaches.
The paper tackles dimension reduction for high-dimensional data with heterogeneous missing values by developing a bias-corrected Gram matrix, which significantly improved a range of dimension reduction methods in empirical tests on simulated and real datasets.
Dimension reduction plays a pivotal role in analysing high-dimensional data. However, observations with missing values present serious difficulties in directly applying standard dimension reduction techniques. As a large number of dimension reduction approaches are based on the Gram matrix, we first investigate the effects of missingness on dimension reduction by studying the statistical properties of the Gram matrix with or without missingness, and then we present a bias-corrected Gram matrix with nice statistical properties under heterogeneous missingness. Extensive empirical results, on both simulated and publicly available real datasets, show that the proposed unbiased Gram matrix can significantly improve a broad spectrum of representative dimension reduction approaches.
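To make the idea of bias correction concrete, here is a minimal sketch under assumptions that the abstract does not state. It assumes the Gram matrix is the observation-by-observation inner-product matrix, that missing entries are zero-filled, and that each feature j is observed independently with a known probability p[j] that may differ across features (one simple reading of "heterogeneous missingness"). Under those assumptions, an off-diagonal product of two entries survives with probability p[j]**2 while a squared entry on the diagonal survives with probability p[j], so the naive Gram matrix shrinks the two parts by different amounts. The function name `corrected_gram` and the inverse-probability weighting are illustrative choices, not necessarily the estimator proposed in the paper.

```python
import numpy as np

def corrected_gram(X_zero, p):
    """Inverse-probability-weighted Gram matrix (hypothetical helper).

    X_zero : (n, d) array with missing entries replaced by 0.
    p      : (d,) array of ASSUMED per-feature observation probabilities.
    Returns an (n, n) estimate of X @ X.T for the complete data X.
    """
    X_zero = np.asarray(X_zero, dtype=float)
    p = np.asarray(p, dtype=float)
    Xw = X_zero / p  # divide column j by p[j]
    # Off-diagonal: x_ij * x_kj is observed with probability p[j]**2.
    G = Xw @ Xw.T
    # Diagonal: x_ij**2 is observed with probability p[j], not p[j]**2.
    np.fill_diagonal(G, (X_zero**2 / p).sum(axis=1))
    return G

# Illustrative use on synthetic data (all values here are made up).
rng = np.random.default_rng(0)
X_full = rng.normal(size=(5, 2000))
p_obs = rng.uniform(0.3, 0.9, size=2000)   # heterogeneous observation rates
mask = rng.random(X_full.shape) < p_obs    # True where the entry is observed
X_zero = np.where(mask, X_full, 0.0)

G_true = X_full @ X_full.T
G_naive = X_zero @ X_zero.T                # biased towards zero
G_corr = corrected_gram(X_zero, p_obs)
```

In practice the observation probabilities are unknown and would have to be estimated, for example from the per-feature fraction of observed entries. The corrected matrix is also not guaranteed to be positive semidefinite, so a method that requires that property may need an additional projection step.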