ML LG | Nov 4, 2016

Classification with Ultrahigh-Dimensional Features

Yanming Li, Hyokyoung Hong, Jian Kang, Kevin He, Ji Zhu, Yi Li

arXiv:1611.01541v11.33 citations

Originality: Highly original

AI Analysis

The paper addresses a critical bottleneck in machine learning for domains such as medical diagnostics, where data have many more features than samples, and it proposes a novel method rather than an incremental improvement.

The paper tackles classification with ultrahigh-dimensional features, where features greatly outnumber samples. It introduces a multivariate screening and classification method that leverages inter-feature correlations to detect weak signals and achieves asymptotically optimal misclassification rates. Its performance is evaluated in simulations and in a study classifying renal transplantation patients.

Although much progress has been made in classification with high-dimensional features \citep{Fan_Fan:2008, JGuo:2010, CaiSun:2014, PRXu:2014}, classification with ultrahigh-dimensional features, wherein the number of features far exceeds the sample size, defies most existing work. This paper introduces a novel and computationally feasible multivariate screening and classification method for ultrahigh-dimensional data. Leveraging inter-feature correlations, the proposed method enables detection of marginally weak and sparse signals and recovery of the true informative feature set, and achieves asymptotically optimal misclassification rates. We also show that the proposed procedure provides more powerful discovery boundaries than those in \citet{CaiSun:2014} and \citet{JJin:2009}. The performance of the proposed procedure is evaluated using simulation studies and demonstrated via classification of patients with different post-transplantation renal functional types.
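
To illustrate why inter-feature correlations can reveal marginally weak signals, here is a minimal two-feature sketch. It is not the authors' procedure; it uses only the textbook linear discriminant direction, the inverse covariance applied to the mean difference, under assumed values (correlation `rho = 0.8`, unit variances, equal class priors). The function names `lda_direction_2d` and `bayes_error` are hypothetical helpers introduced for this example.

```python
import math

def lda_direction_2d(rho, delta):
    """Solve Sigma @ beta = delta for Sigma = [[1, rho], [rho, 1]]."""
    d1, d2 = delta
    det = 1.0 - rho * rho
    return ((d1 - rho * d2) / det, (d2 - rho * d1) / det)

def bayes_error(mahalanobis_sq):
    """Error of the optimal linear rule with equal priors: Phi(-Delta / 2)."""
    z = -math.sqrt(mahalanobis_sq) / 2.0
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rho = 0.8            # assumed correlation between the two features
delta = (1.0, 0.0)   # feature 2 has zero marginal mean difference

# Feature 2 still gets a nonzero discriminant weight through its correlation.
beta = lda_direction_2d(rho, delta)

# Error using feature 1 alone versus using both features jointly.
err_marginal = bayes_error(delta[0] ** 2)
err_joint = bayes_error(delta[0] * beta[0] + delta[1] * beta[1])
print(beta, err_marginal, err_joint)
```

In this toy setting, a purely marginal screen would discard feature 2 because its class means coincide, yet its weight in the discriminant direction is nonzero, and including it lowers the optimal misclassification rate.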
