Sparse quadratic classification rules via linear dimension reduction
This work addresses classification in high-dimensional biological data, such as gene expression, and offers an incremental improvement in computational efficiency and scalability over full quadratic discriminant analysis.
The paper tackles high-dimensional two-group classification with unequal covariance matrices. It proposes a method that combines variable selection and linear dimension reduction before applying quadratic discriminant analysis, which avoids precision matrix estimation and scales linearly with the number of measurements. It provides theoretical guarantees on variable selection consistency and empirical validation, including an application to breast cancer gene expression data that confirms the importance of the ESR1 gene in differentiating estrogen receptor status.
We consider the problem of high-dimensional classification between two groups with unequal covariance matrices. Rather than estimating the full quadratic discriminant rule, we propose to perform simultaneous variable selection and linear dimension reduction on the original data, and then apply quadratic discriminant analysis in the reduced space. In contrast to quadratic discriminant analysis, the proposed framework does not require estimation of precision matrices and scales linearly with the number of measurements, making it especially attractive for use on high-dimensional datasets. We support the methodology with theoretical guarantees on variable selection consistency and an empirical comparison with competing approaches. We apply the method to gene expression data of breast cancer patients and confirm the crucial importance of the ESR1 gene in differentiating estrogen receptor status.
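The general shape of such a pipeline (select variables, project linearly, then fit a quadratic rule in the low-dimensional space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator proposed in the paper: the t-type screening statistic, the two diagonally scaled mean-difference directions, the function names fit_reduced_qda and predict_reduced_qda, the parameter n_keep, and the synthetic data are all hypothetical stand-ins. The point it illustrates is that only 2-by-2 covariance matrices are ever inverted, so no precision matrix over all measurements is needed.

```python
import numpy as np


def fit_reduced_qda(X, y, n_keep=5):
    """Screen variables, project onto two directions, fit QDA in 2-D.

    Assumes labels are 0/1 and n_keep >= 2. Illustrative stand-in only.
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    v0, v1 = X0.var(axis=0, ddof=1), X1.var(axis=0, ddof=1)

    # Stand-in variable selection: keep the n_keep variables with the
    # largest two-sample t-type statistic.
    score = np.abs(m1 - m0) / np.sqrt(v0 / len(X0) + v1 / len(X1))
    keep = np.argsort(score)[-n_keep:]

    # Stand-in linear reduction: mean difference scaled by each group's
    # per-variable variances, orthonormalized with a QR decomposition.
    delta = (m1 - m0)[keep]
    V = np.column_stack([delta / v0[keep], delta / v1[keep]])
    Q = np.linalg.qr(V)[0]

    # QDA parameters in the reduced space: only 2x2 matrices are inverted.
    params = []
    for Xg in (X0, X1):
        Z = Xg[:, keep] @ Q
        S = np.cov(Z, rowvar=False)
        params.append((Z.mean(axis=0), np.linalg.inv(S),
                       np.linalg.slogdet(S)[1], np.log(len(Xg) / len(X))))
    return keep, Q, params


def predict_reduced_qda(model, X):
    """Assign each row of X to the group with the larger quadratic score."""
    keep, Q, params = model
    Z = X[:, keep] @ Q
    scores = []
    for mu, S_inv, logdet, log_prior in params:
        d = Z - mu
        maha = np.einsum("ij,jk,ik->i", d, S_inv, d)  # squared Mahalanobis distance
        scores.append(-0.5 * maha - 0.5 * logdet + log_prior)
    return np.argmax(np.column_stack(scores), axis=1)


# Synthetic example: 200 variables, of which only the first 5 differ
# between groups, in both mean and variance.
rng = np.random.default_rng(0)
n, p = 100, 200
X0 = rng.normal(size=(n, p))
X1 = rng.normal(size=(n, p))
X1[:, :5] = 2.0 + 2.0 * X1[:, :5]
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

model = fit_reduced_qda(X, y, n_keep=5)
acc = np.mean(predict_reduced_qda(model, X) == y)
print("selected variables:", sorted(model[0].tolist()))
print("training accuracy:", acc)
```

In this sketch the screening and projection steps are decoupled for readability, whereas the abstract describes them as simultaneous; a penalized estimator that produces sparse directions in one step would replace both stand-ins.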