ML, LG | Jul 4, 2018

Diagonal Discriminant Analysis with Feature Selection for High Dimensional Data

Sarah Elizabeth Romanes, John Thomas Ormerod, Jean YH Yang

arXiv:1807.01422v1 | 2 citations

Originality: Incremental advance

AI Analysis

This work provides a more efficient and interpretable classification method for high-dimensional data, though the advance appears incremental because it builds on existing discriminant analysis techniques.

The authors tackled high-dimensional classification by developing multiDA, a hybrid model combining multiclass diagonal discriminant analysis with feature selection, which showed marked improvements in prediction accuracy, interpretability, and run time compared to other methods.

We introduce a new method of performing high dimensional discriminant analysis, which we call multiDA. We achieve this by constructing a hybrid model that seamlessly integrates a multiclass diagonal discriminant analysis model and feature selection components. Our feature selection component naturally simplifies to weights which are simple functions of likelihood ratio statistics, allowing natural comparisons with traditional hypothesis testing methods. We provide heuristic arguments suggesting desirable asymptotic properties of our algorithm with regard to feature selection. We compare our method with several other approaches, showing marked improvements in regard to prediction accuracy, interpretability of chosen features, and algorithm run time. We demonstrate such strengths of our model by showing strong classification performance on publicly available high dimensional datasets, as well as through multiple simulation studies. We make an R package available implementing our approach.
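
To make the general idea concrete, here is a minimal Python sketch of diagonal discriminant analysis with likelihood-ratio-based feature screening. It is not the multiDA algorithm and not the authors' R package. The function names `fit_diagonal_da` and `predict_diagonal_da`, the shared per-feature variance, and the hard BIC-style cutoff `(K - 1) * log(n)` are all assumptions made for illustration; the abstract describes weights that are functions of likelihood ratio statistics, whereas this sketch simply keeps or drops each feature.

```python
import math

def fit_diagonal_da(X, y, penalty=None):
    """Fit a diagonal (independent-feature) Gaussian classifier.

    X is a list of rows (lists of floats); y is a list of class labels.
    Each feature gets a likelihood ratio statistic comparing
    "one mean for all classes" against "one mean per class".
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    counts = {c: y.count(c) for c in classes}

    # Class-specific and overall means for every feature.
    means = {c: [0.0] * p for c in classes}
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            means[label][j] += v / counts[label]
    grand = [sum(row[j] for row in X) / n for j in range(p)]

    eps = 1e-12  # guards against zero variance
    # MLE variance under the null (common mean) and the alternative
    # (class-specific means, variance shared across classes).
    var_null = [sum((row[j] - grand[j]) ** 2 for row in X) / n + eps
                for j in range(p)]
    var_alt = [sum((row[j] - means[c][j]) ** 2 for row, c in zip(X, y)) / n + eps
               for j in range(p)]

    # Gaussian likelihood ratio statistic per feature.
    lrt = [n * math.log(var_null[j] / var_alt[j]) for j in range(p)]

    # Assumed cutoff: a BIC-style penalty for the K - 1 extra mean parameters.
    if penalty is None:
        penalty = (len(classes) - 1) * math.log(n)
    selected = [j for j in range(p) if lrt[j] > penalty]

    priors = {c: counts[c] / n for c in classes}
    return {"classes": classes, "means": means, "var": var_alt,
            "priors": priors, "lrt": lrt, "selected": selected}

def predict_diagonal_da(model, X):
    """Assign each row to the class with the highest log posterior score,
    using only the selected features."""
    preds = []
    for row in X:
        best, best_score = None, -math.inf
        for c in model["classes"]:
            score = math.log(model["priors"][c])
            for j in model["selected"]:
                diff = row[j] - model["means"][c][j]
                score -= 0.5 * diff * diff / model["var"][j]
            if score > best_score:
                best, best_score = c, score
        preds.append(best)
    return preds
```

Because the covariance is diagonal, fitting and scoring cost time linear in the number of features, which is one reason this family of models scales to high dimensional data. Features whose statistic falls below the cutoff contribute nothing to prediction, so the selected set can be read directly as the list of discriminative features.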
