ME, ML · Feb 17, 2021

Data-Driven Logistic Regression Ensembles With Applications in Genomics

Anthony-Alexander Christidis, Stefan Van Aelst, Ruben Zamar

arXiv:2102.08591v7 · 2.31 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for effective statistical methods in genomics to study disease genetics, offering practical tools for researchers, though it appears incremental by building on existing regularization and ensembling techniques.

The paper tackles high-dimensional binary classification in genomics by introducing a novel approach that integrates regularization with ensembling to improve prediction accuracy and biomarker identification, demonstrating strong predictive performance in simulations and cancer genomics datasets.

Advances in data collecting technologies in genomics have significantly increased the need for tools designed to study the genetic basis of many diseases. Effective statistical methods should excel in both prediction accuracy and biomarker identification. We introduce a novel approach to high-dimensional binary classification that integrates regularization with ensembling techniques. The method constructs compact ensembles of interpretable models derived by optimizing a global objective function. In medical genomics applications, the proposed approach identifies critical biomarkers overlooked by competing methods. We develop a variable importance ranking system to help researchers prioritize promising genes. The method's asymptotic properties are established, and an efficient computational algorithm is provided. Through extensive simulations across complex scenarios and analysis of cancer genomics datasets, we demonstrate strong predictive performance. Based on the numerical experiments, we offer practical guidelines for determining optimal ensemble size.
