RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data
This work addresses the need for scalable, interpretable models for analyzing omics data in precision medicine, though the contribution appears incremental because it builds on existing ensemble methods.
The authors tackled the problem of predicting phenotypes from high-dimensional metabolomics data by proposing RandomSCM, an ensemble learning algorithm based on conjunctions or disjunctions of decision rules, which achieves high predictive performance and interpretability for biomarker discovery.
Background: Understanding the relationship between omics and the phenotype is a central problem in precision medicine. The high dimensionality of metabolomics data challenges learning algorithms in terms of scalability and generalization. Most learning algorithms do not produce interpretable models. -- Method: We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules. -- Results: Applications on metabolomics data show that it produces models that achieve high predictive performance. The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high-dimensional data.
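To make the Method statement concrete, the following is a minimal sketch of an ensemble of conjunctions of threshold rules. It is not the authors' implementation. The greedy rule selection, the penalty `p`, the midpoint thresholds, the bootstrap and feature subsampling, the majority vote, and all function names are assumptions chosen for illustration, and the toy data is synthetic.

```python
import numpy as np

def candidate_thresholds(values):
    """Midpoints between consecutive distinct values of one feature."""
    v = np.unique(values)
    return (v[:-1] + v[1:]) / 2

def rule_holds(X, rule):
    """Evaluate one rule: x[f] > t when sign is +1, x[f] < t when sign is -1."""
    f, t, sign = rule
    return sign * X[:, f] > sign * t

def fit_conjunction(X, y, features, max_rules=3, p=1.0):
    """Greedily build a conjunction (logical AND) of threshold rules.

    Each step adds the rule that rejects the most remaining negatives,
    minus p times the number of remaining positives it would reject.
    (Hypothetical scoring, chosen for this sketch.)
    """
    rules = []
    neg = y == 0  # negatives not yet rejected by any chosen rule
    pos = y == 1  # positives still accepted by every chosen rule
    for _ in range(max_rules):
        if not neg.any():
            break
        best, best_score = None, 0.0
        for f in features:
            for t in candidate_thresholds(X[:, f]):
                for sign in (1, -1):
                    holds = rule_holds(X, (f, t, sign))
                    score = np.sum(neg & ~holds) - p * np.sum(pos & ~holds)
                    if score > best_score:
                        best, best_score = (f, t, sign), score
        if best is None:  # no rule improves the conjunction
            break
        holds = rule_holds(X, best)
        rules.append(best)
        neg = neg & holds
        pos = pos & holds
    return rules

def predict_conjunction(rules, X):
    """Predict 1 only where every rule of the conjunction holds."""
    out = np.ones(len(X), dtype=bool)
    for rule in rules:
        out &= rule_holds(X, rule)
    return out.astype(int)

def fit_ensemble(X, y, n_estimators=15, n_features=3, seed=0):
    """Fit each conjunction on a bootstrap sample and a random feature subset."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        rows = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        feats = rng.choice(X.shape[1], size=n_features, replace=False)
        models.append(fit_conjunction(X[rows], y[rows], feats))
    return models

def predict_ensemble(models, X):
    """Majority vote over the individual conjunctions."""
    votes = np.mean([predict_conjunction(m, X) for m in models], axis=0)
    return (votes >= 0.5).astype(int)

def feature_usage(models, n_total):
    """Count how often each feature appears in a rule across the ensemble."""
    counts = np.zeros(n_total, dtype=int)
    for rules in models:
        for f, _, _ in rules:
            counts[f] += 1
    return counts

# Synthetic toy data: features 0 and 1 track the phenotype, 2 and 3 are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = np.column_stack([
    2 * y + rng.random(40),
    2 * y + rng.random(40),
    rng.normal(size=40),
    rng.normal(size=40),
])
models = fit_ensemble(X, y, n_estimators=15, n_features=3)
print("training accuracy:", np.mean(predict_ensemble(models, X) == y))
print("rules per feature:", feature_usage(models, X.shape[1]))
```

The per-feature rule counts illustrate why such ensembles lend themselves to biomarker discovery: features that recur across many sparse models are natural candidates for follow-up. A disjunction variant can be obtained from the same routine by De Morgan's law: fit a conjunction on the flipped labels with each rule negated, then negate the prediction.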