QM LG · Nov 20, 2023

MiniAnDE: a reduced AnDE ensemble to deal with microarray data

Pablo Torrijos, José A. Gámez, José M. Puerta

arXiv:2311.12879v11.21 citations · h-index: 27 · Has Code

Originality: Synthesis-oriented

AI Analysis

This is an incremental improvement for bioinformatics researchers dealing with microarray data classification.

The paper tackles the problem of supervised classification for high-dimensional, small-sample data like microarrays by proposing MiniAnDE, a reduced ensemble method that outperforms Naive Bayes and other ensembles such as bagging and random forest in accuracy.

This article focuses on the supervised classification of datasets with a large number of variables and a small number of instances. This is the case, for example, for the microarray datasets commonly used in bioinformatics. Complex classifiers that require estimating statistics over many variables are not suitable for this type of data. Probabilistic classifiers with low-order probability tables, e.g., Naive Bayes (NB) and Averaged One-Dependence Estimators (AODE), are good alternatives. AODE usually improves on NB in accuracy, but suffers from high spatial complexity since $k$ models, each with $n+1$ variables, are included in the AODE ensemble. In this paper, we propose MiniAnDE, an algorithm that includes only a small number of heterogeneous base classifiers in the ensemble, i.e., each model includes only a different subset of the $k$ predictive variables. Experimental evaluation shows that using MiniAnDE classifiers on microarray data is feasible and that they outperform NB and other ensembles such as bagging and random forest.
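To make the idea of a reduced one-dependence ensemble concrete, here is a minimal sketch in Python with NumPy. It is not the authors' MiniAnDE implementation: the helper names (`fit_spode`, `predict_ensemble`), the hand-picked variable subsets, the Laplace smoothing, and the toy data are all assumptions for illustration. It shows only the general shape of the approach: a few one-dependence models, each built over its own small subset of variables, whose joint estimates are averaged.

```python
import numpy as np

def fit_spode(X, y, parent, children, n_classes, n_values, alpha=1.0):
    """Count-based one-dependence model: each child depends on the class and one super-parent."""
    # joint[c, v] estimates P(class=c, x_parent=v) with Laplace smoothing (assumed choice)
    joint = np.full((n_classes, n_values), alpha)
    np.add.at(joint, (y, X[:, parent]), 1.0)
    joint /= joint.sum()
    # cond[j][c, v, w] estimates P(x_j=w | class=c, x_parent=v)
    cond = {}
    for j in children:
        table = np.full((n_classes, n_values, n_values), alpha)
        np.add.at(table, (y, X[:, parent], X[:, j]), 1.0)
        cond[j] = table / table.sum(axis=2, keepdims=True)
    return parent, joint, cond

def predict_ensemble(models, x):
    """Sum the per-model estimates of P(class, x) and return the most probable class."""
    total = 0.0
    for parent, joint, cond in models:
        score = joint[:, x[parent]].copy()
        for j, table in cond.items():
            score *= table[:, x[parent], x[j]]
        total = total + score
    return int(np.argmax(total))

# Toy "microarray-like" data: many discretized variables, few instances (synthetic).
rng = np.random.default_rng(0)
n_samples, n_features, n_values, n_classes = 40, 200, 3, 2
y = np.arange(n_samples) % n_classes
X = rng.integers(0, n_values, size=(n_samples, n_features))
X[:, :5] = y[:, None]  # make the first five variables informative

# Hypothetical subsets: (super-parent, children). A real method would select these from data.
subsets = [(0, [1, 2, 10]), (1, [3, 4, 11]), (2, [0, 4, 12])]
models = [fit_spode(X, y, p, ch, n_classes, n_values) for p, ch in subsets]

preds = [predict_ensemble(models, row) for row in X]
print("training accuracy:", np.mean(np.array(preds) == y))
```

Each model here stores tables for only four variables instead of all 200, which is the source of the memory savings. How the subsets are actually chosen is the substance of the paper and is not reproduced in this sketch.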
