Feature Selection for Microarray Gene Expression Data using Simulated Annealing guided by the Multivariate Joint Entropy
This work addresses gene selection for microarray data analysis. It is incremental: it builds on the earlier TAFS algorithms, adding a new entropy calculation and optimization method.
The authors tackled feature selection for microarray gene expression data by developing a new multivariate joint entropy measure and the mu-TAFS algorithm using simulated annealing, resulting in high classification performance and small, biologically meaningful gene subsets.
In this work, a new way to calculate the multivariate joint entropy is presented. This measure is the basis for a fast information-theoretic evaluation of gene relevance in microarray gene expression data. Its low complexity comes from reusing previous computations when calculating the relevance of the current feature. The mu-TAFS algorithm (so named to distinguish it from previous TAFS algorithms) implements a simulated annealing technique specifically designed for feature subset selection. The algorithm is applied to maximizing gene subset relevance in several public-domain microarray data sets. The experimental results show notably high classification performance and small subsets formed by biologically meaningful genes.
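The two ideas in the abstract, reusing earlier computations when evaluating a joint entropy and searching subsets with simulated annealing, can be illustrated with a minimal Python sketch. This is not the mu-TAFS algorithm, whose details the abstract does not give. The sketch assumes the expression values are already discretized, scores a subset by the mutual information I(X_S; Y) = H(X_S) + H(Y) - H(X_S, Y), and uses a geometric cooling schedule with a small size penalty. All function names and parameters (`extend_partition`, `anneal`, `penalty`, `cooling`) are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def extend_partition(partition, column):
    """Refine a cached sample partition with one more discrete feature.

    partition[i] is an integer id for the joint value of sample i over the
    features chosen so far; the result plays the same role after adding
    `column`. Cost is linear in the number of samples.
    """
    ids = {}
    return [ids.setdefault((p, v), len(ids)) for p, v in zip(partition, column)]


def joint_partition(columns, subset):
    """Build the partition for `subset` from scratch (used after a removal)."""
    part = [0] * len(columns[0])
    for j in sorted(subset):
        part = extend_partition(part, columns[j])
    return part


def relevance(partition, y):
    """I(X_S; Y) = H(X_S) + H(Y) - H(X_S, Y), computed from the partition."""
    return entropy(partition) + entropy(y) - entropy(list(zip(partition, y)))


def anneal(columns, y, steps=500, t0=1.0, cooling=0.99, penalty=0.01, seed=0):
    """Simulated annealing over feature subsets (illustrative settings only)."""
    rng = random.Random(seed)
    subset, part, score = set(), [0] * len(y), 0.0
    best, best_score = set(), 0.0
    t = t0
    for _ in range(steps):
        j = rng.randrange(len(columns))
        if j in subset:
            cand = subset - {j}
            cand_part = joint_partition(columns, cand)       # removal: rebuild
        else:
            cand = subset | {j}
            cand_part = extend_partition(part, columns[j])   # addition: reuse cache
        cand_score = relevance(cand_part, y) - penalty * len(cand)
        delta = cand_score - score
        # Always accept improvements; accept worse moves with prob. exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            subset, part, score = cand, cand_part, cand_score
            if score > best_score:
                best, best_score = set(subset), score
        t *= cooling
    return best, best_score


# Toy data: feature 0 determines the class, feature 1 is independent of it,
# and feature 2 is constant.
y = [0, 0, 1, 1, 0, 0, 1, 1]
columns = [list(y), [0, 1, 0, 1, 0, 1, 0, 1], [0] * 8]
best, best_score = anneal(columns, y)
print(sorted(best), round(best_score, 2))
```

In this sketch, adding a feature only refines the cached partition of the samples, so an addition move costs time linear in the number of samples. Removal moves rebuild the partition, a simplification that a real implementation would likely avoid.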