NE · ML · Jul 12, 2012

Biogeography-Based Informative Gene Selection and Cancer Classification Using SVM and Random Forests

arXiv:1207.3285v1 · 36 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of improving cancer classification from gene expression data for biomedical applications, but it is incremental as it combines existing methods without major breakthroughs.

The authors tackle the high dimensionality of microarray cancer gene expression data by proposing two hybrid techniques, BBO-RF and BBO-SVM, which use biogeography-based optimization with gene ranking to select informative genes for classification. On three cancer datasets, the selected genes achieve classification accuracies comparable to those of existing algorithms.

Microarray cancer gene expression data are very high-dimensional. Reducing the dimensionality helps improve overall analysis and classification performance. We propose two hybrid techniques, Biogeography-based Optimization-Random Forests (BBO-RF) and BBO-SVM (Support Vector Machines), with gene ranking as a heuristic, for microarray gene expression analysis. This heuristic is obtained from an information gain filter ranking procedure. The BBO algorithm generates a population of candidate gene subsets, as part of an ecosystem of habitats, and employs migration and mutation processes across multiple generations of the population to improve classification accuracy. The fitness of each gene subset is assessed by the classifiers, SVM and Random Forests. The performance of these hybrid techniques is evaluated on three cancer gene expression datasets retrieved from the Kent Ridge Biomedical datasets collection and the libSVM data repository. Our results demonstrate that genes selected by the proposed techniques yield classification accuracies comparable to those of previously reported algorithms.
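
The pipeline described in the abstract (information gain ranking, then BBO migration and mutation over candidate gene subsets, with a classifier as the fitness function) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only the Python standard library, swaps in a leave-one-out nearest-centroid classifier where the paper uses SVM and Random Forests, binarizes each gene at its median to compute information gain, and uses linear immigration and emigration rates. The function names, parameter values (`pool_size`, `n_habitats`, `generations`, `p_mutate`) and the synthetic data are all hypothetical.

```python
import math
import random

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def info_gain(column, labels):
    """Information gain of one gene after splitting its values at the median (assumed discretization)."""
    cut = sorted(column)[len(column) // 2]
    low = [y for x, y in zip(column, labels) if x < cut]
    high = [y for x, y in zip(column, labels) if x >= cut]
    remainder = sum(len(p) / len(labels) * entropy(p) for p in (low, high) if p)
    return entropy(labels) - remainder

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for SVM / Random Forests)."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for c in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            if not rows:
                continue
            dist = sum((X[i][g] - sum(r[g] for r in rows) / len(rows)) ** 2 for g in genes)
            if dist < best_dist:
                best_label, best_dist = c, dist
        correct += best_label == y[i]
    return correct / len(X)

def bbo_select(X, y, pool_size=20, n_habitats=10, generations=15, p_mutate=0.05, seed=0):
    rng = random.Random(seed)
    n_genes = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: info_gain(columns[g], y), reverse=True)
    pool = ranked[:pool_size]  # ranking heuristic: search only among top-ranked genes

    def fitness(mask):
        return loo_accuracy(X, y, [g for g, bit in zip(pool, mask) if bit])

    # Each habitat is a boolean mask over the pool, i.e. one candidate gene subset.
    habitats = [[rng.random() < 0.5 for _ in pool] for _ in range(n_habitats)]
    emigration = [n_habitats - r for r in range(n_habitats)]  # fitter habitats emigrate more
    for _ in range(generations):
        habitats.sort(key=fitness, reverse=True)
        new = [habitats[0][:]]  # elitism: keep the best habitat unchanged
        for rank in range(1, n_habitats):
            immigration = rank / (n_habitats - 1)  # worse habitats immigrate more
            child = habitats[rank][:]
            for k in range(len(pool)):
                if rng.random() < immigration:  # migration: copy a feature from a source habitat
                    source = rng.choices(range(n_habitats), weights=emigration)[0]
                    child[k] = habitats[source][k]
                if rng.random() < p_mutate:  # mutation: flip gene membership
                    child[k] = not child[k]
            new.append(child)
        habitats = new
    best = max(habitats, key=fitness)
    return [g for g, bit in zip(pool, best) if bit], fitness(best)

# Synthetic data (hypothetical): 30 samples, 50 genes, only genes 0-2 differ between classes.
data_rng = random.Random(1)
y = [i % 2 for i in range(30)]
X = [[data_rng.gauss(3.0 * label if g < 3 else 0.0, 1.0) for g in range(50)] for label in y]
selected, accuracy = bbo_select(X, y)
print(sorted(selected), round(accuracy, 3))
```

Note that this sketch ranks genes and scores subsets on the same samples, so the reported accuracy is optimistic; a held-out test set or nested cross-validation would be needed for an honest estimate.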
