LG AI · Nov 3, 2021

Multivariate feature ranking of gene expression data

Fernando Jiménez, Gracia Sánchez, José Palma, Luis Miralles-Pechuán, Juan Botía

arXiv:2111.02357v41.66 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient feature ranking in gene expression analysis, though it is incremental as it builds on existing multivariate approaches.

The authors tackled the problem of high-dimensional gene expression data by proposing two new multivariate feature ranking methods based on pairwise correlation and consistency, which statistically outperformed state-of-the-art methods in three classification problems.

Gene expression datasets are usually high-dimensional and therefore require efficient and effective methods for identifying the relative importance of their attributes. Because the search space of possible solutions is huge, feature selection methods based on attribute subset evaluation tend not to be applicable, so feature ranking methods are used in these scenarios instead. Most of the feature ranking methods described in the literature are univariate, so they do not detect interactions between factors. In this paper we propose two new multivariate feature ranking methods, based on pairwise correlation and pairwise consistency, which we have applied to three gene expression classification problems. We show statistically that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF and Significance, as well as attribute subset evaluation feature selection methods based on correlation and consistency with a multi-objective evolutionary search strategy.
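To make the univariate versus multivariate distinction concrete, here is a minimal Python sketch of a ranking built from pairwise consistency. It is an illustration only, not the authors' algorithm: the abstract does not define the scoring, so the `consistency` measure (share of instances that agree with the majority class of their value pattern), the averaging over all pairs, the function names, and the toy data are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def consistency(rows, labels, cols):
    """Share of instances that carry the majority class of their value pattern on `cols`.

    Assumed measure for illustration; the paper's exact definition may differ.
    """
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[c] for c in cols)][y] += 1
    return sum(max(counts.values()) for counts in groups.values()) / len(rows)


def pairwise_consistency_ranking(rows, labels):
    """Score each feature by its mean consistency over all pairs that contain it."""
    n = len(rows[0])
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        s = consistency(rows, labels, (i, j))
        totals[i] += s
        totals[j] += s
    scores = [t / (n - 1) for t in totals]
    ranking = sorted(range(n), key=lambda k: scores[k], reverse=True)
    return ranking, scores


# Hypothetical toy data: the class is f0 XOR f1, and f2 is pure noise.
rows = [list(bits) for bits in product([0, 1], repeat=3)]
labels = [r[0] ^ r[1] for r in rows]

# Univariate view: every feature looks equally uninformative on its own.
univariate = [consistency(rows, labels, (k,)) for k in range(3)]

# Pairwise view: the interacting features f0 and f1 rise above the noise feature f2.
ranking, scores = pairwise_consistency_ranking(rows, labels)
```

On this toy data each single feature scores the same, while the pairwise score separates the two interacting features from the noise feature. Scoring all pairs costs a number of evaluations quadratic in the number of features, which is far cheaper than searching over arbitrary subsets but still significant for datasets with thousands of genes.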
