GN LG QM | Oct 21, 2019

Is graph-based feature selection of genes better than random?

Mohammad Hashir, Paul Bertin, Martin Weiss, Vincent Frappier, Theodore J. Perkins, Geneviève Boucher, Joseph Paul Cohen

arXiv:1910.09600v3 | 1.2 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses feature selection in genomics. It shows that biologically derived gene interaction graphs may offer no significant advantage over random graphs, which the analysis rates as an incremental finding.

The study assessed whether gene interaction graphs capture dependencies in gene expression data better than random graphs, finding that random graphs perform almost as well, suggesting relevant cellular information is spread across many genes.

Gene interaction graphs aim to capture various relationships between genes and represent decades of biology research. When trying to make predictions from genomic data, those graphs could be used to overcome the curse of dimensionality by making machine learning models sparser and more consistent with biological common knowledge. In this work, we focus on assessing whether those graphs capture dependencies seen in gene expression data better than random. We formulate a condition that graphs should satisfy to provide useful prior knowledge and propose to test it using a 'Single Gene Inference' (SGI) task. We compare random graphs with seven major gene interaction graphs published by different research groups, aiming to measure the true benefit of using biologically relevant graphs in this context. Our analysis finds that dependencies can be captured almost as well at random, which suggests that, in terms of gene expression levels, the relevant information about the state of the cell is spread across many genes.
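The abstract does not spell out how the SGI task is run, so the following is only a minimal sketch of the general idea under stated assumptions: predict one gene's expression from its graph neighbors, then from an equally sized random gene set, and compare held-out scores. The synthetic data, the toy graph, the ridge regressor, and the function names `r2_from_features` and `compare_neighbors_to_random` are all hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def r2_from_features(X_train, y_train, X_test, y_test, ridge=1.0):
    """Fit closed-form ridge regression on the training split; return R^2 on the test split."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    # Ridge solution: w = (X^T X + ridge * I)^-1 X^T y, on centered data
    w = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(Xc.shape[1]), Xc.T @ (y_train - mu_y))
    pred = (X_test - mu_x) @ w + mu_y
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def compare_neighbors_to_random(expr, target, neighbors, rng, n_train):
    """Score the target gene from its graph neighbors and from a same-size random gene set."""
    others = np.array([g for g in range(expr.shape[1]) if g != target])
    random_set = rng.choice(others, size=len(neighbors), replace=False)
    y = expr[:, target]
    scores = {}
    for name, cols in (("graph", np.asarray(neighbors)), ("random", random_set)):
        X = expr[:, cols]
        scores[name] = r2_from_features(X[:n_train], y[:n_train], X[n_train:], y[n_train:])
    return scores


rng = np.random.default_rng(0)
n_samples, n_genes = 400, 50
# Assumption: synthetic expression in which every gene loads on one shared latent "cell state"
latent = rng.normal(size=(n_samples, 1))
expr = latent @ rng.normal(size=(1, n_genes)) + 0.5 * rng.normal(size=(n_samples, n_genes))
toy_graph = {0: [1, 2, 3, 4, 5]}  # hypothetical neighbor list for gene 0
scores = compare_neighbors_to_random(expr, 0, toy_graph[0], rng, n_train=300)
print(scores)
```

In data like this, where the signal is shared across all genes, a random gene set may score close to the graph neighbors, which is the kind of outcome the abstract reports. A real comparison would repeat the random draw many times and average over many target genes.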
