ML · Jan 18, 2018

A graph-embedded deep feedforward network for disease outcome classification and feature selection using gene expression data

arXiv:1801.06202v2 · 103 citations · Has Code
AI Analysis

This work addresses a domain-specific problem for bioinformatics researchers dealing with high-dimensional gene expression data, offering an incremental improvement by combining sparse learning with deep neural networks.

The authors tackled the challenge of disease outcome classification using gene expression data with many more features than samples by proposing Graph-Embedded Deep Feedforward Networks (GEDFN), which integrated external gene network information to achieve high classification accuracy and interpretable feature selection in breast cancer data from TCGA.

Gene expression data pose a unique challenge in predictive model building because the number of samples $(n)$ is small compared to the number of features $(p)$. This "$n \ll p$" property has hampered the application of deep learning techniques to disease outcome classification. Sparse learning that incorporates external gene network information could be a potential solution. Still, the problem is very challenging because (1) there are tens of thousands of features and only hundreds of training samples, and (2) the scale-free structure of the gene network is unfriendly to the setup of convolutional neural networks. To address these issues and build a robust classification model, we propose Graph-Embedded Deep Feedforward Networks (GEDFN), which integrate external relational information of features into the deep neural network architecture. The method achieves sparse connections between network layers to prevent overfitting. To validate the method's capability, we conducted both simulation experiments and a real data analysis using a breast cancer RNA-seq dataset from The Cancer Genome Atlas (TCGA). The resulting high classification accuracy and easily interpretable feature selection results suggest the method is a useful addition to current classification models and feature selection procedures. The method is available at https://github.com/yunchuankong/NetworkNeuralNetwork.
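
The abstract does not say how the gene network produces sparse connections between layers. One plausible reading, sketched below, assumes the network is given as a 0/1 adjacency matrix and that a weight between input gene i and hidden unit j is kept only when the two genes are linked or i equals j. The function name `graph_embedded_layer`, the toy four-gene graph, and the ReLU activation are all illustrative assumptions and are not taken from the authors' repository.

```python
import random


def graph_embedded_layer(x, weights, adjacency, bias):
    """Forward pass of one hidden layer whose connections follow a feature graph.

    x:         list of p input values (e.g. expression levels of p genes)
    weights:   p x p list of lists holding dense trainable weights
    adjacency: p x p 0/1 matrix of the feature graph (self-loops are implied)
    bias:      list of p bias terms
    """
    p = len(x)
    out = []
    for j in range(p):
        total = bias[j]
        for i in range(p):
            # Assumption: a weight is active only if genes i and j are
            # neighbours in the graph, or if i == j (self-connection).
            if i == j or adjacency[i][j]:
                total += x[i] * weights[i][j]
        out.append(max(0.0, total))  # ReLU, chosen here for illustration
    return out


# Toy graph on 4 hypothetical genes: edges 0-1 and 1-2, gene 3 is isolated.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
p = 4
x = [1.0, 2.0, 3.0, 4.0]
bias = [0.0] * p

# With all weights set to 1, each hidden unit sums its own input and its
# graph neighbours' inputs, which makes the masking easy to inspect.
ones = [[1.0] * p for _ in range(p)]
hidden_ones = graph_embedded_layer(x, ones, adjacency, bias)

# With random weights, the layer still uses only the graph-permitted entries.
random.seed(0)
weights = [[random.gauss(0, 1) for _ in range(p)] for _ in range(p)]
hidden = graph_embedded_layer(x, weights, adjacency, bias)

# Number of active weights versus a fully connected p x p layer.
n_active = sum(1 for i in range(p) for j in range(p) if i == j or adjacency[i][j])
print(hidden_ones, n_active, p * p)
```

In this toy case only 8 of the 16 possible weights are active. On a real gene network with tens of thousands of nodes and comparatively few edges, the reduction in free parameters would be far larger, which is the kind of sparsity the abstract credits with limiting overfitting.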
