forgeNet: A graph deep neural network model using tree-based ensemble classifiers for feature extraction
forgeNet addresses the problem of robust disease outcome classification in omics data, offering researchers an incremental improvement over existing sparse deep learning methods.
The paper tackles the challenge of high-dimensional omics data classification by proposing forgeNet, a model that learns feature graphs from data instead of relying on external networks, achieving high classification accuracy on synthetic and real datasets.
A unique challenge in predictive model building for omics data has been the small number of samples $(n)$ versus the large number of features $(p)$. This "$n\ll p$" property makes disease outcome classification with deep learning techniques difficult. Sparse learning that incorporates external gene network information, as in the graph-embedded deep feedforward network (GEDFN) model, has been a solution to this issue. However, such methods require an existing feature graph, and potential mis-specification of the feature graph can harm both classification and feature selection. To address this limitation and develop a robust classification model without relying on external knowledge, we propose a \underline{for}est \underline{g}raph-\underline{e}mbedded deep feedforward \underline{net}work (forgeNet) model, which integrates the GEDFN architecture with a forest feature graph extractor, so that the feature graph can be learned in a supervised manner and constructed specifically for a given prediction task. To validate the method's capability, we evaluated the forgeNet model on both synthetic and real datasets. The resulting high classification accuracy suggests that the method is a valuable addition to sparse deep learning models for omics data.
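To make the idea of a forest feature graph extractor concrete, the following is a minimal sketch and not the authors' implementation. It assumes that two features are linked whenever one splits a parent node and the other splits a child node in some tree, that the per-tree edges are unioned over the forest, and that the graph-embedded layer keeps a weight only where the graph has an edge or on the diagonal. The `Node` class, the function names, and the toy forest are hypothetical and exist only for illustration.

```python
from typing import List, Optional, Set, Tuple


class Node:
    """Split node of a toy decision tree (hypothetical structure); None marks a leaf."""

    def __init__(self, feature: int,
                 left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.feature = feature  # index of the feature used for the split
        self.left = left
        self.right = right


def tree_edges(root: Optional[Node]) -> Set[Tuple[int, int]]:
    """Collect directed (parent_feature, child_feature) pairs from one tree."""
    edges: Set[Tuple[int, int]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        for child in (node.left, node.right):
            if child is not None:
                if child.feature != node.feature:  # skip self-pairs here
                    edges.add((node.feature, child.feature))
                stack.append(child)
    return edges


def forest_feature_graph(forest: List[Node], p: int) -> List[List[int]]:
    """Union the edges of all trees into a p x p 0/1 adjacency matrix."""
    adj = [[0] * p for _ in range(p)]
    for root in forest:
        for i, j in tree_edges(root):
            adj[i][j] = 1
    return adj


def masked_layer(x: List[float], weights: List[List[float]],
                 adj: List[List[int]]) -> List[float]:
    """Sparse linear map: weight[i][j] contributes only if i -> j is an edge or i == j.

    Keeping the diagonal (self-connections) is an assumption of this sketch.
    """
    p = len(x)
    return [
        sum(x[i] * weights[i][j] for i in range(p) if adj[i][j] or i == j)
        for j in range(p)
    ]


# Toy forest over p = 5 features (feature indices chosen arbitrarily).
tree1 = Node(0, Node(2), Node(3, Node(1)))  # edges: 0->2, 0->3, 3->1
tree2 = Node(2, Node(0), Node(4))           # edges: 2->0, 2->4
adj = forest_feature_graph([tree1, tree2], p=5)

# With all-ones inputs and weights, each output counts its allowed inputs.
ones = [[1.0] * 5 for _ in range(5)]
out = masked_layer([1.0] * 5, ones, adj)
```

In this toy case each hidden unit receives two inputs instead of the five a dense layer would use, which is the kind of sparsity the abstract refers to. With a fitted scikit-learn forest, the same parent-child pairs could in principle be read from each estimator's `tree_.feature`, `tree_.children_left`, and `tree_.children_right` arrays; that mapping is a suggestion here, not something the source specifies.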