An end-to-end framework for gene expression classification by integrating a background knowledge graph: application to cancer prognosis prediction
This work is aimed at medical researchers working on cancer prognosis prediction and offers an incremental improvement by integrating background biological networks.
The authors tackled cancer prognosis prediction by integrating a biological knowledge graph with gene expression data, resulting in a model that achieved higher accuracy than a deep neural network without such background information and improved ROC-AUC for many cancer types.
Biological data can be divided into primary data, such as gene expression, and secondary data, such as pathways and protein-protein interactions. Methods that use secondary data to enhance the analysis of primary data are promising because secondary data carry background information that primary data lack. In this study, we propose an end-to-end framework that handles secondary data in an integrated manner when constructing a classification model for primary data. We applied this framework to cancer prognosis prediction using gene expression data and a biological network. Cross-validation results indicated that our model achieved higher accuracy than a deep neural network model without background biological network information. Experiments on patient groups stratified by cancer type showed an improved area under the ROC curve (ROC-AUC) for many groups. Visualizations for the cancer types with high accuracy, together with enrichment analysis, identified contributing genes and pathways. Known biomarkers and novel biomarker candidates were identified through these experiments.
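The per-cancer-type evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the group labels `type_A` and `type_B`, and the toy labels and scores are all hypothetical, and ROC-AUC is computed here with the simple pairwise (Mann-Whitney) formulation rather than any particular library.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC-AUC; tied scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # ROC-AUC is undefined when only one class is present
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def roc_auc_by_group(groups, labels, scores):
    """Compute ROC-AUC separately for each patient group (e.g. cancer type)."""
    by_group = defaultdict(lambda: ([], []))
    for g, y, s in zip(groups, labels, scores):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in by_group.items()}


# Toy data (hypothetical): group label, true outcome, predicted score per patient.
groups = ["type_A"] * 4 + ["type_B"] * 4
labels = [1, 0, 1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.6, 0.5, 0.1]

result = roc_auc_by_group(groups, labels, scores)
print(result)
```

Running the same function on the predictions of a model with and without the background network would give a per-group comparison of the kind the abstract describes, assuming both models are scored on the same patients.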