GN AI LG Feb 11, 2024

Highly Accurate Disease Diagnosis and Highly Reproducible Biomarker Identification with PathFormer

Zehao Dong, Qihang Zhao, Philip R. O. Payne, Michael A Province, Carlos Cruchaga, Muhan Zhang, Tianyu Zhao, Yixin Chen, Fuhai Li

arXiv:2402.07268v1 5.99 citations h-index: 8 Res Sq

Originality: Incremental advance

AI Analysis

This work addresses the need for more reliable disease diagnosis and biomarker discovery in biomedical research, though the advance is incremental because it builds on existing GNN methods.

The authors tackled the problem of limited accuracy and reproducibility in disease diagnosis and biomarker identification using graph neural networks on omics data, achieving a 30% accuracy improvement in disease diagnosis compared to existing GNN models and high reproducibility across datasets.

Biomarker identification is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis, for example through fold change and regression analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis: limited prediction (diagnosis) accuracy and limited reproducibility of biomarker identification across multiple datasets. The root of these challenges is the unique graph structure of biological signaling pathways, which consists of a large number of targets and dense, complex signaling interactions among them. To resolve these two challenges, we present a novel GNN model architecture, named PathFormer, which systematically integrates signaling networks, prior knowledge, and omics data to rank biomarkers and predict disease diagnosis. In comparisons, PathFormer significantly outperformed existing GNN models in both prediction accuracy (a 30% accuracy improvement in disease diagnosis) and reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer's Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.
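
To make the idea of "reproducible biomarker ranking" concrete, the following is a minimal sketch of the conventional fold-change baseline the abstract mentions, together with one possible way to score agreement between rankings from two datasets. It is not PathFormer, and the abstract does not state which reproducibility metric the authors used; the top-k Jaccard overlap, the function names, and the toy gene labels are all assumptions made for illustration.

```python
import math


def log2_fold_change(case, control, eps=1e-9):
    """log2 ratio of mean expression in case vs. control samples.

    eps is a small constant (an assumption of this sketch) that
    avoids division by zero and log(0).
    """
    mean_case = sum(case) / len(case)
    mean_control = sum(control) / len(control)
    return math.log2((mean_case + eps) / (mean_control + eps))


def rank_genes(expr_case, expr_control):
    """Rank genes by absolute log2 fold change, largest first.

    Both arguments map a gene name to a list of expression values.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    scores = {
        gene: abs(log2_fold_change(expr_case[gene], expr_control[gene]))
        for gene in expr_case
    }
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


def top_k_jaccard(ranking_a, ranking_b, k):
    """Jaccard overlap of the top-k genes of two rankings (0 to 1)."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


# Toy data for two hypothetical cohorts (not from the paper).
cohort1_case = {"G1": [8.0, 8.0], "G2": [2.0, 2.0], "G3": [3.0, 3.0]}
cohort1_ctrl = {"G1": [1.0, 1.0], "G2": [2.0, 2.0], "G3": [1.5, 1.5]}
cohort2_case = {"G1": [4.0, 4.0], "G2": [6.0, 6.0], "G3": [1.0, 1.0]}
cohort2_ctrl = {"G1": [1.0, 1.0], "G2": [1.0, 1.0], "G3": [1.0, 1.0]}

ranking1 = rank_genes(cohort1_case, cohort1_ctrl)
ranking2 = rank_genes(cohort2_case, cohort2_ctrl)
print(ranking1, ranking2, top_k_jaccard(ranking1, ranking2, k=2))
```

A score near 1 means the two datasets nominate largely the same top-k genes, while a score near 0 means the rankings barely agree; the abstract's claim is that PathFormer's rankings agree across datasets more than those of existing GNN models.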
