GraphSeqLM: A Unified Graph Language Framework for Omic Graph Learning
This work addresses the problem of multi-omic data integration for precision medicine. It is an incremental improvement that combines existing Graph Neural Networks (GNNs) and Large Language Models (LLMs).
The paper tackles the challenge of integrating high-dimensional, noisy multi-omic data for complex diseases by proposing GraphSeqLM, a framework that enhances Graph Neural Networks with biological sequence embeddings from Large Language Models. The authors report superior predictive accuracy compared to existing methods.
The integration of multi-omic data is pivotal for understanding complex diseases, but its high dimensionality and noise present significant challenges. Graph Neural Networks (GNNs) offer a robust framework for analyzing large-scale signaling pathways and protein-protein interaction networks, yet they face limitations in expressivity when capturing intricate biological relationships. To address this, we propose Graph Sequence Language Model (GraphSeqLM), a framework that enhances GNNs with biological sequence embeddings generated by Large Language Models (LLMs). These embeddings encode structural and biological properties of DNA, RNA, and proteins, augmenting GNNs with enriched features for analyzing sample-specific multi-omic data. By integrating topological, sequence-derived, and biological information, GraphSeqLM demonstrates superior predictive accuracy and outperforms existing methods, paving the way for more effective multi-omic data integration in precision medicine.
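The core idea in the abstract, augmenting each graph node's omic features with a sequence-derived embedding before message passing, can be illustrated with a minimal sketch. This is not the authors' implementation: the fusion by concatenation, the single GCN-style layer, the helper names (`normalized_adjacency`, `matmul`, `fuse_and_propagate`), and the random vectors standing in for LLM sequence embeddings are all assumptions made for illustration.

```python
import math
import random


def normalized_adjacency(edges, num_nodes):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph as nested lists."""
    # Start from the identity so every node has a self-loop.
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for src, dst in edges:
        adj[src][dst] = 1.0
        adj[dst][src] = 1.0
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in adj]
    return [
        [adj[i][j] * inv_sqrt_deg[i] * inv_sqrt_deg[j] for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_and_propagate(omic_features, seq_embeddings, edges, weight):
    """Concatenate per-node omic features with sequence embeddings (an assumed
    fusion strategy), then apply one GCN-style layer with a ReLU."""
    fused = [omic + seq for omic, seq in zip(omic_features, seq_embeddings)]
    a_hat = normalized_adjacency(edges, len(fused))
    pre_activation = matmul(matmul(a_hat, fused), weight)
    return [[max(value, 0.0) for value in row] for row in pre_activation]


# Toy example: 4 nodes (e.g. proteins) on a path graph.
rng = random.Random(0)
num_nodes, omic_dim, seq_dim, hidden_dim = 4, 3, 5, 2
omic = [[rng.gauss(0, 1) for _ in range(omic_dim)] for _ in range(num_nodes)]
# Hypothetical placeholder: random vectors standing in for LLM sequence embeddings.
seq = [[rng.gauss(0, 1) for _ in range(seq_dim)] for _ in range(num_nodes)]
edges = [(0, 1), (1, 2), (2, 3)]
weight = [[rng.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(omic_dim + seq_dim)]

hidden = fuse_and_propagate(omic, seq, edges, weight)
print(len(hidden), len(hidden[0]))
```

In this sketch each node ends up with a hidden vector that mixes its own omic and sequence features with those of its neighbors in the interaction graph. A real pipeline would replace the random `seq` vectors with embeddings from pretrained DNA, RNA, or protein language models and would learn `weight` by training.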