LG · AI · Jan 21, 2025

Large Language Models Meet Graph Neural Networks for Text-Numeric Graph Reasoning

Haoran Song, Jiarui Feng, Guangfu Li, Michael Province, Philip Payne, Yixin Chen, Fuhai Li

arXiv:2501.16361v1 · 4.11 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the challenge of noisy data analysis in scientific domains by combining human-understandable text with numeric values for graph reasoning, though it appears incremental because it builds on existing LLM and GNN methods.

The paper tackles data-driven scientific discovery by integrating text and numeric data in a new graph structure, the text-numeric graph (TNG), and uses large language models and graph neural networks to improve classification accuracy and network inference in tasks such as key entity mining and signaling pathway mining.
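To make the TNG idea concrete, here is a minimal Python sketch of one possible in-memory representation. The class `TextNumericGraph`, its methods, and the gene annotations and expression values in the usage lines are hypothetical illustrations, not the authors' data model or data. Text annotations are stored once per entity and association, while numeric values are stored per sample, mirroring the shared-structure, sample-specific-values design the abstract describes for TOSGs.

```python
from dataclasses import dataclass, field


@dataclass
class TextNumericGraph:
    """Hypothetical container for a text-numeric graph (TNG).

    Entities (nodes) and associations (edges) carry text annotations that are
    shared by all samples; numeric values are stored separately per sample.
    """
    node_text: dict[str, str] = field(default_factory=dict)
    edge_text: dict[tuple[str, str], str] = field(default_factory=dict)
    # sample id -> {node id -> numeric value, e.g. an expression level}
    node_values: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_node(self, node: str, text: str) -> None:
        self.node_text[node] = text

    def add_edge(self, src: str, dst: str, text: str) -> None:
        if src not in self.node_text or dst not in self.node_text:
            raise KeyError("both endpoints must be added before the edge")
        self.edge_text[(src, dst)] = text

    def set_sample(self, sample: str, values: dict[str, float]) -> None:
        # Samples share structure and annotations, so a sample may only
        # supply values for nodes that already exist in the graph.
        unknown = set(values) - set(self.node_text)
        if unknown:
            raise KeyError(f"unknown nodes: {sorted(unknown)}")
        self.node_values[sample] = dict(values)

    def sample_view(self, sample: str) -> dict[str, tuple[str, float]]:
        """Pair each node's text with its value in one sample (0.0 if unobserved)."""
        values = self.node_values[sample]
        return {n: (t, values.get(n, 0.0)) for n, t in self.node_text.items()}


# Usage with made-up values (illustration only, not real measurements).
g = TextNumericGraph()
g.add_node("TP53", "tumor suppressor; regulates cell cycle arrest")
g.add_node("MDM2", "E3 ubiquitin ligase; negatively regulates TP53")
g.add_edge("MDM2", "TP53", "MDM2 inhibits TP53")
g.set_sample("cell_1", {"TP53": 2.4, "MDM2": 0.7})
g.set_sample("cell_2", {"TP53": 0.1})
print(g.sample_view("cell_2")["MDM2"])  # ('E3 ubiquitin ligase; negatively regulates TP53', 0.0)
```

Keeping text and numbers in separate maps means a collection of graphs over many samples stores each annotation only once, which fits the case where every graph has the same entities and associations.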

In real-world scientific discovery, human beings draw on accumulated prior knowledge and imagination to select one or a few of the most promising hypotheses from large, noisy data-analysis results. In this study, we introduce a new type of graph structure, the text-numeric graph (TNG), defined as a graph whose entities and associations carry both text-attributed information and numeric information. The TNG is an ideal data structure for novel scientific discovery via graph reasoning because it integrates human-understandable textual annotations or prior knowledge with numeric values that represent the observed or activation levels of graph entities or associations in different samples. Together, the textual information and the numeric values determine the importance of graph entities and associations in graph reasoning for novel scientific knowledge discovery. We further propose integrating large language models (LLMs) and graph neural networks (GNNs) to analyze TNGs for graph understanding and reasoning. To demonstrate the utility, we generated text-omic (numeric) signaling graphs (TOSGs), one type of TNG, in which all graphs share the same entities, associations, and annotations but have sample-specific entity numeric (omic) values derived from single-cell RNA sequencing (scRNA-seq) datasets of different diseases. We then proposed joint LLM-GNN models for key entity mining and signaling pathway mining on the TOSGs. The evaluation results showed that the LLM-GNN models and TNGs significantly improve classification accuracy and network inference. In conclusion, TNGs and joint LLM-GNN models are important approaches for scientific discovery.
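The abstract does not spell out how the LLM and GNN are joined, so the following is only a minimal sketch of one generic way to fuse the two modalities, assuming node features are a text embedding concatenated with the node's numeric value for one sample. `embed_text` is a hypothetical stand-in for an LLM encoder (a hashed bag of words, so the sketch runs offline), `gcn_layer` is a standard GCN-style layer swapped in for illustration, and `score_nodes` uses random, untrained weights. It shows the data flow only and is not the authors' LLM-GNN architecture.

```python
import hashlib

import numpy as np


def embed_text(text: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for an LLM text encoder: a hashed bag of words.

    A real pipeline would call a language model here; hashing just yields a
    deterministic, unit-length vector so the sketch is self-contained.
    """
    vec = np.zeros(dim)
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode()).digest()
        vec[digest[0] % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def gcn_layer(adj: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)


def score_nodes(texts, values, edges, hidden=8, seed=0):
    """Fuse text embeddings with one sample's numeric values and score nodes."""
    n = len(texts)
    text_feats = np.stack([embed_text(t) for t in texts])
    num_feats = np.asarray(values, dtype=float)[:, None]
    x = np.concatenate([text_feats, num_feats], axis=1)  # shape (n, 16 + 1)

    adj = np.zeros((n, n))
    for i, j in edges:                                  # treat edges as undirected
        adj[i, j] = adj[j, i] = 1.0

    rng = np.random.default_rng(seed)                   # untrained random weights
    w1 = rng.normal(size=(x.shape[1], hidden))
    w_out = rng.normal(size=hidden)
    h = gcn_layer(adj, x, w1)
    return h @ w_out                                    # one score per node


# Usage with made-up annotations and values for a single sample.
texts = [
    "tumor suppressor regulates cell cycle arrest",
    "E3 ubiquitin ligase negatively regulates TP53",
    "cyclin-dependent kinase inhibitor",
]
values = [2.4, 0.7, 1.1]
edges = [(1, 0), (0, 2)]
scores = score_nodes(texts, values, edges)
ranking = np.argsort(-scores)  # candidate "key entities", highest score first
```

In a trained model the weights would be learned from sample labels, and the ranking step is where a key-entity-mining task could read off the most important nodes; with random weights here, the scores carry no biological meaning.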
