LG AI
Feb 24, 2025

IGDA: Interactive Graph Discovery through Large Language Model Agents

Alex Havrilla, David Alvarez-Melis, Nicolo Fusi

Harvard, Microsoft

arXiv:2502.17189v2 14.45 citations h-index: 21

Originality: Incremental advance

AI Analysis

This paper provides a novel LLM-based approach to graph discovery that complements numerical methods, with potential applications in domains such as biology, though the advance is incremental in that it combines existing LLM capabilities.

The paper tackles interactive graph discovery by using large language models (LLMs) to predict variable relationships from semantic metadata, proposing IGDA with uncertainty-driven edge selection and local graph updates, and shows it often outperforms baselines on eight real-world graphs, including a new causal graph where memorization is impossible.

Large language models ($\textbf{LLMs}$) have emerged as a powerful method for discovery. Instead of utilizing numerical data, LLMs utilize associated variable $\textit{semantic metadata}$ to predict variable relationships. Simultaneously, LLMs demonstrate impressive abilities to act as black-box optimizers when given an objective $f$ and a sequence of trials. We study LLMs at the intersection of these two capabilities by applying LLMs to the task of $\textit{interactive graph discovery}$: given a ground truth graph $G^*$ capturing variable relationships and a budget of $I$ edge experiments over $R$ rounds, minimize the distance between the predicted graph $\hat{G}_R$ and $G^*$ at the end of the $R$-th round. To solve this task we propose $\textbf{IGDA}$, an LLM-based pipeline incorporating two key components: 1) an LLM uncertainty-driven method for edge experiment selection, and 2) a local graph update strategy utilizing binary feedback from experiments to improve predictions for unselected neighboring edges. Experiments on eight different real-world graphs show our approach often outperforms all baselines, including a state-of-the-art numerical method for interactive graph discovery. Further, we conduct a rigorous series of ablations dissecting the impact of each pipeline component. Finally, to assess the impact of memorization, we apply our interactive graph discovery strategy to a complex, new (as of July 2024) causal graph on protein transcription factors, finding strong performance in a setting where memorization is impossible. Overall, our results show IGDA to be a powerful method for graph discovery complementary to existing numerically driven approaches.
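The interactive loop described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the edge probabilities stand in for LLM confidence scores, the `uncertainty` measure, the edge-set `distance`, the `per_round` budget split, and the "reverse edge" local update are all assumptions chosen for illustration, and no LLM is called.

```python
def uncertainty(p):
    """Assumed uncertainty score: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def predicted_graph(probs, threshold=0.5):
    """Keep every directed edge whose probability exceeds the threshold."""
    return {edge for edge, p in probs.items() if p > threshold}


def distance(pred_edges, true_edges):
    """Assumed metric: size of the symmetric difference of the edge sets."""
    return len(pred_edges ^ true_edges)


def interactive_discovery(probs, true_edges, rounds, per_round):
    """Run `rounds` rounds with `per_round` edge experiments each.

    `probs` maps directed edges (u, v) to a probability that stands in
    for an LLM's confidence. How the budget I is split across the R
    rounds is an assumption of this sketch.
    """
    probs = dict(probs)  # do not mutate the caller's predictions
    tested = set()
    for _ in range(rounds):
        # Uncertainty-driven selection: most uncertain untested edges first.
        candidates = [e for e in probs if e not in tested]
        candidates.sort(key=lambda e: uncertainty(probs[e]), reverse=True)
        for edge in candidates[:per_round]:
            outcome = edge in true_edges  # binary feedback from the experiment
            probs[edge] = 1.0 if outcome else 0.0
            tested.add(edge)
            # Toy local update (hypothetical stand-in for LLM re-prediction
            # of neighboring edges): a confirmed edge u -> v rules out v -> u.
            reverse = (edge[1], edge[0])
            if outcome and reverse in probs and reverse not in tested:
                probs[reverse] = 0.0
    return predicted_graph(probs)


# Hypothetical three-variable example; the numbers are made up.
true_edges = {("A", "B"), ("B", "C")}
initial_probs = {
    ("A", "B"): 0.9, ("B", "A"): 0.5,
    ("B", "C"): 0.45, ("C", "B"): 0.7,
    ("A", "C"): 0.6, ("C", "A"): 0.1,
}
before = distance(predicted_graph(initial_probs), true_edges)
final = interactive_discovery(initial_probs, true_edges, rounds=2, per_round=1)
after = distance(final, true_edges)
print(before, after)  # prints "3 1"
```

In this toy run, the two experiments test B -> A and then B -> C; confirming B -> C also removes the untested reverse edge C -> B, which is how a single binary outcome can improve predictions for an edge that was never selected.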
