BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning
This work addresses the challenge of applying LLMs to biological pathway reasoning, a capability that matters for tasks such as hypothesis formulation and experiment design. It represents an incremental advance in domain-specific AI applications.
The paper tackles the problem of evaluating and improving large language models (LLMs) for reasoning about complex biological pathways, showing that current methods struggle, especially in perturbed systems, and proposes a new agent-based approach that enhances performance.
Large language models (LLMs) have recently been applied in various biological domains, but their ability to reason about complex biological systems such as pathways remains underexplored, even though this ability is crucial for predicting biological phenomena, formulating hypotheses, and designing experiments. This work explores the potential of LLMs in pathway reasoning. We introduce BioMaze, a dataset of 5.1K complex pathway problems derived from real research, covering various biological contexts including natural dynamic changes, disturbances, additional intervention conditions, and multi-scale research targets. Our evaluation of methods such as chain-of-thought (CoT) and graph-augmented reasoning shows that LLMs struggle with pathway reasoning, especially in perturbed systems. To address this, we propose PathSeeker, an LLM agent that enhances reasoning through interactive subgraph-based navigation, enabling a more effective approach to handling the complexities of biological systems in a scientifically aligned manner. The dataset and code are available at https://github.com/zhao-ht/BioMaze.
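
To make "interactive subgraph-based navigation" concrete, the following is a minimal sketch of the general idea only, not the authors' PathSeeker implementation (see the repository above for that). It assumes a pathway stored as (source, relation, target) triples. The toy graph, the function names `local_subgraph` and `navigate`, and the rule of always following the first edge are hypothetical stand-ins; in an actual agent the LLM would presumably decide which node to expand next.

```python
from collections import deque

# Hypothetical, heavily simplified toy pathway as (source, relation, target) triples.
TOY_PATHWAY = [
    ("EGF", "activates", "EGFR"),
    ("EGFR", "activates", "RAS"),
    ("RAS", "activates", "RAF"),
    ("RAF", "activates", "MEK"),
    ("MEK", "activates", "ERK"),
]


def local_subgraph(edges, seed, max_hops=1):
    """Return the edges reachable from `seed` within `max_hops` directed steps."""
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))

    visited = {seed}
    queue = deque([(seed, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, dst in adjacency.get(node, []):
            found.append((node, rel, dst))
            if dst not in visited:
                visited.add(dst)
                queue.append((dst, depth + 1))
    return found


def navigate(edges, start, steps):
    """Walk downstream one hop at a time, collecting the edges seen so far."""
    seen, current = [], start
    for _ in range(steps):
        hop = local_subgraph(edges, current, max_hops=1)
        if not hop:
            break  # dead end: nothing downstream of the current node
        seen.extend(hop)
        # Assumption: a real agent would let the LLM pick the next node;
        # this sketch simply follows the first edge returned.
        current = hop[0][2]
    return seen


if __name__ == "__main__":
    for edge in navigate(TOY_PATHWAY, "EGFR", steps=3):
        print(edge)
```

The point of the sketch is that the reasoner sees only a small neighbourhood of the pathway at each step rather than the whole graph, which is one plausible reading of "subgraph-based navigation" in the abstract.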