Large Language Models Meet Virtual Cell: A Survey
This survey provides a comprehensive review for researchers in computational biology, but its contribution is incremental: it synthesizes existing work and reports no new experimental results.
This survey tackles the integration of large language models (LLMs) into virtual cell modeling in cellular biology, proposing a unified taxonomy and reviewing core tasks, models, datasets, and challenges.
Large language models (LLMs) are transforming cellular biology by enabling the development of "virtual cells": computational systems that represent, predict, and reason about cellular states and behaviors. This work provides a comprehensive review of LLMs for virtual cell modeling. We propose a unified taxonomy that organizes existing methods into two paradigms: LLMs as Oracles, for direct cellular modeling, and LLMs as Agents, for orchestrating complex scientific tasks. We identify three core tasks (cellular representation, perturbation prediction, and gene regulation inference) and review their associated models, datasets, and evaluation benchmarks, as well as the critical challenges in scalability, generalizability, and interpretability.