LLM4GRN: Discovering Causal Gene Regulatory Networks with LLMs -- Evaluation through Synthetic Data Generation
This work addresses the challenge of inferring gene regulatory networks, which matters for discovering disease mechanisms and identifying therapeutic targets. Its contribution is incremental: it builds on existing GRN inference methods by incorporating LLMs.
The researchers tackled the problem of discovering causal gene regulatory networks (GRNs) from single-cell RNA sequencing data by investigating large language models (LLMs), used either alone or combined with traditional statistical methods. Because ground-truth causal graphs are unavailable, they developed an evaluation strategy based on synthetic data generation. Their results indicate that LLMs can support statistical modeling and data synthesis in biological research.
Gene regulatory networks (GRNs) represent the causal relationships between transcription factors (TFs) and target genes in single-cell RNA sequencing (scRNA-seq) data. Understanding these networks is crucial for uncovering disease mechanisms and identifying therapeutic targets. In this work, we investigate the potential of large language models (LLMs) for GRN discovery, leveraging their learned biological knowledge alone or in combination with traditional statistical methods. We develop a task-based evaluation strategy to address the challenge of unavailable ground truth causal graphs. Specifically, we use the GRNs suggested by LLMs to guide causal synthetic data generation and compare the resulting data against the original dataset. Our statistical and biological assessments show that LLMs can support statistical modeling and data synthesis for biological research.