ReCellTy: Domain-specific knowledge graph retrieval-augmented LLMs workflow for single-cell annotation
This work addresses precise, automated cell type annotation for researchers in single-cell biology, and it represents a domain-specific incremental improvement.
The paper tackles automated cell type annotation in single-cell analysis by developing a domain-specific, knowledge graph retrieval-augmented LLM workflow, which improves human evaluation scores by up to 0.21 and semantic similarity by 6.1% across 11 tissue types compared with general-purpose LLMs.
To enable precise and fully automated cell type annotation with large language models (LLMs), we developed a graph-structured feature marker database to retrieve entities linked to differential genes for cell reconstruction. We further designed a multi-task workflow to optimize the annotation process. Compared with general-purpose LLMs, our method improves human evaluation scores by up to 0.21 and semantic similarity by 6.1% across 11 tissue types, while aligning more closely with the cognitive logic of manual annotation.
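The retrieval step described above can be illustrated with a minimal sketch. It is not the ReCellTy implementation: the edge list, the function names (build_graph, retrieve_candidates, format_context), and the ranking rule (count of supporting genes) are all assumptions made for illustration, and the gene-to-cell-type pairs are common textbook markers rather than entries from the paper's database. The idea shown is only that a cluster's differential genes are looked up in a marker graph, and the linked entities are collected as context for an LLM prompt.

```python
from collections import defaultdict

# Hypothetical toy marker graph (NOT the paper's database):
# each edge links a marker gene to a cell type entity.
MARKER_EDGES = [
    ("CD3E", "T cell"),
    ("CD3D", "T cell"),
    ("CD8A", "CD8+ T cell"),
    ("CD4", "CD4+ T cell"),
    ("MS4A1", "B cell"),
    ("CD79A", "B cell"),
    ("NKG7", "NK cell"),
    ("NKG7", "CD8+ T cell"),
]


def build_graph(edges):
    """Index the edges as gene -> set of linked cell type entities."""
    graph = defaultdict(set)
    for gene, cell_type in edges:
        graph[gene].add(cell_type)
    return graph


def retrieve_candidates(graph, differential_genes):
    """Collect the entities linked to the given genes.

    Returns (cell_type, supporting_genes) pairs, sorted by the number of
    supporting genes (descending), then by name. This ranking rule is an
    assumption of the sketch.
    """
    support = defaultdict(list)
    for gene in differential_genes:
        for cell_type in graph.get(gene, ()):
            support[cell_type].append(gene)
    return sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def format_context(ranked):
    """Render the retrieved entities as plain text for an LLM prompt."""
    return "\n".join(
        f"{cell_type} (supported by {', '.join(genes)})"
        for cell_type, genes in ranked
    )


if __name__ == "__main__":
    graph = build_graph(MARKER_EDGES)
    ranked = retrieve_candidates(graph, ["CD3E", "CD8A", "NKG7"])
    print(format_context(ranked))
```

In a full workflow, the text returned by format_context would be passed to the LLM together with the cluster's differential genes, so that the model reasons over retrieved marker evidence rather than relying only on what it has memorized.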