GPTON: Generative Pre-trained Transformers enhanced with Ontology Narration for accurate annotation of biological data
This work addresses the challenge of integrating structured knowledge into large language models for biomedical research, specifically for gene set annotation, with potential broader applications.
To annotate biological data accurately, GPTON enhances generative pre-trained transformers with ontology narration, producing accurate annotations for over 68% of gene sets within its top five predictions.
By leveraging GPT-4 for ontology narration, we developed GPTON to infuse structured knowledge into LLMs through verbalized ontology terms, achieving accurate text and ontology annotations for over 68% of gene sets in the top five predictions. Manual evaluations confirm GPTON's robustness, highlighting its potential to harness LLMs and structured knowledge to significantly advance biomedical research beyond gene set annotation.
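The "top five predictions" figure reads as a top-k hit rate: a gene set counts as correctly annotated if at least one of its first five ranked predictions is accepted. The following is a minimal sketch of that metric under this assumption, not the authors' evaluation code. The function name `top_k_hit_rate`, the gene set IDs, and the term labels are hypothetical placeholders, and the toy data is illustrative only.

```python
from typing import Dict, List, Set


def top_k_hit_rate(
    predictions: Dict[str, List[str]],
    accepted: Dict[str, Set[str]],
    k: int = 5,
) -> float:
    """Fraction of gene sets with at least one accepted annotation
    among their first k ranked predictions."""
    if not predictions:
        return 0.0
    hits = 0
    for gene_set_id, ranked_terms in predictions.items():
        gold = accepted.get(gene_set_id, set())
        # A gene set is a hit if any of its top-k predictions is accepted.
        if any(term in gold for term in ranked_terms[:k]):
            hits += 1
    return hits / len(predictions)


# Hypothetical toy data: ranked predictions per gene set (best first)
# and the annotations a reviewer would accept for each set.
toy_predictions = {
    "set1": ["T1", "T2"],
    "set2": ["T3"],
    "set3": ["T4", "T5", "T6", "T7", "T8", "T9"],
}
toy_accepted = {
    "set1": {"T2"},
    "set2": {"T0"},
    "set3": {"T9"},
}

if __name__ == "__main__":
    print(f"top-5 hit rate: {top_k_hit_rate(toy_predictions, toy_accepted, k=5):.2f}")
```

In the toy data, `set3` has its accepted term in sixth position, so it counts as a miss at k = 5 but as a hit at k = 6, which shows how the reported percentage depends on the chosen cutoff.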