LG AI | Apr 28, 2025

Tripartite-GraphRAG via Plugin Ontologies

arXiv:2504.19667v3

Originality: Incremental advance

AI Analysis

The paper addresses LLM hallucinations and the lack of provenance in knowledge-intensive tasks in domains such as industrial automation and healthcare. The contribution appears incremental, since it builds on existing GraphRAG methods.

The paper tackles the challenge of constructing knowledge graphs for LLMs to improve factual accuracy in domains like healthcare. It proposes a tripartite graph that connects domain-specific objects, via an ontology of concepts, to text chunks. An initial evaluation suggests that this can produce denser LLM prompts at reduced lengths, with potential cost savings.

Large Language Models (LLMs) have shown remarkable capabilities across various domains, yet they struggle with knowledge-intensive tasks in areas that demand factual accuracy, e.g., industrial automation and healthcare. Key limitations include their tendency to hallucinate, lack of source traceability (provenance), and challenges in timely knowledge updates. Combining language models with knowledge graphs (GraphRAG) offers promising avenues for overcoming these deficits. However, a major challenge lies in creating such a knowledge graph in the first place. Here, we propose a novel approach that combines LLMs with a tripartite knowledge graph representation. The graph is constructed by connecting complex, domain-specific objects, via a curated ontology of corresponding domain-specific concepts, to relevant sections within chunks of text. These links are established through a concept-anchored pre-analysis of source documents, starting from an initial lexical graph. Subsequently, we formulate LLM prompt creation as an unsupervised node classification problem, allowing for the optimization of information density, coverage, and arrangement of LLM prompts at significantly reduced lengths. An initial experimental evaluation of our approach on a healthcare use case, involving multi-faceted analyses of patient anamneses given a set of medical concepts as well as a series of clinical guideline literature, indicates its potential to achieve these optimizations, which, in turn, may lead to reduced costs as well as more consistent and reliable LLM outputs.
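The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: a tripartite structure (objects, concepts, text chunks) and a prompt-building step that labels each chunk node as included or excluded. The greedy coverage-per-length heuristic, the toy data, and all names (`object_to_concepts`, `concept_to_chunks`, `chunk_lengths`, `select_chunks`) are assumptions for illustration, not the authors' method.

```python
from collections import defaultdict

# Hypothetical toy tripartite graph: object -> concepts -> text chunks.
object_to_concepts = {
    "patient_42": {"hypertension", "diabetes"},
}
concept_to_chunks = {
    "hypertension": {"c1", "c2"},
    "diabetes": {"c2", "c3"},
    "asthma": {"c4"},
}
# Hypothetical chunk lengths (e.g. in tokens).
chunk_lengths = {"c1": 120, "c2": 200, "c3": 80, "c4": 150}


def concepts_by_chunk(mapping):
    """Invert concept -> chunks into chunk -> concepts."""
    inverted = defaultdict(set)
    for concept, chunks in mapping.items():
        for chunk in chunks:
            inverted[chunk].add(concept)
    return inverted


def select_chunks(obj, budget):
    """Greedily label chunks as 'included' for the prompt of `obj`.

    Each step picks the chunk that covers the most still-uncovered
    concepts per unit of length, as long as it fits the length budget.
    Returns (selected chunk ids in order, concepts left uncovered).
    """
    by_chunk = concepts_by_chunk(concept_to_chunks)
    uncovered = set(object_to_concepts.get(obj, ()))
    selected, used = [], 0
    while uncovered:
        best, best_score = None, 0.0
        for chunk in sorted(by_chunk):  # sorted for deterministic ties
            length = chunk_lengths[chunk]
            if chunk in selected or used + length > budget:
                continue
            score = len(by_chunk[chunk] & uncovered) / length
            if score > best_score:
                best, best_score = chunk, score
        if best is None:  # nothing useful fits in the remaining budget
            break
        selected.append(best)
        used += chunk_lengths[best]
        uncovered -= by_chunk[best]
    return selected, uncovered
```

Under these assumptions, `select_chunks("patient_42", 300)` covers both concepts with two short chunks rather than the single longer chunk that mentions both. This shows how a length budget trades off against concept coverage when assembling a prompt.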

