AI · Oct 18, 2025

Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need?

arXiv:2510.16582v1 · 4 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses a bottleneck in augmenting large language models with structured knowledge for real-world applications, though the contribution appears incremental because it builds on existing KG-RAG and process reward model (PRM) methods.

The paper addresses the difficulty that knowledge-graph-based retrieval-augmented generation (KG-RAG) has in retrieving accurate and diverse information for complex real-world queries. It proposes GraphFlow, which outperforms baselines by 10% on average in hit rate and recall on the STaRK benchmark.
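For readers unfamiliar with the two reported metrics, the sketch below shows how hit rate and recall at a cutoff k are commonly computed for a single query. It is an illustration under assumptions, not the benchmark's evaluation code: the function names `hit_at_k` and `recall_at_k`, the node identifiers, and the cutoffs are all hypothetical.

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if any of the top-k retrieved items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear among the top-k retrieved."""
    if not relevant:
        return 0.0  # convention chosen here for queries with no ground truth
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Hypothetical retrieval result for one query (node ids are made up).
ranked = ["n7", "n2", "n9", "n4"]
relevant = {"n2", "n4", "n5"}

print(hit_at_k(ranked, relevant, 1))     # top-1 is "n7", which is not relevant
print(hit_at_k(ranked, relevant, 2))     # "n2" appears in the top-2
print(recall_at_k(ranked, relevant, 4))  # 2 of the 3 relevant nodes retrieved
```

Benchmark-level numbers are then the mean of these per-query values. Hit rate rewards finding at least one correct item, while recall rewards covering many of them, which is why a method aiming for diverse retrieval is evaluated on both.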

Retrieval-Augmented Generation (RAG) based on knowledge graphs (KGs) enhances large language models (LLMs) by providing structured and interpretable external knowledge. However, existing KG-based RAG methods struggle to retrieve accurate and diverse information from text-rich KGs for complex real-world queries. Process Reward Models (PRMs) offer a way to align the retrieval process of KG-based RAG with query-specific knowledge requirements, but they heavily rely on process-level supervision signals that are expensive and hard to obtain on KGs. To address this challenge, we propose GraphFlow, a framework that efficiently retrieves accurate and diverse knowledge required for real-world queries from text-rich KGs. GraphFlow employs a transition-based flow matching objective to jointly optimize a retrieval policy and a flow estimator. The flow estimator factorizes the reward of the retrieval outcome into the intermediate retrieval states. Such reward factorization guides the retrieval policy to retrieve candidates from KGs in proportion to their reward. This allows GraphFlow to explore high-quality regions of KGs that yield diverse and relevant results. We evaluate GraphFlow on the STaRK benchmark, which includes real-world queries from multiple domains over text-rich KGs. GraphFlow outperforms strong KG-RAG baselines, including GPT-4o, by 10% on average in hit rate and recall. It also shows strong generalization to unseen KGs, demonstrating its effectiveness and robustness.
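The abstract does not spell out the objective, so the following is only a minimal sketch of what a transition-level flow matching loss can look like, using the detailed-balance condition from the generative flow network literature: for each step, F(s) · P_F(s'|s) should equal F(s') · P_B(s|s'), with the flow of the final state pinned to the outcome reward. Whether GraphFlow uses exactly this form is an assumption. The function name `detailed_balance_loss` and all numbers are hypothetical, and in practice the log-flows and log-probabilities would come from learned models rather than hand-written lists.

```python
import math
from typing import List


def detailed_balance_loss(
    log_flows: List[float],  # log F(s_t) for the non-terminal states s_0..s_{T-1}
    log_pf: List[float],     # log P_F(s_{t+1} | s_t), the retrieval policy
    log_pb: List[float],     # log P_B(s_t | s_{t+1}), the backward policy
    log_reward: float,       # log R(s_T), the reward of the retrieval outcome
) -> float:
    """Mean squared detailed-balance residual over one retrieval trajectory."""
    steps = len(log_pf)
    if not (len(log_flows) == steps and len(log_pb) == steps):
        raise ValueError("expected one flow, forward and backward term per step")
    # The flow of the terminal state is fixed to the observed reward, so the
    # outcome reward is propagated back into the intermediate states.
    next_log_flows = log_flows[1:] + [log_reward]
    total = 0.0
    for t in range(steps):
        residual = log_flows[t] + log_pf[t] - next_log_flows[t] - log_pb[t]
        total += residual ** 2
    return total / steps


# Hypothetical two-step trajectory whose flows already satisfy the condition:
# F(s0)=4, P_F=0.5, F(s1)=2, P_F=1.0, R(s2)=2, and P_B=1 on both steps.
balanced = detailed_balance_loss(
    log_flows=[math.log(4.0), math.log(2.0)],
    log_pf=[math.log(0.5), math.log(1.0)],
    log_pb=[0.0, 0.0],
    log_reward=math.log(2.0),
)
# Same trajectory with an overestimated initial flow, giving a nonzero residual.
unbalanced = detailed_balance_loss(
    log_flows=[math.log(8.0), math.log(2.0)],
    log_pf=[math.log(0.5), math.log(1.0)],
    log_pb=[0.0, 0.0],
    log_reward=math.log(2.0),
)
print(balanced, unbalanced)
```

Minimizing this residual over sampled trajectories only requires a reward for the final retrieval result, not labels for each intermediate step, which matches the motivation given above for avoiding process-level supervision. When the condition holds everywhere, the policy samples outcomes with probability proportional to their reward.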

