IRCLMar 14

The Reasoning Bottleneck in Graph-RAG: Structured Prompting and Context Compression for Multi-Hop QA

arXiv:2603.1404565.31 citationsh-index: 23
AI Analysis

This addresses the reasoning gap in Graph-RAG for multi-hop QA, offering cost-effective improvements that transfer across systems, though it is incremental as it builds on existing Graph-RAG frameworks.

The paper tackled the reasoning bottleneck in Graph-RAG systems for multi-hop question answering, showing that while retrieval often includes the gold answer (77-91% of questions), accuracy is low (35-78%) due to reasoning failures (73-84% of errors). It proposed SPARQL chain-of-thought prompting and graph-walk compression, improving accuracy by up to +14 percentage points and enabling an augmented Llama-8B model to match or exceed an unaugmented Llama-70B baseline at ~12x lower cost.

Graph-RAG systems achieve strong multi-hop question answering by indexing documents into knowledge graphs, but strong retrieval does not guarantee strong answers. Evaluating KET-RAG, a leading Graph-RAG system, on three multi-hop QA benchmarks (HotpotQA, MuSiQue, 2WikiMultiHopQA), we find that 77% to 91% of questions have the gold answer in the retrieved context, yet accuracy is only 35% to 78%, and 73% to 84% of errors are reasoning failures. We propose two augmentations: (i) SPARQL chain-of-thought prompting, which decomposes questions into triple-pattern queries aligned with the entity-relationship context, and (ii) graph-walk compression, which compresses the context by ~60% via knowledge-graph traversal with no LLM calls. SPARQL CoT improves accuracy by +2 to +14 pp; graph-walk compression adds +6 pp on average when paired with structured prompting on smaller models. Surprisingly, we show that, with question-type routing, a fully augmented budget open-weight Llama-8B model matches or exceeds the unaugmented Llama-70B baseline on all three benchmarks at ~12x lower cost. A replication on LightRAG confirms that our augmentations transfer across Graph-RAG systems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes