LEGO-GraphRAG: Modularizing Graph-based Retrieval-Augmented Generation for Design Space Exploration
This work gives researchers and practitioners in databases and natural language processing a modular framework for improving GraphRAG systems. The contribution is incremental in that it builds on existing GraphRAG concepts.
The paper tackles the lack of modular workflow analysis and systematic frameworks in GraphRAG by proposing LEGO-GraphRAG, a modular framework that enables fine-grained decomposition of the workflow, classification of existing techniques, and creation of new instances. Empirical studies on large-scale real-world graphs and diverse queries yield insights into balancing reasoning quality, runtime efficiency, and costs.
GraphRAG integrates (knowledge) graphs with large language models (LLMs) to improve reasoning accuracy and contextual relevance. Despite its promising applications and strong relevance to multiple research communities, such as databases and natural language processing, GraphRAG currently lacks modular workflow analysis, systematic solution frameworks, and insightful empirical studies. To bridge these gaps, we propose LEGO-GraphRAG, a modular framework that enables: 1) fine-grained decomposition of the GraphRAG workflow, 2) systematic classification of existing techniques and implemented GraphRAG instances, and 3) creation of new GraphRAG instances. Our framework facilitates comprehensive empirical studies of GraphRAG on large-scale real-world graphs and diverse query sets, revealing insights into balancing reasoning quality, runtime efficiency, and token or GPU cost that are essential for building advanced GraphRAG systems.