LiteSemRAG: Lightweight LLM-Free Semantic-Aware Graph Retrieval for Robust RAG
This work addresses the efficiency bottleneck faced by users of graph-based RAG systems and offers a competitive alternative to LLM-dependent methods, though the contribution appears incremental because it builds on existing graph RAG concepts.
The paper tackles the problem of high computational cost and latency in graph-based Retrieval-Augmented Generation (RAG) systems by proposing LiteSemRAG, a lightweight LLM-free framework that constructs semantic graphs using token-level embeddings and dynamic node mechanisms. The results show it achieves the best mean reciprocal rank across three benchmark datasets with zero LLM token consumption and substantial efficiency improvements.
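The headline metric, mean reciprocal rank at cutoff 10 (MRR@10), rewards a retriever for placing the first relevant chunk near the top of its ranking. Recall@10, the companion metric in the abstract, measures how many of the relevant chunks appear in the top 10 at all. Below is a minimal sketch of both, assuming each query has a ranked list of chunk IDs and a set of gold chunk IDs. The function names are illustrative. The exact Recall@10 variant the paper uses is not stated here; this sketch averages per-query recall.

```python
from typing import Sequence, Set


def mrr_at_k(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int = 10) -> float:
    """Mean over queries of 1/rank of the first relevant item within the top k (0 if none)."""
    if not ranked:
        return 0.0
    total = 0.0
    for ranking, gold in zip(ranked, relevant):
        for rank, item in enumerate(ranking[:k], start=1):
            if item in gold:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked)


def recall_at_k(ranked: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int = 10) -> float:
    """Mean over queries of |gold ∩ top-k| / |gold| (queries with no gold items score 0)."""
    if not ranked:
        return 0.0
    total = 0.0
    for ranking, gold in zip(ranked, relevant):
        if gold:
            total += len(set(ranking[:k]) & gold) / len(gold)
    return total / len(ranked)


# Two toy queries: in both, the first relevant chunk sits at rank 2.
ranked = [["c3", "c1", "c7"], ["c2", "c5"]]
relevant = [{"c1"}, {"c5", "c9"}]
print(mrr_at_k(ranked, relevant))     # 0.5
print(recall_at_k(ranked, relevant))  # 0.75
```

Because MRR only looks at the first relevant hit, a system can lead on MRR@10 while merely matching competitors on Recall@10, which is the pattern the summary reports.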
Graph-based Retrieval-Augmented Generation (RAG) has shown great potential for improving multi-level reasoning and structured evidence aggregation. However, existing graph-based RAG frameworks rely heavily on large language models (LLMs) during both indexing and querying, which leads to high token consumption, computational cost, and latency. In this paper, we propose LiteSemRAG, a lightweight, fully LLM-free, semantic-aware graph retrieval framework. LiteSemRAG constructs a heterogeneous semantic graph from contextual token-level embeddings, explicitly separating surface lexical representations from context-dependent semantic meanings. To model polysemy robustly, we introduce a dynamic semantic node construction mechanism with chunk-level context aggregation and adaptive anomaly handling. At the query stage, LiteSemRAG performs a two-step semantic-aware retrieval process that integrates co-occurrence graph weighting with an isolated semantic recovery mechanism, striking a balance between structural reasoning and semantic coverage. We evaluate LiteSemRAG on three benchmark datasets. Experimental results show that LiteSemRAG achieves the best mean reciprocal rank (MRR@10) on all datasets and competitive or superior recall (Recall@10) compared with state-of-the-art LLM-based graph RAG systems. Meanwhile, LiteSemRAG consumes zero LLM tokens and, because it eliminates LLM usage entirely, achieves substantial efficiency improvements in both indexing and querying. These results demonstrate the effectiveness of LiteSemRAG and indicate that strong semantic-aware graph retrieval can be achieved without relying on LLMs.
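The indexing and two-step retrieval pipeline described in the abstract can be illustrated with a toy sketch. Every concrete choice here is an assumption, not the paper's method. Contextual token embeddings are supplied precomputed (in practice they would come from an encoder). A surface token plays the role of a lexical node, and each hypothetical `(token, index)` pair is a semantic (sense) node. A new sense is created when a token's chunk-averaged embedding falls below a cosine threshold `tau` against every existing sense of that token. The co-occurrence weight is taken to be the number of chunks two matched senses share. "Isolated semantic recovery" is interpreted as still scoring the chunks of matched senses that share no chunk with any other matched sense. The adaptive anomaly handling is not modeled. The names `build_sense_graph` and `retrieve` and the value `tau = 0.8` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Vec = List[float]
SenseId = Tuple[str, int]  # hypothetical semantic-node key: (surface token, sense index)


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_sense_graph(chunks: List[List[Tuple[str, Vec]]], tau: float = 0.8):
    """chunks[i] holds (surface_token, contextual_embedding) pairs for chunk i.

    Returns per-token sense centroids and sense -> chunk postings; two senses
    co-occur when their posting sets intersect.
    """
    senses: Dict[str, List[Vec]] = defaultdict(list)
    sizes: Dict[SenseId, int] = {}
    postings: Dict[SenseId, Set[int]] = defaultdict(set)
    for cid, chunk in enumerate(chunks):
        # Chunk-level context aggregation: average all occurrences of a token in this chunk.
        occurrences: Dict[str, List[Vec]] = defaultdict(list)
        for tok, emb in chunk:
            occurrences[tok].append(emb)
        for tok, embs in occurrences.items():
            v = [sum(col) / len(embs) for col in zip(*embs)]
            sims = [cosine(v, c) for c in senses[tok]]
            best = max(range(len(sims)), key=sims.__getitem__, default=None)
            if best is not None and sims[best] >= tau:
                sid = (tok, best)  # reuse the closest sense and update its running mean
                n = sizes[sid]
                senses[tok][best] = [(c * n + x) / (n + 1) for c, x in zip(senses[tok][best], v)]
                sizes[sid] = n + 1
            else:
                senses[tok].append(v)  # dynamic node creation: a new sense of this token
                sid = (tok, len(senses[tok]) - 1)
                sizes[sid] = 1
            postings[sid].add(cid)
    return dict(senses), dict(postings)


def retrieve(query: List[Tuple[str, Vec]], senses: Dict[str, List[Vec]],
             postings: Dict[SenseId, Set[int]], k: int = 10, tau: float = 0.8) -> List[int]:
    # Step 1: map each query token to its closest sense node with the same surface form.
    matched: List[SenseId] = []
    for tok, emb in query:
        sims = [cosine(emb, c) for c in senses.get(tok, [])]
        if sims:
            best = max(range(len(sims)), key=sims.__getitem__)
            if sims[best] >= tau and (tok, best) not in matched:
                matched.append((tok, best))
    # Co-occurrence weighting: each pair of matched senses adds 1 to every chunk they share.
    scores: Dict[int, float] = defaultdict(float)
    linked: Set[SenseId] = set()
    for a, b in combinations(matched, 2):
        shared = postings[a] & postings[b]
        if shared:
            linked.update((a, b))
        for cid in shared:
            scores[cid] += 1.0
    # Step 2 (isolated recovery): matched senses with no co-occurring partner still
    # contribute their chunks, damped by posting size so frequent senses count less.
    for sid in matched:
        if sid not in linked:
            for cid in postings[sid]:
                scores[cid] += 1.0 / len(postings[sid])
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


chunks = [
    [("bank", [1.0, 0.0]), ("loan", [0.9, 0.1])],    # finance sense
    [("bank", [0.0, 1.0]), ("river", [0.1, 0.9])],   # river sense
    [("bank", [0.95, 0.05]), ("loan", [1.0, 0.0])],  # finance sense again
]
senses, postings = build_sense_graph(chunks)
print(len(senses["bank"]))  # 2
print(retrieve([("bank", [1.0, 0.0]), ("loan", [1.0, 0.0])], senses, postings))  # [0, 2]
print(retrieve([("bank", [0.0, 1.0])], senses, postings))  # [1]
```

In this toy corpus "bank" splits into a finance sense and a river sense, so a finance-flavored query retrieves chunks 0 and 2 but not chunk 1, while a river-flavored query matches a sense with no co-occurring partner and is answered only by the recovery step. The greedy threshold assignment is a simple online clustering and is order-dependent; it stands in for, and should not be read as, the paper's dynamic node construction.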