HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking
For AI researchers and practitioners, this work addresses a specific bottleneck in RAG systems, document chunking, and offers incremental improvements in both evaluation and chunking methods.
The paper tackles the lack of effective evaluation tools for document chunking in Retrieval-Augmented Generation (RAG) systems. It proposes two contributions: HiCBench, a benchmark with annotated chunking points and evidence-dense QA pairs, and HiChunk, a framework that improves chunking quality and RAG performance at a reasonable time cost.
Retrieval-Augmented Generation (RAG) enhances the response capabilities of language models by integrating external knowledge sources. However, document chunking, an important part of a RAG system, often lacks effective evaluation tools. This paper first analyzes why existing RAG evaluation benchmarks are inadequate for assessing document chunking quality, identifying evidence sparsity as the specific cause. Based on this conclusion, we propose HiCBench, which includes manually annotated multi-level document chunking points, synthesized evidence-dense question-answer (QA) pairs, and their corresponding evidence sources. Additionally, we introduce HiChunk, a multi-level document structuring framework based on fine-tuned LLMs, combined with the Auto-Merge retrieval algorithm to improve retrieval quality. Experiments demonstrate that HiCBench effectively evaluates the impact of different chunking methods across the entire RAG pipeline. Moreover, HiChunk achieves better chunking quality at a reasonable time cost, thereby enhancing the overall performance of RAG systems.
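The abstract does not spell out how Auto-Merge decides what to merge, so the following is only a minimal sketch of the general idea behind merging over a multi-level chunk tree, not the paper's algorithm. It assumes a simple rule: when more than a chosen fraction of a parent's children were retrieved, those children are replaced by the parent chunk. The function name `auto_merge`, the `threshold` parameter, and the toy document tree are all hypothetical.

```python
from collections import defaultdict


def auto_merge(retrieved_ids, parent_of, children_of, threshold=0.5):
    """Replace retrieved sibling chunks with their parent when enough were hit.

    Assumed rule (not taken from the paper): merge when the fraction of a
    parent's children that were retrieved is strictly greater than `threshold`.
    """
    selected = set(retrieved_ids)
    changed = True
    while changed:  # repeat so merges can propagate up the hierarchy
        changed = False
        hits = defaultdict(set)
        for chunk_id in selected:
            parent = parent_of.get(chunk_id)
            if parent is not None:
                hits[parent].add(chunk_id)
        for parent, hit_children in hits.items():
            if len(hit_children) / len(children_of[parent]) > threshold:
                selected -= hit_children
                selected.add(parent)
                changed = True
    return selected


# Hypothetical two-level document: doc -> sections -> leaf chunks.
children_of = {
    "doc": ["sec1", "sec2"],
    "sec1": ["c1", "c2", "c3"],
    "sec2": ["c4", "c5"],
}
parent_of = {
    child: parent
    for parent, children in children_of.items()
    for child in children
}

partial = auto_merge(["c1", "c2", "c4"], parent_of, children_of)
full = auto_merge(["c1", "c2", "c4", "c5"], parent_of, children_of)
```

In the first call, two of the three chunks under `sec1` were retrieved, so they collapse into `sec1`, while `c4` stays as a leaf because only half of `sec2` was hit. In the second call both sections merge, and the merge then propagates to `doc`. A practical retriever would likely also cap the merged context by a token budget, which this sketch omits.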