IR · AI · Mar 29, 2025

MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation

Jeongsoo Lee, Daeyong Kwon, Kyohoon Jin, Junnyeong Jeong, Minwoo Sim, Minwoo Kim

arXiv:2504.08756v210.34 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the need for robust benchmarking in RAG systems by providing difficulty-controlled datasets, though it is incremental as it builds on existing dataset synthesis methods.

The paper tackles the problem of unreliable RAG evaluations caused by overlooked query difficulty. It proposes MHTS, a framework for generating QA datasets with controlled multi-hop reasoning complexity, whose difficulty estimates correlate strongly with RAG performance metrics.

Existing RAG benchmarks often overlook query difficulty, leading to inflated performance on simpler questions and unreliable evaluations. A robust benchmark dataset must satisfy three key criteria: quality, diversity, and difficulty, the last of which captures the complexity of reasoning based on the number of hops and the distribution of supporting evidence. In this paper, we propose MHTS (Multi-Hop Tree Structure), a novel dataset synthesis framework that systematically controls multi-hop reasoning complexity by leveraging a multi-hop tree structure to generate logically connected, multi-chunk queries. Our fine-grained difficulty estimation formula exhibits a strong correlation with the overall performance metrics of a RAG system, validating its effectiveness in assessing both retrieval and answer generation capabilities. By ensuring high-quality, diverse, and difficulty-controlled queries, our approach enhances RAG evaluation and benchmarking capabilities.
