AI · Jun 8, 2025

BRIGHT+: Upgrading the BRIGHT Benchmark with MARCUS, a Multi-Agent RAG Clean-Up Suite

arXiv:2506.07116v1 · 3 citations · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses the need for cleaner corpora in RAG systems, with the aim of improving multi-hop retrieval and reasoning. It is an incremental contribution because it builds on the existing BRIGHT benchmark.

The paper targets web-crawled artifacts in the BRIGHT benchmark, such as content redundancy and semantic discontinuity, that impair retrieval accuracy. It presents MARCUS, a multi-agent pipeline that cleans and re-chunks BRIGHT into BRIGHT-Plus. The authors report consistent and significant improvements in retrieval accuracy and multi-hop reasoning across diverse retrievers.

Retrieval-Augmented Generation (RAG) systems require corpora that are both structurally clean and semantically coherent. BRIGHT is a recent and influential benchmark designed to evaluate complex multi-hop retrieval across diverse, high-reasoning domains. However, its practical effectiveness is limited by common web-crawled artifacts, such as content redundancy and semantic discontinuity, that impair retrieval accuracy and downstream reasoning. Notably, we find that such issues are concentrated in seven StackExchange-derived subdomains, while other domains (e.g., Coding and Theorem-based content) remain relatively clean. In this study, we present MARCUS, a multi-agent pipeline that leverages large language models (LLMs) to systematically clean and re-chunk BRIGHT into a higher-quality corpus: BRIGHT-Plus. MARCUS applies dedicated agents for structural noise removal and semantic segmentation, preserving answer-bearing spans while improving contextual integrity. Experimental evaluations demonstrate that BRIGHT-Plus yields consistent and significant improvements in both retrieval accuracy and multi-hop reasoning across a diverse set of retrievers. We release both the BRIGHT-Plus corpus and the MARCUS pipeline to support future research on robust, reasoning-centric retrieval.
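
The two stages named in the abstract, structural noise removal followed by semantic segmentation, can be pictured as a clean-then-rechunk pipeline with one pluggable agent per stage. The sketch below is illustrative only and is not the MARCUS implementation: every function name is hypothetical, and simple heuristics (duplicate-line removal and length-bounded line packing) stand in for the LLM agents the paper describes.

```python
from typing import Callable, List

# Hypothetical agent signatures (assumptions, not taken from the paper).
Cleaner = Callable[[str], str]
Segmenter = Callable[[str], List[str]]


def heuristic_cleaner(text: str) -> str:
    """Stand-in for an LLM cleaning agent: drops blank and exact-duplicate lines."""
    seen = set()
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped in seen:
            continue
        seen.add(stripped)
        kept.append(stripped)
    return "\n".join(kept)


def heuristic_segmenter(text: str, max_chars: int = 60) -> List[str]:
    """Stand-in for an LLM segmentation agent: packs whole lines into
    chunks of at most max_chars, never splitting inside a line."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines():
        candidate = f"{current} {line}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def clean_and_rechunk(
    doc: str,
    cleaner: Cleaner = heuristic_cleaner,
    segmenter: Segmenter = heuristic_segmenter,
) -> List[str]:
    """Run the cleaning stage first, then re-chunk the cleaned text."""
    return segmenter(cleaner(doc))


raw = (
    "Share this post\n"
    "Why is the sky blue?\n"
    "\n"
    "Share this post\n"
    "Rayleigh scattering favors short wavelengths."
)
chunks = clean_and_rechunk(raw)
for chunk in chunks:
    print(chunk)
```

In a real setting each stand-in would presumably be replaced by a model call, and the segmenter would split on meaning rather than length. Keeping the two stages as separate callables makes it possible to check that answer-bearing text survives cleaning before any re-chunking happens.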
