CL · Mar 12, 2025

MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System

arXiv:2503.09600v2 · 25 citations · h-index: 15 · ACL
Originality: Incremental advance
AI Analysis

This work addresses a crucial bottleneck in RAG systems, text chunking, with the aim of improving retrieval and generation accuracy. The advance is incremental, since it builds on existing chunking and LLM-integration approaches.

The paper tackles text chunking in Retrieval-Augmented Generation (RAG) systems by introducing a dual-metric evaluation method and the Mixture-of-Chunkers (MoC) framework; extensive experiments show that together they improve RAG performance.

Retrieval-Augmented Generation (RAG), while serving as a viable complement to large language models (LLMs), often overlooks the crucial role of text chunking within its pipeline. This paper first introduces a dual-metric evaluation method, comprising Boundary Clarity and Chunk Stickiness, that directly quantifies chunking quality. Using this method, we highlight the inherent limitations of traditional and semantic chunking in handling complex contextual nuances, which substantiates the need to integrate LLMs into the chunking process. To address the inherent trade-off between computational efficiency and chunking precision in LLM-based approaches, we devise the granularity-aware Mixture-of-Chunkers (MoC) framework, which uses a three-stage processing mechanism. Notably, our objective is to guide the chunker toward generating a structured list of chunking regular expressions, which are then used to extract chunks from the original text. Extensive experiments demonstrate that both the proposed metrics and the MoC framework effectively address the challenges of the chunking task, revealing the chunking kernel while improving the performance of the RAG system.
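The extraction step described in the abstract, where the chunker emits a list of regular expressions that are then applied to the original text, can be illustrated with a minimal Python sketch. The pattern format is an assumption and not taken from the paper: each regex is treated as matching the opening of one chunk, and `extract_chunks` is a hypothetical helper name. In this sketch, chunk content is copied from the source string rather than rewritten by a model.

```python
import re
from typing import List


def extract_chunks(text: str, start_patterns: List[str]) -> List[str]:
    """Cut `text` into chunks using an ordered list of regex patterns.

    Hypothetical format (assumption, not the paper's): each pattern matches
    the beginning of one chunk, and patterns appear in document order.
    """
    cut_points = [0]  # chunk boundaries as character offsets
    pos = 0           # search resumes after the previous match, enforcing order
    for pattern in start_patterns:
        match = re.compile(pattern).search(text, pos)
        if match is None:
            continue  # pattern not found: skip it instead of failing
        if match.start() > cut_points[-1]:
            cut_points.append(match.start())  # new boundary at the match start
        pos = match.end()
    cut_points.append(len(text))
    # Text before the first match becomes its own chunk, so nothing is dropped.
    chunks = [text[a:b].strip() for a, b in zip(cut_points, cut_points[1:])]
    return [c for c in chunks if c]


if __name__ == "__main__":
    doc = ("RAG retrieves documents. It then generates answers. "
           "Chunking splits text. Good chunks help retrieval.")
    for chunk in extract_chunks(doc, [r"RAG retrieves", r"Chunking splits"]):
        print(chunk)
```

Searching each pattern from the end of the previous match keeps the boundaries monotonic, so a pattern that also occurs earlier in the document cannot produce an out-of-order cut.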

Code Implementations: 1 repo