CL · IR · Jan 25, 2025

Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

arXiv:2501.15228v2 · 39 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work targets the optimization of complex RAG pipelines for more accurate question answering. It is an incremental improvement over prior methods, which either focused on simpler pipelines or did not fully handle the interdependencies among components.

The paper tackles misalignment between the components of retrieval-augmented generation (RAG) pipelines for question answering. It proposes MMOA-RAG, a multi-agent reinforcement learning approach that aligns all components toward a unified reward, and reports improved overall performance over existing baselines on several QA benchmarks.

Retrieval-augmented generation (RAG) is widely used to incorporate external knowledge into large language models, thereby enhancing factuality and reducing hallucinations in question-answering (QA) tasks. A standard RAG pipeline consists of several components, such as query rewriting, document retrieval, document filtering, and answer generation. However, these components are typically optimized separately through supervised fine-tuning, which can lead to misalignments between the objectives of individual components and the overarching aim of generating accurate answers. Although recent efforts have explored using reinforcement learning (RL) to optimize specific RAG components, these approaches often focus on simple pipelines with only two components or do not adequately address the complex interdependencies and collaborative interactions among the modules. To overcome these limitations, we propose treating the complex RAG pipeline with multiple components as a multi-agent cooperative task, in which each component can be regarded as an RL agent. Specifically, we present MMOA-RAG, a Multi-Module joint Optimization Algorithm for RAG, which employs multi-agent reinforcement learning to harmonize all agents' goals toward a unified reward, such as the F1 score of the final answer. Experiments conducted on various QA benchmarks demonstrate that MMOA-RAG effectively boosts the overall performance of the pipeline and outperforms existing baselines. Furthermore, comprehensive ablation studies validate the contributions of individual components and demonstrate that MMOA-RAG can be adapted to different RAG pipelines and benchmarks.
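To make the "unified reward" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a SQuAD-style token-level F1 between the predicted and gold answers, and it simply hands the same scalar to every agent. The function names (`f1_score`, `shared_rewards`) and the agent names are hypothetical, chosen for illustration; the abstract does not say which components are trained or how the F1 score is tokenized.

```python
from collections import Counter

# Hypothetical agent names, loosely following the components listed in the
# abstract. Which components are actually trained is an assumption here.
AGENTS = ("query_rewriter", "document_filter", "answer_generator")


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer.

    Assumption: lowercase + whitespace tokenization, a simplification of
    the normalization commonly used in QA evaluation.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def shared_rewards(prediction: str, gold: str, agents=AGENTS) -> dict:
    """Give every agent the same reward: the F1 of the final answer."""
    reward = f1_score(prediction, gold)
    return {name: reward for name in agents}


if __name__ == "__main__":
    print(shared_rewards("Paris France", "Paris"))
```

Because every agent receives the same scalar, no component can improve its own reward at the expense of the final answer, which is the cooperative setting the abstract describes. A real training loop would feed these rewards into a multi-agent policy-gradient update, which is outside the scope of this sketch.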

Code Implementations: 1 repo
