IR | CV | Feb 17, 2025

REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark

arXiv:2502.12342v1 | 37 citations | h-index: 10 | ACL
Originality: Incremental advance
AI Analysis

This work addresses the need to better evaluate and improve retrieval in multi-modal RAG systems, particularly for real-world applications. It is an incremental advance because it builds on existing RAG frameworks.

The authors address the evaluation of multi-modal document retrieval for RAG systems by introducing REAL-MM-RAG, a benchmark that captures real-world challenges such as multi-modal documents and query rephrasing. The benchmark reveals significant model weaknesses in handling table-heavy documents and in robustness to rephrasing. They improve retrieval performance by curating training datasets and fine-tuning models on them, achieving state-of-the-art results on their benchmark.

Accurate multi-modal document retrieval is crucial for Retrieval-Augmented Generation (RAG), yet existing benchmarks do not fully capture real-world challenges with their current design. We introduce REAL-MM-RAG, an automatically generated benchmark designed to address four key properties essential for real-world retrieval: (i) multi-modal documents, (ii) enhanced difficulty, (iii) Realistic-RAG queries, and (iv) accurate labeling. Additionally, we propose a multi-difficulty-level scheme based on query rephrasing to evaluate models' semantic understanding beyond keyword matching. Our benchmark reveals significant model weaknesses, particularly in handling table-heavy documents and robustness to query rephrasing. To mitigate these shortcomings, we curate a rephrased training set and introduce a new finance-focused, table-heavy dataset. Fine-tuning on these datasets enables models to achieve state-of-the-art retrieval performance on the REAL-MM-RAG benchmark. Our work offers a better way to evaluate and improve retrieval in multi-modal RAG systems while also providing training data and models that address current limitations.
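
As a rough illustration of how a rephrasing-based difficulty scheme could be scored, the sketch below computes recall@k separately for each rephrasing level. It is not the authors' evaluation code: the metric, the level numbering (0 for the original query, higher numbers for heavier rephrasing), the page ids, and the function names `recall_at_k` and `recall_by_level` are all assumptions made for this example, and the toy rankings are invented.

```python
from typing import Dict, List, Tuple

# One retrieval result: (ranked page ids, id of the single relevant page).
Result = Tuple[List[str], str]


def recall_at_k(ranked_ids: List[str], relevant_id: str, k: int) -> float:
    """Return 1.0 if the relevant page appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def recall_by_level(runs: Dict[int, List[Result]], k: int) -> Dict[int, float]:
    """Average recall@k over the queries of each rephrasing level."""
    scores: Dict[int, float] = {}
    for level, results in runs.items():
        hits = [recall_at_k(ranked, relevant, k) for ranked, relevant in results]
        scores[level] = sum(hits) / len(hits) if hits else 0.0
    return scores


# Hypothetical toy data: the same two queries at three rephrasing levels.
# The relevant page slips down the ranking as the rephrasing gets heavier.
toy_runs: Dict[int, List[Result]] = {
    0: [(["p1", "p2", "p3"], "p1"), (["p4", "p5", "p6"], "p4")],
    1: [(["p2", "p1", "p3"], "p1"), (["p4", "p5", "p6"], "p4")],
    2: [(["p2", "p3", "p1"], "p1"), (["p5", "p6", "p4"], "p4")],
}

for level, score in sorted(recall_by_level(toy_runs, k=1).items()):
    print(f"rephrasing level {level}: recall@1 = {score:.2f}")
```

A score that falls as the level rises, as in this toy data, would suggest a retriever that leans on keyword overlap rather than semantic understanding, which is the kind of weakness the abstract describes.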

