MRAG-Suite: A Diagnostic Evaluation Platform for Visual Retrieval-Augmented Generation
MRAG-Suite addresses gaps in how Visual RAG systems are evaluated, which matters for improving question-answering applications, though the contribution is incremental because it builds on existing benchmarks and methods.
The researchers tackled the problem of inadequate evaluation for Visual Retrieval-Augmented Generation (Visual RAG) systems by developing MRAG-Suite, a diagnostic platform that integrates multiple benchmarks and introduces filtering strategies for query difficulty and ambiguity. Their results show substantial accuracy drops on difficult and ambiguous queries and reveal prevalent hallucinations. Their claim-level diagnostic tool, MM-RAGChecker, was effective at diagnosing these issues.
Multimodal Retrieval-Augmented Generation (Visual RAG) significantly advances question answering by integrating visual and textual evidence. Yet, current evaluations fail to systematically account for query difficulty and ambiguity. We propose MRAG-Suite, a diagnostic evaluation platform integrating diverse multimodal benchmarks (WebQA, Chart-RAG, Visual-RAG, MRAG-Bench). We introduce difficulty-based and ambiguity-aware filtering strategies, alongside MM-RAGChecker, a claim-level diagnostic tool. Our results demonstrate substantial accuracy reductions under difficult and ambiguous queries, highlighting prevalent hallucinations. MM-RAGChecker effectively diagnoses these issues, guiding future improvements in Visual RAG systems.
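To make the evaluation idea concrete, the following is a minimal sketch of how difficulty and ambiguity filtering might be combined with a claim-level hallucination measure. It is not the MRAG-Suite or MM-RAGChecker implementation: the `Example` and `Claim` records, the `difficulty` score, the `ambiguous` flag, the `supported` label, and all function names are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Claim:
    text: str
    supported: bool  # assumed label: does retrieved evidence back this claim?


@dataclass
class Example:
    query: str
    difficulty: float  # assumed score in [0, 1]; higher means harder
    ambiguous: bool    # assumed flag for under-specified queries
    correct: bool      # whether the final answer matched the reference
    claims: List[Claim] = field(default_factory=list)


def select(examples: List[Example],
           min_difficulty: float = 0.0,
           ambiguous: Optional[bool] = None) -> List[Example]:
    """Keep examples at or above a difficulty threshold, optionally by ambiguity."""
    return [
        e for e in examples
        if e.difficulty >= min_difficulty
        and (ambiguous is None or e.ambiguous == ambiguous)
    ]


def accuracy(examples: List[Example]) -> float:
    """Fraction of examples answered correctly (0.0 for an empty subset)."""
    if not examples:
        return 0.0
    return sum(e.correct for e in examples) / len(examples)


def hallucination_rate(examples: List[Example]) -> float:
    """Fraction of all generated claims that lack evidence support."""
    claims = [c for e in examples for c in e.claims]
    if not claims:
        return 0.0
    return sum(not c.supported for c in claims) / len(claims)


# Toy data, invented for this sketch only.
data = [
    Example("q1", 0.2, False, True, [Claim("a", True), Claim("b", True)]),
    Example("q2", 0.9, True, False, [Claim("c", False), Claim("d", True)]),
    Example("q3", 0.8, False, True, [Claim("e", True)]),
]
hard = select(data, min_difficulty=0.5)
```

Comparing `accuracy(data)` with `accuracy(hard)`, and `hallucination_rate` across the same subsets, mirrors the kind of comparison the abstract describes: how much performance drops, and how many unsupported claims appear, once the query set is restricted to difficult or ambiguous cases.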