AI · CL · CV · Aug 7, 2025

QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering

arXiv:2508.05197v1 · 3 citations · h-index: 12
Originality: Incremental advance
AI Analysis

The paper addresses limitations of existing RAG methods for multimodal VQA, offering incremental improvements on queries that require multi-hop reasoning or up-to-date knowledge.

The paper tackles complex queries in knowledge-intensive Visual Question Answering (VQA) by proposing QA-Dragon, a query-aware dynamic RAG system that improves reasoning performance. It reports gains over baselines of 5.06% on the single-source task, 6.35% on the multi-source task, and 5.03% on the multi-turn task.

Retrieval-Augmented Generation (RAG) has been introduced to mitigate hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge into the generation process, and it has become a widely adopted approach for knowledge-intensive Visual Question Answering (VQA). However, existing RAG methods typically retrieve from either text or images in isolation, limiting their ability to address complex queries that require multi-hop reasoning or up-to-date factual knowledge. To address this limitation, we propose QA-Dragon, a Query-Aware Dynamic RAG System for Knowledge-Intensive VQA. Specifically, QA-Dragon introduces a domain router to identify the query's subject domain for domain-specific reasoning, along with a search router that dynamically selects optimal retrieval strategies. By orchestrating both text and image search agents in a hybrid setup, our system supports multimodal, multi-turn, and multi-hop reasoning, enabling it to tackle complex VQA tasks effectively. We evaluate our QA-Dragon on the Meta CRAG-MM Challenge at KDD Cup 2025, where it significantly enhances the reasoning performance of base models under challenging scenarios. Our framework achieves substantial improvements in both answer accuracy and knowledge overlap scores, outperforming baselines by 5.06% on the single-source task, 6.35% on the multi-source task, and 5.03% on the multi-turn task.
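The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword rules stand in for whatever models the real domain and search routers use, and every name here (`Query`, `SearchMode`, `route_domain`, `route_search`, `gather_evidence`, the cue tables, and the stub search agents) is hypothetical.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List


class SearchMode(Enum):
    NONE = "none"      # answer directly, no retrieval
    TEXT = "text"      # text search agent only
    IMAGE = "image"    # image search agent only
    HYBRID = "hybrid"  # both agents


@dataclass
class Query:
    text: str
    image_id: str  # hypothetical handle for the query image


# Assumed keyword tables; a real router would likely be model-based.
DOMAIN_KEYWORDS = {
    "food": {"calorie", "recipe", "dish"},
    "vehicle": {"car", "engine", "truck"},
}
VISUAL_CUES = {"this", "that", "these", "those"}
FRESH_CUES = {"current", "latest", "price", "today"}


def _tokens(text: str) -> set:
    """Lowercase word tokens, so 'car' does not match inside 'scar'."""
    return set(re.findall(r"[a-z]+", text.lower()))


def route_domain(query: Query) -> str:
    """Pick the first domain whose keywords appear in the query text."""
    tokens = _tokens(query.text)
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if tokens & keywords:
            return domain
    return "general"


def route_search(query: Query) -> SearchMode:
    """Choose a retrieval strategy from simple lexical cues."""
    tokens = _tokens(query.text)
    needs_image = bool(tokens & VISUAL_CUES)  # refers to something in the image
    needs_text = bool(tokens & FRESH_CUES)    # asks for time-sensitive facts
    if needs_image and needs_text:
        return SearchMode.HYBRID
    if needs_image:
        return SearchMode.IMAGE
    if needs_text:
        return SearchMode.TEXT
    return SearchMode.NONE


def text_search(query: Query) -> List[str]:
    """Stub text search agent; returns a placeholder hit."""
    return [f"text-hit for: {query.text}"]


def image_search(query: Query) -> List[str]:
    """Stub image search agent; returns a placeholder hit."""
    return [f"image-hit for: {query.image_id}"]


def gather_evidence(query: Query) -> List[str]:
    """Dispatch to the agents selected by the search router."""
    mode = route_search(query)
    evidence: List[str] = []
    if mode in (SearchMode.IMAGE, SearchMode.HYBRID):
        evidence += image_search(query)
    if mode in (SearchMode.TEXT, SearchMode.HYBRID):
        evidence += text_search(query)
    return evidence
```

In this sketch, a query such as "What is the current price of this car?" is routed to the hypothetical `vehicle` domain and to hybrid search, while a query with no visual or time-sensitive cue skips retrieval entirely. The multi-turn and multi-hop behaviour described in the abstract is not modelled here.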

