IR · May 27

Subtraction Gets You More: Gap-Aware Retrieval for Multimodal Multi-Hop QA

arXiv:2605.28641

AI Analysis

For multimodal QA systems, this method improves retrieval accuracy by breaking entity-centric redundancy, though it is an incremental improvement over existing iterative retrieval frameworks.

GRAIL introduces a gap-aware retrieval paradigm that addresses semantic anchoring in multimodal multi-hop QA; its hybrid routing framework achieves a 40.3% macro-averaged performance gain on MultimodalQA.

In multimodal multi-hop question answering, we focus on the initial retrieval stage via two distinct tasks: (1) evidence set completion, retrieving missing evidence given context, and (2) sequential pool construction, iteratively building the top-$K$ pool from scratch. Under these settings, we point out that conventional iterative retrieval frameworks often suffer from Semantic Anchoring, where previously fetched evidence traps the retriever and yields entity-centric redundancy. To break this trap, we propose GRAIL (Gap-aware Retrieval via Adaptive Implicit Localization), a paradigm that performs implicit query rewriting directly at the embedding level. Through context-subtractive query steering, GRAIL excels at compositional cross-modal reasoning, while additive embedding updates show strength on localized information aggregation. By dynamically routing queries based on task type, our Hybrid Framework achieves a 40.3% macro-averaged performance gain on MultimodalQA. Extensive evaluations demonstrate that sequential GRAIL retrieves in a superior, noise-resilient manner, significantly expanding the search horizon through iterative gap-aware optimization.
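
The abstract does not state the exact update rule, so the following is only a minimal sketch of what embedding-level query steering could look like. It assumes the rewritten query is the normalized query minus (or plus) a scaled centroid of the already-retrieved evidence. The names `steer_query` and `build_pool`, the `alpha` weight, the `mode` switch standing in for task-based routing, and the toy 2-D vectors are all hypothetical and are not taken from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def steer_query(query, context, alpha=0.5, mode="subtract"):
    """Assumed rewrite rule: q' = normalize(q -/+ alpha * centroid(context))."""
    q = normalize(query)
    if not context:
        return q
    n = len(context)
    centroid = normalize([sum(col) / n for col in zip(*context)])
    sign = -1.0 if mode == "subtract" else 1.0
    return normalize([qi + sign * alpha * ci for qi, ci in zip(q, centroid)])

def build_pool(query, corpus, k, alpha=0.5, mode="subtract"):
    """Sequential pool construction: add one item per step, re-steering each time."""
    pool = []
    for _ in range(min(k, len(corpus))):
        q = steer_query(query, [corpus[i] for i in pool], alpha, mode)
        candidates = [i for i in range(len(corpus)) if i not in pool]
        pool.append(max(candidates, key=lambda i: cosine(q, corpus[i])))
    return pool

# Toy corpus: items 0 and 1 are near-duplicates about one entity,
# item 2 stands in for the second-hop evidence.
corpus = [[1.0, 0.0], [0.95, 0.1], [0.3, 0.9]]
query = [0.8, 0.6]

print(build_pool(query, corpus, k=2, mode="subtract"))  # [1, 2]
print(build_pool(query, corpus, k=2, mode="add"))       # [1, 0]
```

In this toy setup the additive update pulls the second pick toward the near-duplicate of the first (the redundancy the abstract calls Semantic Anchoring), while the subtractive update moves the query away from what is already in the pool and surfaces the remaining item.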

