CV · May 26

Q-GeoMem: Question-Guided Geometric Memory for Video Spatial Reasoning

arXiv:2605.27318 · 81.8
Predicted impact: top 26% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the problem of efficient long-range spatial reasoning in video-language models, which is important for applications like autonomous driving and robotics.

Q-GeoMem introduces a question-guided geometric memory framework for video spatial reasoning. It combines two complementary memories with a scoring mechanism that filters redundant geometry by question relevance and novelty, and it reports state-of-the-art performance on VSI-Bench and VSTI-Bench among the evaluated spatial reasoning models.

Video spatial reasoning requires accumulating viewpoint-dependent evidence over time while retaining information useful to the question being asked. Existing spatial video-language models improve geometric perception and long-range context modeling, but often treat memory as a generic temporal cache, which can introduce redundant or irrelevant geometry and weaken long-horizon reasoning. We propose Q-GeoMem, a question-guided geometric memory framework for video spatial reasoning. Q-GeoMem injects camera-conditioned geometry into visual tokens and maintains two complementary memories: a Fine-Grained Context Bank for recent dense features and camera states, and a Semantic-Geometric Evidence Bank for compact long-range evidence. Each candidate frame is scored by the product of Q-Former-based question relevance and novelty with respect to the retained bank; this score is stored and reused during reading, while a capacity-based replacement rule keeps the bank compact. During reasoning, both memories are read before update and adaptively fused with the current frame representation. Experiments on VSI-Bench and VSTI-Bench show that Q-GeoMem achieves state-of-the-art performance among evaluated spatial reasoning models, validating the effectiveness of question-guided geometric memory. Ablations further verify the contribution of the proposed evidence scoring mechanism.
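
The evidence scoring described in the abstract (score = question relevance × novelty, stored with each entry, reused at read time, with a capacity-based replacement rule) can be illustrated with a minimal sketch. This is not the authors' implementation. The class `EvidenceBank`, the cosine-based novelty measure, the "replace the lowest-scoring entry" rule, and the score-weighted read are all assumptions made for illustration, and the relevance value, which the paper derives from a Q-Former, is simply passed in as a number here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EvidenceBank:
    """Hypothetical compact long-range bank of (feature, score) pairs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # list of (feature, score) tuples

    def novelty(self, feature):
        # Assumption: novelty = 1 - highest cosine similarity to any
        # retained entry, clamped to [0, 1]. An empty bank is fully novel.
        if not self.entries:
            return 1.0
        max_sim = max(cosine(feature, f) for f, _ in self.entries)
        return 1.0 - min(1.0, max(0.0, max_sim))

    def read(self):
        # Assumption: the stored scores are reused as averaging weights.
        total = sum(s for _, s in self.entries)
        if total == 0.0:
            return None
        dim = len(self.entries[0][0])
        return [sum(s * f[i] for f, s in self.entries) / total
                for i in range(dim)]

    def update(self, feature, relevance):
        # Score is the product of question relevance and novelty.
        score = relevance * self.novelty(feature)
        if len(self.entries) < self.capacity:
            self.entries.append((feature, score))
            return score
        # Assumed replacement rule: overwrite the lowest-scoring entry,
        # but only if the candidate scores strictly higher.
        weakest = min(range(len(self.entries)),
                      key=lambda i: self.entries[i][1])
        if score > self.entries[weakest][1]:
            self.entries[weakest] = (feature, score)
        return score


# Toy stream of (frame feature, question relevance) pairs.
bank = EvidenceBank(capacity=2)
stream = [([1.0, 0.0], 0.9), ([1.0, 0.0], 0.9), ([0.0, 1.0], 0.5)]
for feature, relevance in stream:
    context = bank.read()            # memory is read before it is updated
    bank.update(feature, relevance)
print(bank.entries)
```

In this toy stream the second frame duplicates the first, so its novelty and therefore its score is zero. It is admitted only because the bank is not yet full, and it is displaced by the third frame once capacity is reached.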

