CV | Jan 14

See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval

arXiv:2601.09350v1 | h-index: 1
Originality: Highly original
AI Analysis

This work addresses memory efficiency in video moment retrieval, which is crucial for processing lengthy videos without information loss. It is rated a strong, domain-specific gain.

The paper tackles memory constraints in video moment retrieval by proposing the SMORE framework, which improves memory efficiency while maintaining high information resolution. It reports state-of-the-art performance on the QVHighlights, Charades-STA, and ActivityNet-Captions benchmarks.

Recent advances in Multimodal Large Language Models (MLLMs) have improved image recognition and reasoning, but video-related tasks remain challenging due to memory constraints from dense frame processing. Existing Video Moment Retrieval (VMR) methods rely on sparse frame sampling, which risks information loss, especially in lengthy videos. We propose SMORE (See MORE, store less), a framework that enhances memory efficiency while maintaining high information resolution. SMORE (1) uses query-guided captions to encode semantics aligned with user intent, (2) applies query-aware importance modulation to highlight relevant segments, and (3) adaptively compresses frames to preserve key content while reducing redundancy. This enables efficient video understanding without exceeding memory budgets. Experiments show that SMORE achieves state-of-the-art performance on the QVHighlights, Charades-STA, and ActivityNet-Captions benchmarks.
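The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind steps (2) and (3): score segments against the query, then spend a fixed frame budget where the scores are highest. It is not the authors' method. It assumes that the query and each segment's caption are already available as embedding vectors, and it swaps in a plain softmax over cosine similarities for importance and a greedy proportional allocation for compression. All function names and parameters (`importance_weights`, `allocate_budget`, `subsample`, `compress`, `temperature`, `min_per_segment`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def importance_weights(query_vec, segment_vecs, temperature=0.1):
    """Assumed importance score: softmax over query-to-segment similarities."""
    sims = [cosine(query_vec, v) for v in segment_vecs]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def allocate_budget(weights, segment_lengths, budget, min_per_segment=1):
    """Split a frame budget across segments in proportion to importance."""
    # Each segment keeps a small floor so no part of the video vanishes.
    alloc = [min(min_per_segment, n) for n in segment_lengths]
    remaining = budget - sum(alloc)
    if remaining < 0:
        raise ValueError("budget is smaller than the per-segment floor")
    while remaining > 0:
        open_segments = [i for i, n in enumerate(segment_lengths) if alloc[i] < n]
        if not open_segments:
            break  # every frame is already kept
        # Give the next frame to the segment with the highest weight per kept frame.
        best = max(open_segments, key=lambda i: weights[i] / (alloc[i] + 1))
        alloc[best] += 1
        remaining -= 1
    return alloc


def subsample(start, length, keep):
    """Pick `keep` evenly spaced frame indices from [start, start + length)."""
    if keep <= 0:
        return []
    if keep >= length:
        return list(range(start, start + length))
    step = length / keep
    return [start + int(i * step + step / 2) for i in range(keep)]


def compress(query_vec, segment_vecs, segment_lengths, budget):
    """Return the global indices of the frames kept under the budget."""
    weights = importance_weights(query_vec, segment_vecs)
    alloc = allocate_budget(weights, segment_lengths, budget)
    kept, start = [], 0
    for length, keep in zip(segment_lengths, alloc):
        kept.extend(subsample(start, length, keep))
        start += length
    return kept
```

For example, `compress([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], [10, 10, 10], 9)` keeps nine of thirty frames, with most of them drawn from the first and third segments because their embeddings align with the query. The per-segment floor is a design choice in this sketch: it trades a few frames of budget for coverage of segments the scorer considers irrelevant.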

