IR · May 11

ReCoVR: Closing the Loop in Interactive Composed Video Retrieval

arXiv:2605.09836 · 90.1
Predicted impact: top 8% in IR · last 90 days
Originality: Highly original
AI Analysis

This work addresses the lack of multi-turn interaction in composed video retrieval, enabling progressive refinement for real-world visual search.

ReCoVR introduces a multi-turn interactive composed video retrieval system that uses a dual-pathway architecture with reflexive perception to monitor and correct retrieval trajectories, achieving 74.30% R@1 after one interactive round on WebVid-CoVR-Test.
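For readers unfamiliar with the metric, R@1 (Recall@1) is the fraction of queries whose ground-truth target video is ranked first; R@K generalizes this to the top K results. The sketch below computes it on made-up toy data. The function name `recall_at_k` and the video IDs are illustrative assumptions, not taken from the paper or its evaluation code.

```python
def recall_at_k(ranked_ids, target_ids, k=1):
    """Fraction of queries whose target appears among the top-k retrieved IDs.

    ranked_ids: one ranked list of video IDs per query (best match first).
    target_ids: the ground-truth target video ID for each query.
    """
    if len(ranked_ids) != len(target_ids):
        raise ValueError("need exactly one target per ranked list")
    if not target_ids:
        return 0.0
    hits = sum(
        1 for ranked, target in zip(ranked_ids, target_ids)
        if target in ranked[:k]
    )
    return hits / len(target_ids)


# Toy data (hypothetical): four queries, three retrieved videos each.
rankings = [
    ["v3", "v1", "v7"],  # target v3 is ranked 1st
    ["v2", "v9", "v4"],  # target v9 is ranked 2nd
    ["v5", "v8", "v6"],  # target v6 is ranked 3rd
    ["v0", "v4", "v2"],  # target v0 is ranked 1st
]
targets = ["v3", "v9", "v6", "v0"]

r_at_1 = recall_at_k(rankings, targets, k=1)  # 2 of 4 targets ranked first
r_at_3 = recall_at_k(rankings, targets, k=3)  # all 4 targets within top 3
```

Under this definition, a reported 74.30% R@1 means the target video was the top-ranked result for roughly three out of four test queries.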

Composed video retrieval (CoVR) searches for target videos using a reference video and a modification text, but existing methods are restricted to a single interaction round and cannot support the progressive nature of real-world visual search. To bridge this gap, we first formalize interactive composed video retrieval, a multi-turn extension of CoVR, where users progressively refine their search intent through natural-language feedback across turns. Adapting existing interactive retrieval methods to this setting reveals two structural weaknesses: reliance on a single retrieval channel and an open-loop retrieval design that consumes user feedback but does not diagnose whether its own retrieval trajectory is drifting or stagnating. To address these limitations, we propose ReCoVR (Reflexive Composed Video Retrieval), a dual-pathway architecture built on reflexive perception, where the system treats its retrieval history as diagnostic evidence alongside user feedback. Specifically, an Intent Pathway routes heterogeneous feedback to complementary retrieval channels, while a Reflection Pathway performs trajectory-level reflection to monitor result evolution and correct retrieval errors across turns. Experiments on multiple benchmarks show that ReCoVR consistently outperforms interactive baselines, notably achieving 74.30% R@1 after just one interactive round on the WebVid-CoVR-Test dataset.
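To make the closed-loop idea concrete, here is a minimal sketch of how a system might treat its own retrieval history as diagnostic evidence. It is not the paper's Reflection Pathway: the abstract does not specify how drift or stagnation is detected, so the Jaccard-overlap heuristic, the thresholds, and the names `jaccard` and `diagnose_trajectory` are all assumptions chosen purely for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of video IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diagnose_trajectory(history, stagnation_threshold=0.8, drift_threshold=0.2):
    """Label a retrieval trajectory from its per-turn top-k result lists.

    Hypothetical heuristic (not from the paper):
    - "stagnating": the last two turns return nearly the same results.
    - "drifting": the latest results share almost nothing with turn one.
    - "progressing": anything in between.
    """
    if len(history) < 2:
        return "insufficient_history"
    step_overlap = jaccard(history[-2], history[-1])
    if step_overlap >= stagnation_threshold:
        return "stagnating"
    anchor_overlap = jaccard(history[0], history[-1])
    if anchor_overlap <= drift_threshold:
        return "drifting"
    return "progressing"


# Toy trajectories (hypothetical video IDs).
same_results = [["a", "b", "c", "d"], ["a", "b", "c", "d"]]
wandering = [["a", "b", "c", "d"], ["a", "b", "e", "f"], ["g", "h", "i", "j"]]
refining = [["a", "b", "c", "d"], ["a", "b", "c", "e"]]

labels = [diagnose_trajectory(t) for t in (same_results, wandering, refining)]
```

An open-loop system would skip this step entirely and pass each round of user feedback straight to retrieval; the point of the sketch is only that a diagnostic signal can be derived from the result history itself and used to decide whether a correction is needed.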
