CV · May 31

R^3: Composed Video Retrieval via Reasoning-Guided Recalling and Re-ranking

arXiv:2606.01113 · 88.9 · Has Code
AI Analysis

For researchers in video retrieval: embedding-based recall tends to under-express the target-side consequences of an edit in composed video retrieval, and this work offers a zero-shot pipeline that pairs reasoning-guided recall with re-ranking.

The paper tackles composed video retrieval, in which a target video is retrieved from a gallery given a reference video and a textual edit instruction. The proposed R^3 pipeline combines reasoning-guided recall with re-ranking; the authors report that experiments demonstrate its effectiveness on the CoVR-R challenge.

The CoVR-R challenge evaluates composed video retrieval, where a system must retrieve a target video from a large gallery given a reference video and a textual edit instruction. This setting is not a standard video-text retrieval problem: the query is defined by both the visual evidence in the source video and the transformation implied by the edit. A strong embedding model can provide scalable candidate recall, but it may under-express target-side consequences such as state changes, action replacement, object preservation, or temporal consistency. A pairwise multimodal re-ranker can verify such details more directly, but exhaustive re-ranking over the full gallery is computationally infeasible. We present $\mathbb{R}^3$, a zero-shot composed video retrieval pipeline built around Reasoning-guided Recalling and Re-ranking. The core idea is to turn the source-edit query into a reasoning-grounded retrieval program rather than treating the edit text as a short caption. First, the model generates a reasoning trace that describes the expected target video after applying the edit. Next, the trace is encoded together with the source video as a reasoning-augmented query, and its retrieval score is fused with that of the base composed query through an agreement-gated residual rule. Finally, a re-ranker verifies the recalled candidates by direct source-candidate comparison. Experiments demonstrate the effectiveness of our method on this challenge. Code is available at https://github.com/Lee-zixu/R-3.
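
The recall, fuse, and re-rank flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not define the agreement-gated residual rule, so the gate used here (Jaccard overlap of the two top-k candidate sets) and the names `top_k`, `agreement_gate`, `fuse`, `recall_then_rerank`, `alpha`, and `rerank_score` are all assumptions made for this example. Scores are plain lists of per-gallery-item similarities, and the re-ranker is stood in for by a callback.

```python
from typing import Callable, List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores, best first (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def agreement_gate(base: Sequence[float], reasoning: Sequence[float], k: int) -> float:
    """Assumed gate: Jaccard overlap of the two top-k candidate sets, in [0, 1]."""
    a, b = set(top_k(base, k)), set(top_k(reasoning, k))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse(base: Sequence[float], reasoning: Sequence[float],
         k: int, alpha: float = 0.5) -> List[float]:
    """Assumed residual rule: add the reasoning-query score on top of the base
    score, scaled by alpha and by how much the two rankings agree."""
    gate = agreement_gate(base, reasoning, k)
    return [b + alpha * gate * r for b, r in zip(base, reasoning)]


def recall_then_rerank(base: Sequence[float], reasoning: Sequence[float],
                       rerank_score: Callable[[int], float],
                       k: int, alpha: float = 0.5) -> List[int]:
    """Recall k candidates with the fused score, then reorder only those k
    with the (expensive) pairwise re-ranker, avoiding a full-gallery pass."""
    candidates = top_k(fuse(base, reasoning, k, alpha), k)
    # sorted() is stable, so re-ranker ties keep the recall order.
    return sorted(candidates, key=lambda i: -rerank_score(i))


# Toy gallery of four videos with made-up similarity scores.
base_scores = [0.9, 0.8, 0.1, 0.2]
reasoning_scores = [0.2, 0.9, 0.8, 0.1]
pairwise = [0.7, 0.3, 0.0, 0.0]  # stand-in for a source-candidate re-ranker
ranking = recall_then_rerank(base_scores, reasoning_scores,
                             lambda i: pairwise[i], k=2)
```

In this sketch the re-ranker is called only on the k recalled candidates, which reflects the abstract's point that exhaustive pairwise re-ranking over the full gallery is infeasible. When the two rankings disagree completely, the gate is zero and the fused score falls back to the base composed query.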
