Dual-Route Top-K Retrieval with 1v1 VLM Reranking for the CoVR-R Challenge
This work addresses the composed video retrieval task for the CoVR-R challenge, showing that decoupling recall from selection outperforms both broad text reranking and direct multi-candidate VLM classification.
The authors propose a dual-route retrieval and conservative 1v1 VLM reranking method for composed video retrieval, achieving 95.28 R@1 on the CoVR-R hidden test split.
We describe \emph{Dual-Route Top-K Retrieval with 1v1 VLM Reranking} for the CoVR-R challenge. The method treats composed video retrieval as two coupled problems: finding a sufficiently complete top-k candidate set, and then safely deciding whether any candidate should replace a strong current top-1. We first improve the reasoning/text seed with a VLM slot selector over existing candidates, without introducing DFN visual retrieval. We then add a visual route from contact-sheet embeddings using DFN-H/DFN-L. The routes are merged into a top-10 candidate set, after which a VLM final reranker performs conservative 1v1 comparisons between the current top-1 and each challenger. On the hidden test split, the final system reaches 95.28 R@1, 97.47 R@5, 98.48 R@10, and 99.66 R@50. The main lesson is that CoVR-R benefits more from recall-selection decoupling than from broad text reranking or direct multi-candidate VLM classification.
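The merge and conservative reranking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the function names `merge_routes` and `conservative_1v1_rerank`, the round-robin merge order, and the rule that the highest-ranked winning challenger is promoted are all hypothetical choices made for the example, and the VLM comparison is abstracted as a caller-supplied `challenger_wins` callback.

```python
from typing import Callable, List, Sequence


def merge_routes(text_route: Sequence[str],
                 visual_route: Sequence[str],
                 k: int = 10) -> List[str]:
    """Merge two ranked candidate lists into one deduplicated top-k list.

    Assumption: candidates are interleaved rank by rank, text route first,
    so the text route's top-1 remains the merged top-1.
    """
    merged: List[str] = []
    seen = set()
    for rank in range(max(len(text_route), len(visual_route))):
        for route in (text_route, visual_route):
            if rank < len(route) and route[rank] not in seen:
                seen.add(route[rank])
                merged.append(route[rank])
    return merged[:k]


def conservative_1v1_rerank(candidates: Sequence[str],
                            challenger_wins: Callable[[str, str], bool]) -> List[str]:
    """Compare the current top-1 against each challenger, one pair at a time.

    `challenger_wins(incumbent, challenger)` stands in for the VLM judgment
    and should return True only when the challenger is clearly better.
    Assumption: the first (highest-ranked) challenger that wins is promoted;
    if none wins, the original order is returned unchanged.
    """
    if not candidates:
        return []
    incumbent = candidates[0]
    for challenger in candidates[1:]:
        if challenger_wins(incumbent, challenger):
            rest = [c for c in candidates if c != challenger]
            return [challenger] + rest
    return list(candidates)


if __name__ == "__main__":
    top_k = merge_routes(["a", "b", "c"], ["d", "b", "e"], k=4)
    # Toy judge: pretend the VLM prefers candidate "b" over any incumbent.
    print(conservative_1v1_rerank(top_k, lambda inc, ch: ch == "b"))
```

Because every comparison is made against the same incumbent, a judge that never returns True leaves the ranking untouched, which is the sense in which this reranking is conservative.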