CV · AI · CL · Oct 11, 2021

ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation

arXiv:2110.05146v2
Originality: Synthesis-oriented
AI Analysis

This work addresses video-text retrieval, which has applications such as media analytics and surveillance. Its contribution is incremental, since it builds on existing tasks and datasets.

The paper tackles video-text retrieval by using a model trained only on video retrieval to handle both video retrieval and video corpus moment retrieval. The approach won first place in the video retrieval track of the ICCV VALUE Challenge 2021, and an ensemble model built on it set new state-of-the-art results on all four challenge datasets.

Video-text retrieval has many real-world applications such as media analytics, surveillance, and robotics. This paper presents the 1st place solution to the video retrieval track of the ICCV VALUE Challenge 2021. We present a simple yet effective approach to jointly tackle two video-text retrieval tasks (video retrieval and video corpus moment retrieval) by leveraging a model trained only on the video retrieval task. In addition, we create an ensemble model that achieves new state-of-the-art performance on all four datasets (TVr, How2r, YouCook2r, and VATEXr) presented in the VALUE Challenge.
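The abstract does not spell out how a model trained only for video retrieval is reused for moment retrieval; the title points to fine-grained video segmentation. The sketch below illustrates one plausible reading under stated assumptions: cut a video into candidate windows at several granularities, score each window with the same text-video similarity used for whole-video retrieval, and return the best-scoring window. Everything here is hypothetical and not the authors' implementation: precomputed frame and query embeddings, mean pooling, cosine similarity, the window sizes and stride, and the simple score averaging in `ensemble_scores` used to stand in for an ensemble.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Segment = Tuple[int, int]  # [start, end) frame indices


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average frame embeddings into one clip embedding (assumed pooling scheme)."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def candidate_segments(num_frames: int, window_sizes: Sequence[int], stride: int) -> List[Segment]:
    """Enumerate sliding windows at several granularities (hypothetical segmentation)."""
    segments = []
    for w in window_sizes:
        if w < 1 or w > num_frames:
            continue  # skip windows that are empty or longer than the video
        for start in range(0, num_frames - w + 1, stride):
            segments.append((start, start + w))
    return segments


def retrieve_moment(
    query: Vector,
    frames: Sequence[Vector],
    window_sizes: Sequence[int] = (2, 4),
    stride: int = 1,
) -> Tuple[Optional[Segment], float]:
    """Score each window with the video-level similarity; keep the first best one."""
    best: Optional[Segment] = None
    best_score = -math.inf
    for start, end in candidate_segments(len(frames), window_sizes, stride):
        score = cosine(query, mean_pool(frames[start:end]))
        if score > best_score:  # strict '>' keeps the earliest window on ties
            best, best_score = (start, end), score
    return best, best_score


def ensemble_scores(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Late fusion: average per-candidate scores from several models."""
    n = len(score_lists)
    return [sum(col) / n for col in zip(*score_lists)]


# Toy usage: the query matches frames 2-4, whose embeddings point along the second axis.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
print(retrieve_moment([0.0, 1.0], frames))  # ((2, 4), 1.0)
```

Reusing one similarity function for both whole videos and sub-windows is what lets a single retrieval model serve both tasks in this sketch; a real system would likely use learned encoders and a more careful proposal scheme.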
