TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries
This work addresses the practical need for ranked video moment search in multi-modal applications, though the contribution is incremental because it builds on existing datasets and tasks.
The authors introduced the Ranked Video Moment Retrieval (RVMR) task to locate ranked lists of video moments using natural language queries, and created the TVR-Ranking dataset with 94,442 manually annotated query-moment pairs to facilitate research in this area.
In this paper, we propose the task of \textit{Ranked Video Moment Retrieval} (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by the CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the $NDCG@K$, $IoU \geq \mu$ evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models, and we believe this new dataset contributes to the research on multi-modality search. The dataset is available at \url{https://github.com/Ranking-VMR/TVR-Ranking}.
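The abstract names the $NDCG@K$, $IoU \geq \mu$ metric without defining it, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a predicted moment earns the relevance level of an annotated moment in the same video when their temporal IoU is at least $\mu$, that each annotated moment can be credited at most once, and that NDCG uses linear gain with a $\log_2$ discount. The function names, tuple layouts, and relevance values are all hypothetical.

```python
import math


def temporal_iou(a, b):
    """Temporal IoU of two (start, end) intervals, e.g. in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def dcg(gains):
    """Discounted cumulative gain with linear gain (an assumption)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k_iou(predictions, ground_truth, k, mu):
    """Sketch of NDCG@K under an IoU >= mu matching rule.

    predictions:  ranked list of (video_id, start, end)
    ground_truth: list of (video_id, start, end, relevance) for one query
    """
    used = set()  # annotated moments already credited (assumed one-to-one)
    gains = []
    for vid, start, end in predictions[:k]:
        best_idx, best_rel = None, 0
        for idx, (g_vid, g_start, g_end, rel) in enumerate(ground_truth):
            if idx in used or g_vid != vid:
                continue
            overlap = temporal_iou((start, end), (g_start, g_end))
            if overlap >= mu and rel > best_rel:
                best_idx, best_rel = idx, rel
        if best_idx is not None:
            used.add(best_idx)
        gains.append(best_rel)  # 0 when no annotated moment is matched

    # Ideal ranking: annotated relevance levels in descending order.
    ideal = sorted((g[3] for g in ground_truth), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical annotations for a single query.
ground_truth = [("v1", 0.0, 10.0, 3), ("v2", 5.0, 15.0, 1)]
perfect = [("v1", 0.0, 10.0), ("v2", 5.0, 15.0)]
swapped = [("v2", 5.0, 15.0), ("v1", 0.0, 10.0)]
too_short = [("v1", 0.0, 4.0)]  # IoU with the annotation is 0.4
```

Under these assumptions, a ranking that places the more relevant moment first scores higher than the swapped ranking, and a prediction whose overlap falls below $\mu$ contributes no gain even if it lies inside the right video.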