CV · MM · Aug 26, 2022

PRVR: Partially Relevant Video Retrieval

Xianke Chen, Daizong Liu, Xun Yang, Xirong Li, Jianfeng Dong, Meng Wang, Xun Wang

arXiv:2208.12510v27.37 citations · h-index: 23 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a more realistic and challenging retrieval setting for users dealing with untrimmed videos on platforms such as social media, though the contribution is incremental, since it adapts existing methods to a new task.

The paper tackles text-to-video retrieval when only part of a video is relevant to a query, proposing a method that learns multi-scale similarities and demonstrating its viability on three diverse datasets.

In current text-to-video retrieval (T2VR), videos to be retrieved have been properly trimmed so that a correspondence between the videos and ad-hoc textual queries naturally exists. Note that, in practice, videos circulated on the Internet and social media platforms, while being relatively short, are typically rich in their content. Often, multiple scenes, actions, or events are shown in a single video, leading to a more challenging T2VR setting wherein only part of the video content is relevant w.r.t. a given query. This paper presents a first study on this setting, which we term Partially Relevant Video Retrieval (PRVR). Considering that a video typically consists of multiple moments, a video is regarded as partially relevant w.r.t. a given query if it contains a query-related moment. We formulate the PRVR task as a multiple instance learning problem and propose a Multi-Scale Similarity Learning (MS-SL++) network that jointly learns both clip-scale and frame-scale similarities to determine the partial relevance between video-query pairs. Extensive experiments on three diverse video-text datasets (TVshow Retrieval, ActivityNet-Captions and Charades-STA) demonstrate the viability of the proposed method.
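The multiple-instance view of partial relevance can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the MS-SL++ implementation: frames are plain feature vectors already embedded in the same space as the query, clips are assumed to be mean-pooled sliding windows of consecutive frames, the frame-scale branch is a hypothetical query-guided attention pooling, and the weight `alpha` combining the two scales is an assumed hyperparameter. All function names are hypothetical. The MIL idea shows up in `clip_scale_similarity`: the video is a bag of clips, and the bag scores as high as its best-matching clip, so a single relevant moment is enough.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_clips(frames: Sequence[Vector], max_len: int) -> List[List[float]]:
    """Assumption: clips are mean-pooled windows of 1..max_len consecutive frames."""
    clips = []
    for length in range(1, min(max_len, len(frames)) + 1):
        for start in range(len(frames) - length + 1):
            clips.append(mean_pool(frames[start:start + length]))
    return clips


def clip_scale_similarity(query: Vector, clips: Sequence[Vector]) -> float:
    """MIL view: the video is a bag of clips; the bag score is the best instance score."""
    return max(cosine(query, c) for c in clips)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_scale_similarity(query: Vector, frames: Sequence[Vector],
                           temperature: float = 0.1) -> float:
    """Hypothetical stand-in: query-guided attention pooling over frames, then cosine."""
    weights = softmax([cosine(query, f) / temperature for f in frames])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(len(query))]
    return cosine(query, pooled)


def partial_relevance(query: Vector, frames: Sequence[Vector],
                      max_clip_len: int = 2, alpha: float = 0.5) -> float:
    """Weighted sum of clip-scale and frame-scale similarity (alpha is assumed)."""
    clips = build_clips(frames, max_clip_len)
    return (alpha * clip_scale_similarity(query, clips)
            + (1 - alpha) * frame_scale_similarity(query, frames))


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    # Video A: only its middle frame matches the query (partially relevant).
    video_a = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    # Video B: no frame matches the query.
    video_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    # Video A outranks video B even though most of A is irrelevant.
    print(partial_relevance(query, video_a), partial_relevance(query, video_b))
```

Taking the maximum over clips, rather than averaging over the whole video, is what keeps the irrelevant remainder of an untrimmed video from diluting the score of its one relevant moment.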

