CV MM IV Nov 24, 2019

A Proposal-based Approach for Activity Image-to-Video Retrieval

Ruicong Xu, Li Niu, Jianfu Zhang, Liqing Zhang

arXiv:1911.10531v14.120 citations

Originality: Incremental advance

AI Analysis

This is an incremental improvement for video retrieval systems, focusing on filtering background noise in activity proposals.

The paper addresses activity image-to-video retrieval, where the activity proposals extracted from videos are noisy. It proposes the APIVR approach, which combines a Graph Multi-Instance Learning module with a geometry-aware triplet loss, and reports experiments on three datasets that verify its effectiveness.

The activity image-to-video retrieval task aims to retrieve videos containing an activity similar to that in the query image. The task is challenging because videos generally have many background segments irrelevant to the activity. In this paper, we utilize the R-C3D model to represent a video by a bag of activity proposals, which can filter out background segments to some extent. However, there are still noisy proposals in each bag. Thus, we propose an Activity Proposal-based Image-to-Video Retrieval (APIVR) approach, which incorporates multi-instance learning into a cross-modal retrieval framework to address the proposal noise issue. Specifically, we propose a Graph Multi-Instance Learning (GMIL) module with a graph convolutional layer, and integrate this module with classification loss, adversarial loss, and triplet loss in our cross-modal retrieval framework. Moreover, we propose a geometry-aware triplet loss based on point-to-subspace distance to preserve the structural information of activity proposals. Extensive experiments on three widely used datasets verify the effectiveness of our approach.
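To make the idea of a point-to-subspace distance concrete, here is a minimal NumPy sketch. It assumes the image is a single feature vector and the video is a bag of proposal feature vectors whose span defines the subspace; the abstract does not give the exact formulation, so the function names, the margin, and the plain hinge form of the triplet loss are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def point_to_subspace_distance(point, proposals, tol=1e-10):
    """Euclidean distance from `point` (d,) to the span of the rows of `proposals` (n, d).

    Assumption: the subspace is the linear span of the proposal features.
    """
    point = np.asarray(point, dtype=float)
    proposals = np.asarray(proposals, dtype=float)
    # Rows of vt form an orthonormal basis of the row space of `proposals`.
    _, s, vt = np.linalg.svd(proposals, full_matrices=False)
    # Keep only directions with non-negligible singular values.
    basis = vt[s > tol]
    # Orthogonal projection of the point onto the subspace.
    projection = basis.T @ (basis @ point)
    return float(np.linalg.norm(point - projection))


def subspace_triplet_loss(image, positive_bag, negative_bag, margin=1.0):
    """Hinge-style triplet loss using point-to-subspace distances (illustrative)."""
    d_pos = point_to_subspace_distance(image, positive_bag)
    d_neg = point_to_subspace_distance(image, negative_bag)
    return max(0.0, margin + d_pos - d_neg)


if __name__ == "__main__":
    image = np.array([1.0, 1.0, 0.0])
    matching_video = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])  # spans the x-y plane
    other_video = np.array([[0.0, 0.0, 1.0]])                      # spans the z axis
    print(subspace_triplet_loss(image, matching_video, other_video))
```

Measuring the distance to the whole span, rather than to each proposal separately, is one way to keep the structure of the bag in the loss, which is the motivation the abstract gives for the geometry-aware variant.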

