CV · MM · IV · Nov 24, 2019

A Proposal-based Approach for Activity Image-to-Video Retrieval

arXiv:1911.10531v1 · 20 citations
Originality: Incremental advance
AI Analysis

This is an incremental improvement for video retrieval systems, focusing on filtering background noise in activity proposals.

The paper tackles activity image-to-video retrieval, where the activity proposals extracted from videos are noisy. It proposes the APIVR approach, which combines a Graph Multi-Instance Learning module with a geometry-aware triplet loss, and reports experiments on three datasets that verify its effectiveness.

The activity image-to-video retrieval task aims to retrieve videos containing the same activity as the query image. It is challenging because videos generally have many background segments irrelevant to the activity. In this paper, we use the R-C3D model to represent a video as a bag of activity proposals, which filters out background segments to some extent. However, each bag still contains noisy proposals. We therefore propose an Activity Proposal-based Image-to-Video Retrieval (APIVR) approach, which incorporates multi-instance learning into a cross-modal retrieval framework to address the proposal noise issue. Specifically, we propose a Graph Multi-Instance Learning (GMIL) module with a graph convolutional layer, and integrate this module with classification loss, adversarial loss, and triplet loss in our cross-modal retrieval framework. Moreover, we propose a geometry-aware triplet loss based on point-to-subspace distance to preserve the structural information of activity proposals. Extensive experiments on three widely used datasets verify the effectiveness of our approach.
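
To make the point-to-subspace idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes the subspace is the linear span (through the origin) of a bag's proposal features and that the loss is a standard hinge-style triplet loss; the abstract does not give the exact formulation. The function names, the `margin` parameter, and the toy data are hypothetical.

```python
import numpy as np


def point_to_subspace_distance(point, bag):
    """Euclidean distance from `point` (d,) to the span of the rows of `bag` (k, d).

    Assumption: the subspace is the linear span of the proposal features.
    """
    # Least squares finds the combination of proposals closest to the point;
    # it also handles rank-deficient bags (duplicate or collinear proposals).
    coeffs, *_ = np.linalg.lstsq(bag.T, point, rcond=None)
    residual = point - bag.T @ coeffs
    return float(np.linalg.norm(residual))


def subspace_triplet_loss(query, positive_bag, negative_bag, margin=1.0):
    """Hinge triplet loss with point-to-subspace distances (illustrative only)."""
    d_pos = point_to_subspace_distance(query, positive_bag)
    d_neg = point_to_subspace_distance(query, negative_bag)
    return max(0.0, margin + d_pos - d_neg)


# Toy data: an image feature and two "videos", each a bag of proposal features.
query = np.array([1.0, 2.0, 3.0])
video_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # spans the xy-plane
video_b = np.array([[0.0, 0.0, 1.0]])                   # spans the z-axis

dist_a = point_to_subspace_distance(query, video_a)
dist_b = point_to_subspace_distance(query, video_b)
loss = subspace_triplet_loss(query, video_b, video_a, margin=0.5)
```

In this toy setup the query lies closer to the span of `video_b` than to that of `video_a`, so treating `video_b` as the positive bag with a small margin gives zero loss. Measuring distance to the span, rather than to a single pooled vector, is one way to keep the bag's structure in the comparison.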

