CV · MM · Aug 19, 2020

Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos

arXiv:2008.08257v1 · 78 citations
Originality: Incremental advance
AI Analysis

This work addresses video moment retrieval for researchers and practitioners in computer vision, offering an incremental improvement over existing weakly-supervised methods.

The paper tackles weakly-supervised video moment retrieval. Existing methods ignore the intra-sample confrontment between semantically similar moments, which makes it hard to distinguish the target moment from plausible negatives. The proposed Regularized Two-Branch Proposal Network is designed to address this limitation, and the authors report improved performance in extensive experiments.

Video moment retrieval aims to localize the target moment in a video according to a given sentence. The weakly-supervised setting provides only video-level sentence annotations during training. Most existing weakly-supervised methods apply a MIL-based framework to develop inter-sample confrontment, but ignore the intra-sample confrontment between moments with semantically similar contents. As a result, these methods fail to distinguish the target moment from plausible negative moments. In this paper, we propose a novel Regularized Two-Branch Proposal Network that considers the inter-sample and intra-sample confrontments simultaneously. Concretely, we first devise a language-aware filter to generate an enhanced video stream and a suppressed video stream. We then design a sharable two-branch proposal module that generates positive proposals from the enhanced stream and plausible negative proposals from the suppressed one for sufficient confrontment. Further, we apply proposal regularization to stabilize the training process and improve model performance. Extensive experiments show the effectiveness of our method. Our code is publicly released.
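The two-stream idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the language-aware filter scores frames or how the intra-sample confrontment is penalized. The sketch assumes cosine similarity between frame and sentence features, min-max normalization of the scores, and a simple hinge loss with a hypothetical `margin` parameter. The function names `language_aware_filter` and `intra_sample_loss` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def language_aware_filter(frames, sentence):
    """Split frame features into an enhanced and a suppressed stream.

    Assumption: relevance is cosine similarity to the sentence feature,
    min-max normalized to [0, 1]. The paper's actual scoring may differ.
    """
    raw = [cosine(f, sentence) for f in frames]
    lo, hi = min(raw), max(raw)
    span = hi - lo
    # Fall back to a neutral 0.5 when every frame scores the same.
    scores = [(r - lo) / span if span > 0 else 0.5 for r in raw]
    # Enhanced stream keeps language-relevant frames, suppressed stream keeps the rest.
    enhanced = [[s * x for x in f] for s, f in zip(scores, frames)]
    suppressed = [[(1.0 - s) * x for x in f] for s, f in zip(scores, frames)]
    return scores, enhanced, suppressed


def intra_sample_loss(pos_score, neg_score, margin=0.1):
    """Hinge loss (an assumed form) that asks the positive proposal from the
    enhanced stream to outscore the plausible negative from the suppressed
    stream by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)


# Toy example: three 2-D frame features and one sentence feature.
frames = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
sentence = [1.0, 0.0]
scores, enhanced, suppressed = language_aware_filter(frames, sentence)
```

In this toy example the first frame matches the sentence exactly, so it passes unchanged into the enhanced stream and is zeroed out in the suppressed stream; the last frame behaves the opposite way. A shared proposal module would then score candidate moments on each stream, and a loss such as `intra_sample_loss` would contrast the two sets of scores.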

Code Implementations: 1 repo
