CV · CL · Sep 14, 2021

Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

arXiv:2109.06398v1 · 670 citations
Originality: Highly original
AI Analysis

This work addresses the trade-off between efficiency and accuracy in temporal sentence localization in videos. It offers a method that preserves segment-level interaction while reducing proposal redundancy, though the contribution is incremental relative to the existing top-down and bottom-up frameworks.

The paper tackles temporal sentence localization in videos by proposing an Adaptive Proposal Generation Network (APGN) that adaptively generates segment proposals to replace handcrafted ones, achieving significant performance improvements on three benchmarks.

We address the problem of temporal sentence localization in videos (TSLV). Traditional methods follow a top-down framework, which localizes the target segment using pre-defined segment proposals. Although these methods achieve decent performance, their proposals are handcrafted and redundant. Recently, the bottom-up framework has attracted increasing attention owing to its superior efficiency: it directly predicts, for each frame, the probability of that frame being a segment boundary. However, bottom-up models perform worse than their top-down counterparts because they fail to exploit segment-level interaction. In this paper, we propose an Adaptive Proposal Generation Network (APGN) that maintains segment-level interaction while improving efficiency. Specifically, we first perform foreground-background classification over the video and regress on the foreground frames to adaptively generate proposals. In this way, handcrafted proposal design is discarded and the number of redundant proposals is reduced. Then, a proposal consolidation module is developed to enhance the semantics of the generated proposals. Finally, we locate the target moments with these generated proposals following the top-down framework. Extensive experiments on three challenging benchmarks show that APGN significantly outperforms previous state-of-the-art methods.
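The abstract describes proposal generation only at a high level. The Python sketch below contrasts handcrafted multi-scale sliding-window proposals with proposals generated from foreground frames. It assumes that each frame receives a foreground score and that each foreground frame regresses its distances to the segment start and end (one plausible reading of "regress on the foreground frames"). The function names, thresholds, window sizes, and the use of temporal non-maximum suppression (NMS) to drop duplicates are hypothetical illustration choices, not the authors' implementation; the proposal consolidation module and the sentence-proposal matching step are not modeled.

```python
from typing import List, Sequence, Tuple

# (start, end, score); boundaries are measured in frame indices.
Proposal = Tuple[float, float, float]


def sliding_window_proposals(
    num_frames: int, window_sizes: Sequence[int] = (8, 16, 32), stride_ratio: float = 0.5
) -> List[Tuple[int, int]]:
    """Handcrafted top-down proposals: multi-scale sliding windows.

    Window sizes and stride ratio are hypothetical, not taken from the paper.
    """
    proposals = []
    for w in window_sizes:
        stride = max(1, int(w * stride_ratio))
        for start in range(0, num_frames - w + 1, stride):
            proposals.append((start, start + w))
    return proposals


def temporal_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two 1-D segments given as (start, end, ...)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def adaptive_proposals(
    fg_scores: Sequence[float],
    start_dist: Sequence[float],
    end_dist: Sequence[float],
    fg_threshold: float = 0.5,
    nms_iou: float = 0.7,
) -> List[Proposal]:
    """Generate proposals only from frames classified as foreground.

    Each foreground frame t yields the proposal [t - start_dist[t], t + end_dist[t]],
    clipped to the video. Temporal NMS (an assumption in this sketch) then removes
    near-duplicate proposals.
    """
    num_frames = len(fg_scores)
    candidates: List[Proposal] = []
    for t, score in enumerate(fg_scores):
        if score < fg_threshold:
            continue  # background frame: contributes no proposal
        start = max(0.0, float(t - start_dist[t]))
        end = min(float(num_frames), float(t + end_dist[t]))
        candidates.append((start, end, score))

    # Greedy temporal NMS: keep the highest-scoring proposal, drop heavy overlaps.
    candidates.sort(key=lambda p: p[2], reverse=True)
    kept: List[Proposal] = []
    for cand in candidates:
        if all(temporal_iou(cand, k) < nms_iou for k in kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    # Toy 64-frame video: frames 20..31 are foreground and all regress to [20, 32].
    n = 64
    fg = [0.9 if 20 <= t < 32 else 0.1 for t in range(n)]
    sd = [t - 20 if 20 <= t < 32 else 0 for t in range(n)]
    ed = [32 - t if 20 <= t < 32 else 0 for t in range(n)]
    print(len(sliding_window_proposals(n)))  # 25 handcrafted proposals
    print(adaptive_proposals(fg, sd, ed))    # [(20.0, 32.0, 0.9)]
```

On this toy input, the sliding-window baseline enumerates 25 fixed segments regardless of video content, while the 12 foreground frames collapse to a single proposal after NMS. Under these assumptions, this illustrates the kind of redundancy reduction the abstract describes.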
