CV · Apr 3, 2020

TimeGate: Conditional Gating of Segments in Long-range Activities

arXiv:2004.01808v1 · 17 citations
AI Analysis

This work addresses the computational inefficiency of video analysis in applications such as surveillance and video indexing, though it is incremental in that it builds on existing sampling methods.

The paper tackles the problem of efficiently recognizing long-range activities in videos by sampling only the salient segments, proposing TimeGate with a conditional gating module. It reduces the computation of I3D by 50% while maintaining classification accuracy, and is evaluated on the Charades, Breakfast, and MultiThumos benchmarks.

When recognizing a long-range activity, exploring the entire video is exhaustive and computationally expensive, as it can span up to a few minutes. Thus, it is of great importance to sample only the salient parts of the video. We propose TimeGate, along with a novel conditional gating module, for sampling the most representative segments from the long-range activity. TimeGate has two novelties that address the shortcomings of previous sampling methods, such as SCSampler. First, it enables a differentiable sampling of segments. Thus, TimeGate can be fitted with modern CNNs and trained end-to-end as a single and unified model. Second, the sampling is conditioned on both the segments and their context. Consequently, TimeGate is better suited for long-range activities, where the importance of a segment heavily depends on the video context. TimeGate reduces the computation of existing CNNs on three benchmarks for long-range activities: Charades, Breakfast and MultiThumos. In particular, TimeGate reduces the computation of I3D by 50% while maintaining the classification accuracy.
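
To make the idea of context-conditioned gating concrete, the following is a minimal sketch, not the paper's gating module (whose architecture this summary does not describe). It assumes each segment is a small feature vector, uses the mean of all segment features as a stand-in for video context, and scores each segment with a sigmoid over hypothetical weights `w_seg` and `w_ctx`. All function names, weights, and the 0.5 threshold are illustrative assumptions. Only the forward pass is shown; the hard threshold here is not differentiable, so end-to-end training would need a relaxation that is outside this sketch.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def context_vector(segments: Sequence[Sequence[float]]) -> List[float]:
    """Mean of all segment features: an assumed, simple proxy for video context."""
    n = len(segments)
    dim = len(segments[0])
    return [sum(seg[d] for seg in segments) / n for d in range(dim)]


def gate_segments(
    segments: Sequence[Sequence[float]],
    w_seg: Sequence[float],   # hypothetical weights on the segment features
    w_ctx: Sequence[float],   # hypothetical weights on the context features
    bias: float = 0.0,
    threshold: float = 0.5,   # illustrative cut-off for keeping a segment
) -> Tuple[List[float], List[int]]:
    """Score every segment from its own features plus the shared context.

    Returns the gate scores and the indices of the segments that would be
    passed on to the expensive backbone.
    """
    ctx_term = dot(w_ctx, context_vector(segments))
    scores = [sigmoid(dot(w_seg, seg) + ctx_term + bias) for seg in segments]
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return scores, keep


# Toy weights: a segment is salient when its first feature exceeds the
# video-wide average of that feature.
W_SEG = [2.0, 0.0]
W_CTX = [-2.0, 0.0]

video_a = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
video_b = [[1.0, 0.0], [2.0, 0.0], [2.0, 0.0], [5.0, 0.0]]

scores_a, keep_a = gate_segments(video_a, W_SEG, W_CTX)
scores_b, keep_b = gate_segments(video_b, W_SEG, W_CTX)
```

In this toy setup the same segment `[1.0, 0.0]` is kept in `video_a` but dropped in `video_b`, because its score depends on the rest of the video. The fraction of segments kept, `len(keep) / len(segments)`, gives a rough proxy for how much backbone computation remains.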

