CVAIMMOct 20, 2024

ContextDet: Temporal Action Detection with Adaptive Context Aggregation

arXiv:2410.15279v16 citationsh-index: 6
Originality Incremental advance
AI Analysis

This work improves video understanding for applications like surveillance and content analysis, but it is incremental as it builds on existing single-stage methods with novel modules.

The paper tackles the problem of temporal action detection in videos by addressing imprecise boundary predictions due to indiscriminate treatment of neighboring contexts, introducing the ContextDet framework with adaptive context aggregation to achieve superior accuracy on six benchmarks.

Temporal action detection (TAD), which locates and recognizes action segments, remains a challenging task in video understanding due to variable segment lengths and ambiguous boundaries. Existing methods treat neighboring contexts of an action segment indiscriminately, leading to imprecise boundary predictions. We introduce a single-stage ContextDet framework, which makes use of large-kernel convolutions in TAD for the first time. Our model features a pyramid adaptive context aggragation (ACA) architecture, capturing long context and improving action discriminability. Each ACA level consists of two novel modules. The context attention module (CAM) identifies salient contextual information, encourages context diversity, and preserves context integrity through a context gating block (CGB). The long context module (LCM) makes use of a mixture of large- and small-kernel convolutions to adaptively gather long-range context and fine-grained local features. Additionally, by varying the length of these large kernels across the ACA pyramid, our model provides lightweight yet effective context aggregation and action discrimination. We conducted extensive experiments and compared our model with a number of advanced TAD methods on six challenging TAD benchmarks: MultiThumos, Charades, FineAction, EPIC-Kitchens 100, Thumos14, and HACS, demonstrating superior accuracy at reduced inference speed.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes