CV · Nov 27, 2022

Post-Processing Temporal Action Detection

Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

arXiv:2211.14924v28.110 citations · h-index: 34 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in temporal action detection (TAD) for video analysis applications: its post-processing solution improves both detection performance and inference efficiency. The contribution is incremental, as it builds on existing methods.

The paper tackles temporal quantization error in Temporal Action Detection (TAD), which arises when pre-processing steps reduce the temporal resolution of the input. It introduces a model-agnostic post-processing method called GAP that improves detection performance without model redesign. The results show consistent improvements on the ActivityNet (+0.2% to +0.7% average mAP) and THUMOS (+0.2% to +0.5% average mAP) benchmarks, comparable to gains from novel model designs.

Existing Temporal Action Detection (TAD) methods typically apply a pre-processing step that converts a variable-length input video into a fixed-length sequence of snippet representations before temporal boundary estimation and action classification. This step temporally downsamples the video, reducing the inference resolution and hampering detection performance at the original temporal resolution. In essence, the cause is a temporal quantization error introduced during resolution downsampling and recovery. This error can degrade TAD performance, yet it is largely ignored by existing methods. To address this problem, in this work we introduce a novel model-agnostic post-processing method that requires neither model redesign nor retraining. Specifically, we model the start and end points of action instances with a Gaussian distribution to enable temporal boundary inference at a sub-snippet level. We further introduce an efficient Taylor-expansion based approximation, dubbed Gaussian Approximated Post-processing (GAP). Extensive experiments demonstrate that GAP consistently improves a wide variety of pre-trained off-the-shelf TAD models on the challenging ActivityNet (+0.2% to +0.7% in average mAP) and THUMOS (+0.2% to +0.5% in average mAP) benchmarks. Such performance gains are already significant and highly comparable to those achieved by novel model designs. GAP can also be integrated with model training for further performance gain. Importantly, GAP enables lower temporal resolutions for more efficient inference, facilitating low-resource applications. The code will be available at https://github.com/sauradip/GAP
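
To make the idea of sub-snippet boundary inference concrete, here is a minimal sketch of one way a Gaussian assumption combined with a second-order Taylor expansion can refine a boundary location. It is not the authors' implementation: the function names `refine_boundary` and `gaussian_scores`, the use of central finite differences for the derivatives, and the toy score sequence are all assumptions made for illustration. If the boundary scores around the peak follow a Gaussian, their logarithm is quadratic, so expanding the log-score around the discrete maximum gives a closed-form correction to the peak position.

```python
import math


def refine_boundary(scores, eps=1e-10):
    """Estimate the peak of `scores` at sub-snippet precision.

    Hypothetical helper: assumes the scores near the peak are roughly
    Gaussian, so log(scores) is roughly quadratic in the snippet index.
    """
    # Discrete (quantized) peak location at snippet resolution.
    m = max(range(len(scores)), key=scores.__getitem__)
    # No neighbours on both sides: keep the quantized location.
    if m == 0 or m == len(scores) - 1:
        return float(m)

    # Log-scores at the peak and its two neighbours (clamped to avoid log(0)).
    f_prev, f_mid, f_next = (
        math.log(max(scores[i], eps)) for i in (m - 1, m, m + 1)
    )

    # Central finite differences for the first and second derivatives.
    d1 = 0.5 * (f_next - f_prev)
    d2 = f_next - 2.0 * f_mid + f_prev

    # A true peak has negative curvature; otherwise skip the refinement.
    if d2 >= 0:
        return float(m)

    # Second-order Taylor expansion of the log-score: mu = m - f'(m) / f''(m).
    return m - d1 / d2


def gaussian_scores(mu, sigma, length):
    """Toy boundary scores: an unnormalized Gaussian sampled per snippet."""
    return [math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) for t in range(length)]


# The true boundary lies between snippets 3 and 4.
scores = gaussian_scores(mu=3.3, sigma=1.5, length=8)
refined = refine_boundary(scores)
print(refined)
```

On this toy input the discrete argmax is snippet 3, while the refined estimate recovers the underlying peak near 3.3, because the log of a sampled Gaussian is exactly quadratic. On real model outputs the scores are only approximately Gaussian, so the correction is an approximation. Multiplying the refined index by the snippet duration would map it back to a timestamp in the original video.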
