CV · Jun 21, 2021

Temporal Early Exits for Efficient Video Object Detection

Amin Sabet, Jonathon Hare, Bashir Al-Hashimi, Geoff V. Merrett

arXiv:2106.11208v13.72 citations

Originality: Incremental advance

AI Analysis

This work addresses efficiency problems for video object detection in resource-constrained applications like surveillance, though it is incremental as it builds on existing early exit and feature propagation techniques.

The paper tackles the challenge of reducing computational complexity in per-frame video object detection by introducing temporal early exits, which identify semantic changes between consecutive frames to avoid full computation when unnecessary. The method achieves up to a 34x reduction in computational complexity with only a 2.2% drop in mAP on the CDnet dataset.
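As a rough illustration of where such a saving can come from, the following back-of-the-envelope cost model may help. It is an assumption made here, not a formula taken from the paper: $C$ is the cost of full detection on one frame, $c$ the cost of running the backbone up to an early exit (including the exit module), $e$ the overhead the exit modules add to a frame that still needs full computation, and $p$ the fraction of frames flagged as changed.

```latex
\begin{align}
  \mathbb{E}[\text{cost per frame}] &= (1 - p)\,c + p\,(C + e) \\
  S &= \frac{C}{(1 - p)\,c + p\,(C + e)}
\end{align}
```

Under this simplified model, and taking $e \approx 0$, a speed-up of $S \ge 34$ requires $(1 - p)\,c/C + p \le 1/34$. In other words, both the exit cost relative to full detection and the share of changed frames have to be small, which is plausible for slowly changing surveillance footage.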

Transferring image-based object detectors to the domain of video remains challenging under resource constraints. Previous efforts utilised optical flow to allow unchanged features to be propagated; however, the overhead is considerable when working with very slowly changing scenes from applications such as surveillance. In this paper, we propose temporal early exits to reduce the computational complexity of per-frame video object detection. Multiple temporal early exit modules with low computational overhead are inserted at early layers of the backbone network to identify the semantic differences between consecutive frames. Full computation is only required if the frame is identified as having a semantic change relative to previous frames; otherwise, detection results from previous frames are reused. Experiments on CDnet show that our method reduces the computational complexity and execution time of per-frame video object detection by up to $34\times$ compared to existing methods, with an acceptable reduction of 2.2% in mAP.
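The reuse-or-recompute control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper uses learned early exit modules inside the backbone, whereas this sketch swaps in a plain mean absolute pixel difference as the change test. The names `mean_abs_diff`, `detect_video`, `full_detector`, and `threshold` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]   # a flattened frame; stand-in for an image tensor
Detections = List[str]    # stand-in for boxes, labels, and scores


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Cheap change score between two frames (assumed stand-in for a learned exit module)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_video(
    frames: Sequence[Frame],
    full_detector: Callable[[Frame], Detections],
    threshold: float = 0.1,
) -> Tuple[List[Detections], int]:
    """Run full detection only on frames judged to have changed.

    Returns the per-frame detections and the number of full detector runs.
    """
    results: List[Detections] = []
    full_runs = 0
    reference: Optional[Frame] = None  # last frame that received full computation
    cached: Detections = []            # detections produced for that frame

    for frame in frames:
        changed = reference is None or mean_abs_diff(frame, reference) > threshold
        if changed:
            # Change detected (or first frame): pay for full computation.
            cached = full_detector(frame)
            reference = frame
            full_runs += 1
        # Otherwise exit early and reuse the previous detections.
        results.append(cached)

    return results, full_runs
```

Comparing each frame against the last fully processed frame, rather than the immediately preceding one, is a design choice of this sketch: it keeps slow drift from accumulating unnoticed across many small changes.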
