CV · Jun 21, 2021

Temporal Early Exits for Efficient Video Object Detection

arXiv:2106.11208v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the efficiency of video object detection in resource-constrained applications such as surveillance. Its contribution is incremental, since it builds on existing early-exit and feature-propagation techniques.

The paper tackles the challenge of reducing the computational complexity of per-frame video object detection by introducing temporal early exits, which detect semantic changes between consecutive frames so that full computation is skipped when it is unnecessary. On the CDnet dataset, the method reduces computational complexity by up to 34× compared to existing methods, at the cost of a 2.2% drop in mAP.
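
To see why the savings are reported as "up to" a factor, a rough per-frame cost model helps. This is an illustrative sketch, not the paper's own accounting: the symbols are introduced here as assumptions, with $C_{\text{full}}$ the cost of running the full detector on a frame, $C_b$ the cost of the early backbone layers that every frame passes through, $C_m$ the overhead of the exit modules, and $p$ the fraction of frames flagged as semantically changed. It further assumes that a changed frame reuses the early layers it has already computed.

$$
\bar{C} = \underbrace{C_b + C_m}_{\text{paid by every frame}} + \; p\,(C_{\text{full}} - C_b),
\qquad
S = \frac{C_{\text{full}}}{\bar{C}} \le \frac{C_{\text{full}}}{C_b + C_m},
$$

with equality at $p = 0$. Under these assumptions, a 34× saving relative to running the full detector on every frame would require the always-paid part to satisfy $C_b + C_m \le C_{\text{full}}/34 \approx 0.029\,C_{\text{full}}$, which is why the exits must sit at early layers and be cheap. Note that the abstract states its 34× figure relative to existing methods, so this bound is only a qualitative guide.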

Transferring image-based object detectors to the domain of video remains challenging under resource constraints. Previous efforts utilised optical flow to allow unchanged features to be propagated; however, the overhead is considerable when working with very slowly changing scenes from applications such as surveillance. In this paper, we propose temporal early exits to reduce the computational complexity of per-frame video object detection. Multiple temporal early exit modules with low computational overhead are inserted at early layers of the backbone network to identify the semantic differences between consecutive frames. Full computation is only required if the frame is identified as having a semantic change relative to previous frames; otherwise, detection results from previous frames are reused. Experiments on CDnet show that our method significantly reduces the computational complexity and execution time of per-frame video object detection, by up to 34× compared to existing methods, with an acceptable reduction of 2.2% in mAP.
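
The gating idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. Whatever the paper's exit modules compute internally, here each exit is replaced by a hypothetical mean-absolute-difference test against a per-exit threshold. The names `TemporalEarlyExitDetector`, `early_stages`, `head` and `thresholds` are invented for this sketch. It also assumes that features are compared against the last frame that received full computation, rather than strictly the immediately preceding frame, so that slow drift eventually triggers a refresh. A frame exits at the first stage that sees no change; if no exit fires, the full detector runs and its results are cached.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Features = List[float]
Detections = List[str]


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Cheap stand-in for a temporal early exit module (assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / max(len(a), 1)


@dataclass
class TemporalEarlyExitDetector:
    # Hypothetical early backbone stages: each maps features to features.
    early_stages: List[Callable[[Features], Features]]
    # Hypothetical remainder of the detector: deepest early features -> detections.
    head: Callable[[Features], Detections]
    # One change threshold per exit (illustrative values; would need tuning).
    thresholds: List[float]
    ref_feats: Optional[List[Features]] = None  # features of the last fully processed frame
    cached: Optional[Detections] = None         # detections of that frame
    full_runs: int = 0                          # how often full computation was needed

    def detect(self, frame: Features) -> Detections:
        x = frame
        feats: List[Features] = []
        for i, stage in enumerate(self.early_stages):
            x = stage(x)
            feats.append(x)
            # Early exit: no semantic change at this stage -> reuse cached detections.
            if self.ref_feats is not None and self.cached is not None:
                if mean_abs_diff(x, self.ref_feats[i]) <= self.thresholds[i]:
                    return self.cached
        # No exit fired (or no reference yet): run full computation and refresh the cache.
        self.cached = self.head(x)
        self.ref_feats = feats
        self.full_runs += 1
        return self.cached


# Usage with toy stages: a near-static scene, then a sudden change, then static again.
def scale(k: float) -> Callable[[Features], Features]:
    return lambda f: [k * v for v in f]


det = TemporalEarlyExitDetector(
    early_stages=[scale(1.0), scale(0.5)],
    head=lambda f: ["person"] if sum(f) > 1.0 else [],
    thresholds=[0.05, 0.05],
)
frames = [[0.0, 0.0], [0.01, 0.0], [2.0, 2.0], [2.0, 2.01]]
outs = [det.detect(f) for f in frames]
# outs == [[], [], ["person"], ["person"]] and det.full_runs == 2
```

Only the first frame and the frame with a large change pay for full computation; the other two exit at the first stage. A learned exit module could replace `mean_abs_diff` without changing the control flow.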
