CV, Aug 9, 2023

Objects do not disappear: Video object detection by single-frame object location anticipation

arXiv:2308.04770v1, 11 citations, h-index: 39, Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of efficient and accurate video object detection for computer vision applications, offering incremental improvements in method and efficiency.

The paper tackles video object detection by leveraging continuous smooth motion to anticipate object locations from a static keyframe, resulting in improved mean average precision, computational efficiency, and reduced annotation costs on four datasets.

Objects in videos are typically characterized by continuous smooth motion. We exploit continuous smooth motion in three ways. 1) Improved accuracy by using object motion as an additional source of supervision, which we obtain by anticipating object locations from a static keyframe. 2) Improved efficiency by only doing the expensive feature computations on a small subset of all frames. Because neighboring video frames are often redundant, we only compute features for a single static keyframe and predict object locations in subsequent frames. 3) Reduced annotation cost, where we only annotate the keyframe and use smooth pseudo-motion between keyframes. We demonstrate computational efficiency, annotation efficiency, and improved mean average precision compared to the state-of-the-art on four datasets: ImageNet VID, EPIC KITCHENS-55, YouTube-BoundingBoxes, and Waymo Open dataset. Our source code is available at https://github.com/L-KID/Videoobject-detection-by-location-anticipation.
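The third idea in the abstract, annotating only keyframes and filling the frames between them with smooth pseudo-motion, can be illustrated with a minimal sketch. It assumes linear interpolation of box corners between two annotated keyframes; the abstract does not say which interpolation the authors use, and the names `Box` and `interpolate_boxes` are hypothetical rather than taken from the released code.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption for the sketch.
Box = Tuple[float, float, float, float]


def interpolate_boxes(start: Box, end: Box, num_steps: int) -> List[Box]:
    """Linearly interpolate a box between two annotated keyframes.

    Returns num_steps + 1 boxes: index 0 is `start`, index num_steps is `end`,
    and the boxes in between serve as pseudo-labels for unannotated frames.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    boxes: List[Box] = []
    for t in range(num_steps + 1):
        alpha = t / num_steps  # 0.0 at the first keyframe, 1.0 at the second
        x1, y1, x2, y2 = ((1 - alpha) * s + alpha * e for s, e in zip(start, end))
        boxes.append((x1, y1, x2, y2))
    return boxes


# Two keyframe annotations four frames apart; the three frames between them
# receive interpolated pseudo-labels.
pseudo_labels = interpolate_boxes((0.0, 0.0, 10.0, 10.0), (4.0, 8.0, 14.0, 18.0), 4)
```

Linear interpolation is the simplest choice consistent with the smooth-motion assumption; a real pipeline might prefer a different trajectory model when objects accelerate or change scale quickly.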

Code Implementations: 2 repos
