CV · Mar 2, 2020

Plug & Play Convolutional Regression Tracker for Video Object Detection

arXiv:2003.00981v1
AI Analysis

This work addresses the problem of video object detection for computer vision applications, offering an incremental improvement by enhancing detection consistency without major modifications to existing systems.

The paper tackles the challenge of consistent object detection across video frames by proposing a plug-and-play convolutional regression tracker that can be integrated into existing detection networks, improving mAP by around 5% on the ImageNet VID dataset with minimal speed impact.

Video object detection aims to simultaneously localize the bounding boxes of objects and identify their classes in a given video. One challenge for video object detection is to consistently detect all objects across the whole video. Because the appearance of objects may deteriorate in some frames, features or detections from other frames are commonly used to enhance the prediction. In this paper, we propose a Plug & Play scale-adaptive convolutional regression tracker for the video object detection task, which can be easily integrated into current state-of-the-art detection networks. Because the tracker reuses the features from the detector, it is a very lightweight addition to the detection network. The whole network runs at a speed close to that of a standard object detector. With our new video object detection pipeline design, image object detectors can be easily turned into efficient video object detectors without modifying any parameters. The performance is evaluated on the large-scale ImageNet VID dataset. Our Plug & Play design improves the mAP score of the image detector by around 5% with only a small drop in speed.
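The plug-and-play idea in the abstract can be illustrated with a small wrapper. This is only a minimal sketch under stated assumptions, not the paper's implementation: the names `Detection`, `iou`, and `VideoDetectorWrapper`, the callable signatures for the detector and tracker, and the IoU-based merge rule are all hypothetical. The abstract states only that the tracker reuses the detector's features and that the detector's parameters stay unchanged.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    label: str
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class VideoDetectorWrapper:
    """Hypothetical wrapper that turns an image detector into a video detector.

    detect(frame) -> (features, detections): the unmodified image detector.
    track(features, previous) -> detections: a tracker that reuses the
    detector's features to propagate the previous frame's boxes.
    """

    def __init__(
        self,
        detect: Callable[[Any], Tuple[Any, List[Detection]]],
        track: Callable[[Any, List[Detection]], List[Detection]],
        iou_thresh: float = 0.5,  # assumed value, not from the paper
    ) -> None:
        self.detect = detect
        self.track = track
        self.iou_thresh = iou_thresh
        self.previous: List[Detection] = []

    def step(self, frame: Any) -> List[Detection]:
        # The detector runs once; its features are shared with the tracker.
        features, detections = self.detect(frame)
        tracked = self.track(features, self.previous)

        # Assumed merge rule: keep every detector output, and add a tracked
        # box only when no same-class detection already overlaps it.
        merged = list(detections)
        for t in tracked:
            if all(
                d.label != t.label or iou(d.box, t.box) < self.iou_thresh
                for d in detections
            ):
                merged.append(t)

        self.previous = merged
        return merged
```

In this sketch the detector is passed in as a callable and never modified, which mirrors the claim that image detectors can be reused as they are. A frame on which the detector misses an object can still report that object if the tracker propagates it from the previous frame.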

Code Implementations: 2 repos