CV · Apr 16, 2018

Towards High Performance Video Object Detection for Mobiles

arXiv:1804.05830v1 · 50 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficient video object detection on mobile devices and represents an incremental improvement over existing methods.

The paper tackles video object detection on mobile devices by proposing a lightweight network architecture that runs an image detector only on sparse key frames, uses a small flow network (Light Flow) to establish correspondence across frames, and aggregates key-frame features with a flow-guided GRU. It reports 60.2% mAP at 25.6 fps on a mobile phone (HuaWei Mate 8).

Despite the recent success of video object detection on desktop GPUs, its architecture is still far too heavy for mobiles. It is also unclear whether the key principles of sparse feature propagation and multi-frame feature aggregation apply under very limited computational resources. In this paper, we present a lightweight network architecture for video object detection on mobiles. A lightweight image object detector is applied on sparse key frames. A very small network, Light Flow, is designed for establishing correspondence across frames. A flow-guided GRU module is designed to effectively aggregate features on key frames. For non-key frames, sparse feature propagation is performed. The whole network can be trained end-to-end. The proposed system achieves a 60.2% mAP score at a speed of 25.6 fps on mobiles (e.g., HuaWei Mate 8).
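
The inference loop described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: `extract_features` and `estimate_flow` are hypothetical stand-ins for the lightweight detector backbone and for Light Flow, `PixelwiseGRU` is a simplified per-pixel (1x1) GRU with untrained random weights and a ReLU candidate chosen for this sketch, and features are assumed to share the frame's resolution. The `key_interval` value is also an assumption.

```python
import numpy as np


def warp_features(feat, flow):
    """Bilinearly sample feat (C, H, W) at locations displaced by flow (2, H, W).

    flow[0] is the x displacement and flow[1] the y displacement, in pixels,
    from each location in the current frame to its match in the source map.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class PixelwiseGRU:
    """Simplified GRU applied independently at every spatial location (assumption)."""

    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(channels)
        # Input and state weights for the update gate, reset gate, and candidate.
        self.wz, self.uz, self.wr, self.ur, self.wh, self.uh = (
            rng.normal(0.0, scale, (channels, channels)) for _ in range(6)
        )

    def __call__(self, state, feat):
        def mix(m, t):
            return np.einsum("oc,chw->ohw", m, t)

        z = sigmoid(mix(self.wz, feat) + mix(self.uz, state))   # update gate
        r = sigmoid(mix(self.wr, feat) + mix(self.ur, state))   # reset gate
        cand = np.maximum(0.0, mix(self.wh, feat) + mix(self.uh, r * state))
        return (1 - z) * state + z * cand


def extract_features(frame):
    # Hypothetical stand-in for the lightweight image detector backbone.
    return frame.astype(np.float64)


def estimate_flow(cur, src):
    # Hypothetical stand-in for Light Flow; returns zero motion.
    return np.zeros((2,) + cur.shape[1:])


def run_video(frames, key_interval=10, gru=None):
    """Return (index, is_key, feature_map) for every frame."""
    if gru is None:
        gru = PixelwiseGRU(frames[0].shape[0])
    key_frame, state, outputs = None, None, []
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            # Key frame: compute fresh features and aggregate them with the
            # previous key-frame state, aligned to this frame via the flow.
            feat = extract_features(frame)
            if state is None:
                state = feat
            else:
                aligned = warp_features(state, estimate_flow(frame, key_frame))
                state = gru(aligned, feat)
            key_frame = frame
            outputs.append((i, True, state))
        else:
            # Non-key frame: skip the backbone and propagate key-frame features.
            flow = estimate_flow(frame, key_frame)
            outputs.append((i, False, warp_features(state, flow)))
    return outputs
```

In this sketch the expensive feature extractor runs only once every `key_interval` frames, while every other frame pays only for flow estimation and a warp, which is the source of the speedup the abstract describes. A detection head would then be applied to each returned feature map.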

Code Implementations: 3 repos
