CV | AI | Mar 10, 2021

PatchNet -- Short-range Template Matching for Efficient Video Processing

arXiv:2103.07371v1 | 7 citations | Has Code
AI Analysis

This work addresses the need for low-cost, on-device video recognition, offering incremental improvements in efficiency for tasks like video object detection and tracking.

The paper tackles efficient object recognition in video by proposing PatchNet, a compact convolutional neural network (58 MFLOPs) that matches objects across adjacent frames. Applied to existing detectors and trackers, it cuts FLOPs by up to 5x with little accuracy loss: less than 1% mAP loss on ImageNet VID and none on OTB2015.
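To make the headline reduction concrete, the arithmetic below sketches how a cheap matcher could amortize an expensive model. It assumes, hypothetically, that the heavy model runs once every `interval` frames and the matcher handles the frames in between; the summary does not state the schedule. The 1000 MFLOPs detector cost is a placeholder, not a figure from the paper; only the 58 MFLOPs matcher cost comes from the abstract.

```python
def average_flops(detector_mflops, matcher_mflops, interval):
    """Average per-frame cost when the detector runs on one frame out of
    every `interval` frames and the matcher runs on the remaining ones.
    The schedule itself is an assumption made for illustration."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    total = detector_mflops + (interval - 1) * matcher_mflops
    return total / interval


def reduction_factor(detector_mflops, matcher_mflops, interval):
    """How many times cheaper the amortized pipeline is than running
    the detector on every frame."""
    return detector_mflops / average_flops(detector_mflops, matcher_mflops, interval)


# Hypothetical detector cost (placeholder); matcher cost from the abstract.
DETECTOR_MFLOPS = 1000.0
PATCHNET_MFLOPS = 58.0

for k in (1, 2, 5, 10):
    print(k, round(reduction_factor(DETECTOR_MFLOPS, PATCHNET_MFLOPS, k), 2))
```

Under this assumed schedule the reduction factor approaches, but never exceeds, the ratio of detector cost to matcher cost as the interval grows, so the achievable saving depends on how long matching stays accurate between detector runs.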

Object recognition is a fundamental problem in many video processing tasks; accurately locating seen objects at low computation cost paves the way for on-device video recognition. We propose PatchNet, an efficient convolutional neural network to match objects in adjacent video frames. It learns patchwise correlation features instead of pixel features. PatchNet is very compact, running at just 58 MFLOPs, $5\times$ simpler than MobileNetV2. We demonstrate its application on two tasks, video object detection and visual object tracking. On ImageNet VID, PatchNet reduces the FLOPs of R-FCN ResNet-101 by 5x and EfficientDet-D0 by 3.4x with less than 1% mAP loss. On OTB2015, PatchNet reduces the FLOPs of SiamFC and SiamRPN by 2.5x with no accuracy loss. Experiments on Jetson Nano further demonstrate 2.8x to 4.3x speed-ups associated with the FLOPs reduction. Code is open sourced at https://github.com/RalphMao/PatchNet.
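The idea of short-range template matching can be illustrated with a small sketch. It is not PatchNet's network: a hand-written normalized cross-correlation stands in for the learned patchwise correlation features, and the function names, the `(top, left, height, width)` box format, and the `radius` parameter are all assumptions made for illustration. The object's patch from the previous frame is compared against nearby windows in the current frame, and the best-scoring displacement is returned.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length flat lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a flat patch carries no correlation signal
    return num / (norm_a * norm_b)


def crop(frame, top, left, h, w):
    """Flatten the h x w window of `frame` whose corner is (top, left)."""
    return [frame[r][c] for r in range(top, top + h)
            for c in range(left, left + w)]


def match_short_range(prev_frame, cur_frame, box, radius=2):
    """Find the displacement (dy, dx) within +/- radius that best matches
    the object patch from prev_frame. Returns (score, dy, dx)."""
    top, left, h, w = box
    template = crop(prev_frame, top, left, h, w)
    rows, cols = len(cur_frame), len(cur_frame[0])
    best = (-2.0, 0, 0)  # NCC lies in [-1, 1], so -2 is a safe floor
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + h > rows or l + w > cols:
                continue  # skip windows that fall outside the frame
            score = ncc(template, crop(cur_frame, t, l, h, w))
            if score > best[0]:
                best = (score, dy, dx)
    return best


def make_frame(top, left, size=8):
    """Toy grayscale frame: zeros with a 3x3 pattern of values 1..9."""
    frame = [[0.0] * size for _ in range(size)]
    for i in range(3):
        for j in range(3):
            frame[top + i][left + j] = float(3 * i + j + 1)
    return frame


prev_frame = make_frame(2, 2)
cur_frame = make_frame(3, 4)  # object moved down 1 and right 2
score, dy, dx = match_short_range(prev_frame, cur_frame, (2, 2, 3, 3))
print(dy, dx)
```

Restricting the search to a small radius is what keeps the cost low: the number of candidate windows grows with the square of the radius, which is affordable only because objects move little between adjacent frames.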

Code Implementations: 1 repo