CV · Aug 20, 2021

BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies

arXiv:2108.09376v2 · 19 citations
AI Analysis

This work addresses the high computational cost of video processing in applications such as pedestrian detection and segmentation. It offers a universal framework that is incremental but practical.

The paper tackles the inefficiency of processing video frame by frame with CNNs. BlockCopy uses a lightweight policy network to select important regions for processing and copies features from the previous frame for the remaining regions, achieving significant FLOPS savings and inference speedup with minimal accuracy loss.

In this paper we propose BlockCopy, a scheme that accelerates pretrained frame-based CNNs to process video more efficiently, compared to standard frame-by-frame processing. To this end, a lightweight policy network determines important regions in an image, and operations are applied on selected regions only, using custom block-sparse convolutions. Features of non-selected regions are simply copied from the preceding frame, reducing the number of computations and latency. The execution policy is trained using reinforcement learning in an online fashion without requiring ground truth annotations. Our universal framework is demonstrated on dense prediction tasks such as pedestrian detection, instance segmentation and semantic segmentation, using both state of the art (Center and Scale Predictor, MGAN, SwiftNet) and standard baseline networks (Mask-RCNN, DeepLabV3+). BlockCopy achieves significant FLOPS savings and inference speedup with minimal impact on accuracy.
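The copy-or-execute idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`toy_policy`, `blockcopy_step`), the block size, and the threshold are hypothetical. The paper's policy is a learned network trained with reinforcement learning, whereas this sketch substitutes a simple frame-difference heuristic. The "layer" here is a pointwise function, so blocks need no neighbouring context, unlike the custom block-sparse convolutions the paper uses.

```python
import numpy as np


def toy_policy(frame, prev_frame, block_size, threshold):
    """Hypothetical stand-in for the learned policy network.

    Marks a block for execution when its mean absolute change
    relative to the previous frame exceeds `threshold`.
    """
    h, w = frame.shape
    diff = np.abs(frame - prev_frame)
    # Group pixels into (block_row, y_in_block, block_col, x_in_block).
    blocks = diff.reshape(h // block_size, block_size,
                          w // block_size, block_size)
    return blocks.mean(axis=(1, 3)) > threshold


def blockcopy_step(frame, prev_features, execute_mask, block_size, layer_fn):
    """Recompute features only in selected blocks; copy the rest."""
    # Default: reuse the previous frame's features everywhere.
    features = prev_features.copy()
    for by, bx in zip(*np.nonzero(execute_mask)):
        ys = slice(by * block_size, (by + 1) * block_size)
        xs = slice(bx * block_size, (bx + 1) * block_size)
        # Only selected blocks pay the cost of the layer.
        features[ys, xs] = layer_fn(frame[ys, xs])
    return features


# Toy example: a 4x4 "video" where only the top-left 2x2 block changes.
layer_fn = lambda x: 2.0 * x          # pointwise stand-in for a CNN layer
prev_frame = np.zeros((4, 4))
frame = prev_frame.copy()
frame[0:2, 0:2] = 1.0

prev_features = layer_fn(prev_frame)  # first frame is processed in full
mask = toy_policy(frame, prev_frame, block_size=2, threshold=0.5)
features = blockcopy_step(frame, prev_features, mask, 2, layer_fn)
executed_fraction = mask.mean()       # share of blocks actually computed
```

In this toy case only one of four blocks is executed, yet the result matches full-frame processing because the other blocks did not change. With real convolutions, copied features are only an approximation, which is why the abstract reports a small accuracy impact rather than none.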

Code Implementations: 1 repo