CV, Dec 26, 2019

Efficient Video Semantic Segmentation with Labels Propagation and Refinement

arXiv:1912.11844v1, 40 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient video semantic segmentation in applications like autonomous driving, though it is incremental as it builds on existing real-time methods.

The paper tackles real-time semantic segmentation of high-definition videos by proposing a hybrid GPU/CPU pipeline that achieves competitive accuracy (mIoU above 60%) while enabling frame rates from 80 to 1000 Hz on the Cityscapes dataset.
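For reference, the mIoU figure quoted above is the mean over classes of intersection over union between predicted and ground-truth label maps. The following is a minimal sketch of that metric, not the paper's evaluation code: the function name `mean_iou` is hypothetical, and the handling of ignore labels used by the official Cityscapes evaluation is omitted here.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes that appear in pred or target.

    Assumption: labels are integers in [0, num_classes); no ignore label is handled.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0  # skip classes absent from both maps
    return float((inter[present] / union[present]).mean())
```

A per-class IoU of 0.5 and 2/3 on a two-class toy example would average to about 0.58, which is how a single mIoU percentage summarizes accuracy over all classes.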

This paper tackles the problem of real-time semantic segmentation of high-definition videos using a hybrid GPU/CPU approach. We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) On the CPU, a very fast optical flow method that exploits the temporal aspect of the video to propagate semantic information from one frame to the next. It runs in parallel with the GPU. (ii) On the GPU, two Convolutional Neural Networks: a main segmentation network that predicts dense semantic labels from scratch, and a Refiner that is designed to improve predictions from previous frames with the help of a fast Inconsistencies Attention Module (IAM). The IAM identifies regions that cannot be propagated accurately. We suggest several operating points depending on the desired frame rate and accuracy. Our pipeline achieves accuracy competitive with existing real-time methods for semantic image segmentation (mIoU above 60%), while achieving much higher frame rates. On the popular Cityscapes dataset with high-resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
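The label propagation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a dense backward flow field (each pixel of the current frame points to its source in the previous frame), uses nearest-neighbour sampling, and the function name `propagate_labels` is hypothetical. The returned mask only flags pixels whose source falls outside the previous frame, which is a far cruder signal than the learned IAM.

```python
import numpy as np


def propagate_labels(prev_labels, flow):
    """Warp the previous frame's label map into the current frame.

    prev_labels: (H, W) integer class ids for the previous frame.
    flow: (H, W, 2) array where flow[y, x] = (dx, dy) is the offset from
          pixel (x, y) in the current frame to its source in the previous
          frame (assumed convention, not taken from the paper).
    Returns (warped_labels, invalid_mask).
    """
    h, w = prev_labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour source coordinates, since class ids cannot be interpolated.
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    # Pixels whose source lies outside the previous frame cannot be propagated.
    invalid = (src_x < 0) | (src_x >= w) | (src_y < 0) | (src_y >= h)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    warped = prev_labels[src_y, src_x]
    return warped, invalid


if __name__ == "__main__":
    labels = np.array([[0, 1, 2], [3, 4, 5]])
    # Every pixel takes its label from one pixel to the left.
    flow = np.zeros((2, 3, 2))
    flow[..., 0] = -1.0
    warped, invalid = propagate_labels(labels, flow)
    print(warped)
    print(invalid)
```

In a pipeline of this kind, pixels flagged as unreliable would be the ones handed to a refinement network, while the remaining labels are reused from the previous frame at negligible cost.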
