CV · Aug 10, 2017

Semantic Video CNNs through Representation Warping

Raghudeep Gadde, Varun Jampani, Peter V. Gehler

arXiv:1708.03088v1 · 25.7 · 222 citations

Originality: Incremental advance

AI Analysis

This work addresses efficient semantic segmentation of video streams for applications such as autonomous driving. It is an incremental advance, since it builds on existing CNN architectures.

The authors adapt CNN models for semantic segmentation from static images to video by introducing NetWarp, a module that uses optical flow to warp internal network representations across time at minimal extra computational cost. The approach achieves new state-of-the-art results on the CamVid and Cityscapes benchmarks.
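The core operation, warping an internal representation from frame t-1 into frame t using optical flow, can be sketched as a bilinear backward warp. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `warp_features` is hypothetical, features are assumed to be a `(C, H, W)` NumPy array, and the flow is assumed to be a `(2, H, W)` array whose channels give the x and y displacement from each pixel in frame t to its source location in frame t-1.

```python
import numpy as np


def warp_features(feat_prev: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a (C, H, W) feature map from frame t-1 into frame t.

    Hypothetical helper. Assumes flow[0] / flow[1] hold the x / y offsets that
    map pixel (x, y) in frame t to (x + u, y + v) in frame t-1. Source points
    are clamped to the image border and sampled bilinearly.
    """
    _, H, W = feat_prev.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # Sub-pixel source coordinates in frame t-1, clamped to valid range.
    src_x = np.clip(xs + flow[0], 0, W - 1)
    src_y = np.clip(ys + flow[1], 0, H - 1)

    # Integer corners of the surrounding 2x2 neighbourhood.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)

    # Fractional offsets act as bilinear interpolation weights.
    wx = src_x - x0
    wy = src_y - y0

    # Gather each corner for every channel at once: result is (C, H, W).
    top = feat_prev[:, y0, x0] * (1 - wx) + feat_prev[:, y0, x1] * wx
    bottom = feat_prev[:, y1, x0] * (1 - wx) + feat_prev[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


if __name__ == "__main__":
    feat = np.arange(24, dtype=float).reshape(2, 3, 4)
    flow = np.zeros((2, 3, 4))
    flow[0] = 1.0  # every pixel samples one column to the right in frame t-1
    print(warp_features(feat, flow)[0])
```

Because the warp only gathers and blends existing feature values, its cost scales with the size of the feature map rather than with the depth of the network, which is consistent with the "minimal extra computational cost" claim; the flow itself must still be computed by a separate (ideally fast) optical flow method.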

In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp, and we demonstrate its use for a range of network architectures. The main design principle is to use optical flow of adjacent frames for warping internal network representations across time. A key insight of this work is that fast optical flow methods can be combined with many different CNN architectures for improved performance and end-to-end training. Experiments validate that, when video streams are available, the proposed approach improves performance while incurring only a small extra computational cost. We achieve new state-of-the-art results on the CamVid and Cityscapes benchmark datasets and show consistent improvements over different baseline networks. Our code and models will be available at http://segmentation.is.tue.mpg.de
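After warping, the previous frame's representation has to be merged with the current frame's representation at the same layer. The abstract does not specify how; the sketch below assumes one simple choice, a per-channel weighted sum whose weights would be learned end-to-end in a real network. The function name `fuse_representations` and its parameters are hypothetical, and the previous-frame features are assumed to be already warped into frame t.

```python
import numpy as np


def fuse_representations(z_t: np.ndarray,
                         z_prev_warped: np.ndarray,
                         w_cur: np.ndarray,
                         w_prev: np.ndarray) -> np.ndarray:
    """Combine current and warped previous-frame features (assumed form).

    z_t, z_prev_warped: (C, H, W) feature maps at the same network layer.
    w_cur, w_prev: (C,) per-channel weights; fixed here, but assumed to be
    trainable parameters in an actual video network.
    """
    # Broadcast the per-channel weights over the spatial dimensions.
    return w_cur[:, None, None] * z_t + w_prev[:, None, None] * z_prev_warped


if __name__ == "__main__":
    z_t = np.ones((2, 2, 2))
    z_prev = np.full((2, 2, 2), 3.0)
    fused = fuse_representations(z_t, z_prev,
                                 w_cur=np.array([0.5, 1.0]),
                                 w_prev=np.array([0.5, 0.0]))
    print(fused[:, 0, 0])
```

Under this assumed form, initializing `w_cur` to 1 and `w_prev` to 0 reproduces the static-image network exactly, so the module can be inserted into an existing, pretrained architecture and then fine-tuned on video. The fusion adds only two multiplications and one addition per feature element.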

