ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time
This work addresses the need for efficient semantic segmentation in real-time applications such as autonomous driving, though its contribution is incremental, since it builds on existing techniques such as factorized convolution.
The paper tackles real-time semantic segmentation at low computational cost without a large loss in accuracy. It proposes ContextNet, which achieves 66.1% accuracy at 18.3 fps on full-resolution (1024x2048) Cityscapes images.
Modern deep learning architectures produce highly accurate results on many challenging semantic segmentation datasets. State-of-the-art methods are, however, not directly transferable to real-time applications or embedded devices, since naive adaptation of such systems to reduce computational cost (speed, memory and energy) causes a significant drop in accuracy. We propose ContextNet, a new deep neural network architecture which builds on factorized convolution, network compression and pyramid representation to produce competitive semantic segmentation in real-time with low memory requirement. ContextNet combines a deep network branch at low resolution that captures global context information efficiently with a shallow branch that focuses on high-resolution segmentation details. We analyse our network in a thorough ablation study and present results on the Cityscapes dataset, achieving 66.1% accuracy at 18.3 frames per second at full (1024x2048) resolution (41.9 fps with pipelined computations for streamed data).
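To make the cost argument concrete, the following is a minimal sketch of why factorized convolution and a low-resolution branch save computation. It assumes that "factorized convolution" refers to the depthwise separable form (a per-channel k x k convolution followed by a 1x1 pointwise convolution), which is one common factorization. The channel counts (64 in, 128 out), the 3x3 kernel, and the downscaling factor of 4 are illustrative assumptions, not values taken from the abstract, and the function names are hypothetical.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Parameters and multiply-accumulates (MACs) of a k x k standard
    convolution with stride 1, 'same' padding, and no bias."""
    params = k * k * c_in * c_out
    macs = params * h * w  # every weight is applied at every output pixel
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h, w):
    """Same quantities for a depthwise k x k convolution followed by a
    1x1 pointwise convolution (assumed form of 'factorized convolution')."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1x1 convolution mixes the channels
    params = depthwise_params + pointwise_params
    macs = params * h * w
    return params, macs


def downscaled_size(h, w, factor):
    """Spatial size seen by a branch that runs at 1/factor resolution."""
    return h // factor, w // factor


if __name__ == "__main__":
    h, w = 1024, 2048            # full Cityscapes resolution from the abstract
    c_in, c_out, k = 64, 128, 3  # illustrative layer sizes (assumption)

    std_params, std_macs = standard_conv_cost(c_in, c_out, k, h, w)
    sep_params, sep_macs = depthwise_separable_cost(c_in, c_out, k, h, w)
    print(f"standard:  {std_params} params, {std_macs} MACs")
    print(f"separable: {sep_params} params, {sep_macs} MACs")
    print(f"parameter reduction: {std_params / sep_params:.2f}x")

    # A deep branch at reduced resolution (factor 4 is an assumption)
    # performs the same layer on 16x fewer pixels.
    lh, lw = downscaled_size(h, w, 4)
    _, low_macs = depthwise_separable_cost(c_in, c_out, k, lh, lw)
    print(f"MAC reduction from downscaling: {sep_macs / low_macs:.0f}x")
```

The two savings multiply: the factorization shrinks the per-pixel cost of each layer, and the reduced resolution shrinks the number of pixels the deep branch processes. The shallow full-resolution branch can then stay cheap because it has few layers.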