Gated-SCNN: Gated Shape CNNs for Semantic Segmentation
This work addresses inaccurate object boundaries in semantic segmentation, a problem that matters for applications such as autonomous driving, though it is an incremental improvement over existing methods.
The paper tackles semantic segmentation with a two-stream CNN architecture that processes shape information in a dedicated stream to improve boundary accuracy. It reports state-of-the-art results on Cityscapes, improving over strong baselines by 2% in mask quality (mIoU) and 4% in boundary quality (F-score).
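The mask metric quoted above, mIoU, is the per-class intersection over union, IoU = TP / (TP + FP + FN), averaged over classes. The following is a minimal NumPy sketch of that standard computation, not the paper's evaluation code. The function names `confusion_matrix` and `mean_iou` are hypothetical, and treating label 255 as "ignore" is an assumption borrowed from common Cityscapes practice. The boundary F-score, which matches predicted and ground-truth boundary pixels within a distance tolerance, is not sketched here.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes.

    Pixels labelled ignore_index (assumed 255 here) are skipped.
    """
    mask = target != ignore_index
    idx = num_classes * target[mask].astype(np.int64) + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN); classes absent from both pred and target are NaN."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as class c, labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled class c, predicted otherwise
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou)), iou


# Toy 2x3 label maps with 3 classes.
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
miou, per_class = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(round(miou, 4))  # 0.5 (per-class IoU: 1/3, 2/3, 1/2)
```

In a real evaluation the confusion matrix would typically be summed over the whole validation set before computing IoU, rather than averaging per-image scores.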
Current state-of-the-art methods for image segmentation form a dense image representation where the color, shape and texture information are all processed together inside a deep CNN. This, however, may not be ideal, as these cues contain very different types of information relevant for recognition. Here, we propose a new two-stream CNN architecture for semantic segmentation that explicitly wires shape information as a separate processing branch, i.e., a shape stream, that processes information in parallel to the classical stream. Key to this architecture is a new type of gate that connects the intermediate layers of the two streams. Specifically, we use the higher-level activations in the classical stream to gate the lower-level activations in the shape stream, effectively removing noise and helping the shape stream focus only on the relevant boundary-related information. This enables us to use a very shallow architecture for the shape stream that operates at image-level resolution. Our experiments show that this leads to a highly effective architecture that produces sharper predictions around object boundaries and significantly boosts performance on thinner and smaller objects. Our method achieves state-of-the-art performance on the Cityscapes benchmark in terms of both mask (mIoU) and boundary (F-score) quality, improving over strong baselines by 2% and 4%, respectively.
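The gating idea described in the abstract, where higher-level classical-stream activations decide which lower-level shape-stream activations pass through, can be illustrated with a minimal NumPy sketch. It rests on assumptions this summary does not state: the gate is a 1x1 convolution over the concatenated features of both streams followed by a sigmoid, the classical-stream features have already been resized to the shape stream's resolution, and the gate is applied by element-wise multiplication. The names `conv1x1` and `gate_shape_stream` and the toy tensor sizes are hypothetical, and the paper's actual gate may differ (for example, by adding a residual connection or further convolutions).

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def conv1x1(x, weight, bias):
    """1x1 convolution on a (C_in, H, W) map; weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def gate_shape_stream(shape_feat, classical_feat, weight, bias):
    """Hypothetical gate: classical-stream features decide which shape-stream pixels pass.

    shape_feat:     (C_s, H, W) lower-level shape-stream activations
    classical_feat: (C_r, H, W) higher-level classical-stream activations,
                    assumed already resized to the same H x W
    weight, bias:   parameters of a 1x1 conv mapping C_s + C_r channels to 1
    """
    stacked = np.concatenate([shape_feat, classical_feat], axis=0)
    alpha = sigmoid(conv1x1(stacked, weight, bias))  # (1, H, W), each value in (0, 1)
    # The single-channel gate broadcasts over all shape channels: pixels with
    # alpha near 0 are suppressed, pixels with alpha near 1 pass through.
    return shape_feat * alpha, alpha


rng = np.random.default_rng(0)
s = rng.standard_normal((4, 8, 8))    # toy shape-stream map
r = rng.standard_normal((16, 8, 8))   # toy classical-stream map
w = 0.1 * rng.standard_normal((1, 20))
b = np.zeros(1)
gated, alpha = gate_shape_stream(s, r, w, b)
print(gated.shape, alpha.shape)  # (4, 8, 8) (1, 8, 8)
```

Because the gate is a per-pixel scalar learned from both streams, the classical stream can mark where boundary-relevant content is likely, letting the shallow shape stream discard texture noise elsewhere.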