CV LG IV · Jun 16, 2022

Simple and Efficient Architectures for Semantic Segmentation

Dushyant Mehta, Andrii Skliar, Haitam Ben Yahia, Shubhankar Borse, Fatih Porikli, Amirhossein Habibian, Tijmen Blankevoort

arXiv:2206.08236v13.719 citations · h-index: 71 · Has Code

Originality: Incremental advance

AI Analysis

This work provides practitioners with efficient and competitive baselines for semantic segmentation, though its contribution is incremental, simplifying existing approaches.

The paper tackles the problem of complex and inefficient semantic segmentation architectures by proposing a simple encoder-decoder design whose ResNet backbones are modified to enlarge the receptive field, achieving performance on par with or better than state-of-the-art models like HRNet on the Cityscapes dataset.

Though state-of-the-art architectures for semantic segmentation, such as HRNet, demonstrate impressive accuracy, the complexity arising from their salient design choices hinders a range of model acceleration tools, and they further make use of operations that are inefficient on current hardware. This paper demonstrates that a simple encoder-decoder architecture with a ResNet-like backbone and a small multi-scale head performs on par with or better than complex semantic segmentation architectures such as HRNet, FANet and DDRNets. Naively applying deep backbones designed for image classification to the task of semantic segmentation leads to sub-par results, owing to the much smaller effective receptive field of these backbones. Implicit among the various design choices put forth in works like HRNet, DDRNet, and FANet are networks with a large effective receptive field. It is natural to ask whether a simple encoder-decoder architecture would compare favorably if it were built on backbones that have a larger effective receptive field, though without the use of inefficient operations like dilated convolutions. We show that with minor and inexpensive modifications to ResNets that enlarge the receptive field, very simple and competitive baselines can be created for semantic segmentation. We present a family of such simple architectures for desktop as well as mobile targets, which match or exceed the performance of complex models on the Cityscapes dataset. We hope that our work provides simple yet effective baselines for practitioners to develop efficient semantic segmentation models.
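The receptive-field argument in the abstract can be made concrete with a little arithmetic. The sketch below computes the theoretical receptive field of a plain stack of convolution or pooling layers; it is an illustration only, not the paper's method. The function name, the layer tuples, and the toy stacks are all assumptions made for this example, and the effective receptive field the abstract refers to is generally smaller than this theoretical bound.

```python
from typing import Iterable, Tuple

# Hypothetical layer description for this sketch: (kernel_size, stride, dilation).
Layer = Tuple[int, int, int]


def theoretical_receptive_field(layers: Iterable[Layer]) -> int:
    """Return the theoretical receptive field (in input pixels) of a layer stack."""
    rf = 1    # a single output unit initially sees one input pixel
    jump = 1  # distance in input pixels between adjacent units of the current feature map
    for kernel, stride, dilation in layers:
        # Each layer widens the field by (kernel - 1) * dilation steps of the current jump.
        rf += (kernel - 1) * dilation * jump
        # Striding increases the spacing between neighbouring units.
        jump *= stride
    return rf


# Toy stacks (assumed for illustration, not taken from the paper).
stem = [(7, 2, 1), (3, 2, 1)]        # 7x7 stride-2 conv followed by 3x3 stride-2 pooling
shallow = stem + [(3, 1, 1)] * 4     # four plain 3x3 convs
deep = stem + [(3, 1, 1)] * 8        # eight plain 3x3 convs
dilated = stem + [(3, 1, 2)] * 4     # four 3x3 convs with dilation 2

print(theoretical_receptive_field(shallow))  # 43
print(theoretical_receptive_field(deep))     # 75
print(theoretical_receptive_field(dilated))  # 75
```

In this toy setting, doubling the number of plain 3x3 layers reaches the same theoretical receptive field as using dilation 2, which is the kind of trade-off the abstract alludes to when it asks for larger receptive fields without dilated convolutions.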
