Benchmarking the Robustness of Semantic Segmentation Models
This work addresses the need for robust semantic segmentation in applications such as autonomous driving. Its contribution is an exhaustive benchmark study that is incremental in method but yields new insights.
The authors evaluate the robustness of semantic segmentation models to image corruptions and find that robustness generally increases with model performance and that certain architectural properties significantly affect robustness.
When designing a semantic segmentation module for a practical application, such as autonomous driving, it is crucial to understand the robustness of the module with respect to a wide range of image corruptions. While there are recent robustness studies for full-image classification, we are the first to present an exhaustive study for semantic segmentation, based on the state-of-the-art model DeepLabv3+. To increase the realism of our study, we utilize almost 400,000 images generated from Cityscapes, PASCAL VOC 2012, and ADE20K. Based on the benchmark study, we gain several new insights. First, in contrast to full-image classification, model robustness increases with model performance in most cases. Second, some architectural properties affect robustness significantly, such as a Dense Prediction Cell, which was designed to maximize performance on clean data only.
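To make the evaluation idea concrete, the following is a minimal sketch of how a segmentation model's accuracy on clean images might be compared against its accuracy on corrupted copies. It is not the paper's protocol: the thresholding `toy_segmenter`, the synthetic two-class image, the choice of Gaussian noise as the only corruption, and the noise levels in `sigmas` are all assumptions made for illustration, and a real study would substitute a trained network such as DeepLabv3+ and a real dataset.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Corrupt a float image in [0, 1] with additive Gaussian noise."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


def toy_segmenter(image):
    """Hypothetical stand-in for a real model: threshold intensity at 0.5."""
    return (image > 0.5).astype(np.int64)


rng = np.random.default_rng(0)

# Synthetic ground truth: left half is class 0, right half is class 1.
target = np.zeros((64, 64), dtype=np.int64)
target[:, 32:] = 1
# Clean image: class 0 pixels are dark (0.2), class 1 pixels are bright (0.8).
clean = np.where(target == 1, 0.8, 0.2).astype(np.float64)

clean_miou = mean_iou(toy_segmenter(clean), target, num_classes=2)

# Assumed severity levels (noise standard deviations), chosen for illustration.
sigmas = [0.1, 0.2, 0.4]
corrupted_miou = [
    mean_iou(toy_segmenter(add_gaussian_noise(clean, s, rng)), target, 2)
    for s in sigmas
]

# One simple robustness summary: corrupted mIoU relative to clean mIoU.
relative_miou = float(np.mean(corrupted_miou)) / clean_miou

print(f"clean mIoU: {clean_miou:.3f}")
for s, m in zip(sigmas, corrupted_miou):
    print(f"sigma={s}: mIoU={m:.3f}")
print(f"mean corrupted / clean: {relative_miou:.3f}")
```

Reporting corrupted accuracy relative to clean accuracy separates robustness from raw performance, which is the distinction the abstract draws when it compares the two.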