Evaluating Bayesian Deep Learning Methods for Semantic Segmentation
This work addresses the need for standardized evaluation of uncertainty estimates in semantic segmentation. The contribution is incremental: it builds on existing Bayesian Deep Learning (BDL) methods and datasets.
The authors address the lack of evaluation metrics for BDL methods in semantic segmentation by proposing three new metrics. They apply these metrics to compare MC dropout and Concrete dropout on the Cityscapes dataset, establishing new benchmarks for uncertainty quantification.
Deep learning has been revolutionary for computer vision, and for semantic segmentation in particular, with Bayesian Deep Learning (BDL) used to obtain uncertainty maps from deep models when predicting semantic classes. This information is critical when semantic segmentation is used in applications such as autonomous driving. Standard semantic segmentation systems have well-established evaluation metrics. However, with BDL's rising popularity in computer vision, we require new metrics to evaluate whether one BDL method produces better uncertainty estimates than another. In this work we propose three such metrics to evaluate BDL models designed specifically for the task of semantic segmentation. We modify DeepLab-v3+, one of the state-of-the-art deep neural networks, and create its Bayesian counterpart using MC dropout and Concrete dropout as inference techniques. We then compare and test these two inference techniques on the well-known Cityscapes dataset using our suggested metrics. Our results provide new benchmarks for researchers to compare and evaluate their improved uncertainty quantification in pursuit of safer semantic segmentation.
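To make the notion of an uncertainty map concrete, the following is a minimal sketch of how MC dropout is commonly used to produce one: dropout is kept active at test time, several stochastic forward passes are averaged, and a per-pixel uncertainty score is derived from the averaged class probabilities. The abstract does not say which uncertainty measure the authors use, so predictive entropy is an assumption here, as are the names `mc_dropout_uncertainty` and `toy_stochastic_forward`. The toy linear model merely stands in for a segmentation network such as DeepLab-v3+ and is not the authors' implementation.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def mc_dropout_uncertainty(stochastic_forward, image, num_samples=20):
    """Average several stochastic passes and score per-pixel uncertainty.

    stochastic_forward(image) is assumed to return logits of shape
    (H, W, C) with dropout still active, so repeated calls differ.
    """
    probs = np.stack(
        [softmax(stochastic_forward(image)) for _ in range(num_samples)]
    )
    mean_probs = probs.mean(axis=0)            # (H, W, C)
    prediction = mean_probs.argmax(axis=-1)    # (H, W) class map
    # Predictive entropy of the averaged distribution (assumed measure).
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return prediction, entropy


# Hypothetical stand-in for a segmentation network: a per-pixel linear
# classifier with dropout applied to its 3 input channels.
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 4))  # 3 channels -> 4 classes


def toy_stochastic_forward(image, drop_rate=0.5):
    mask = rng.random(image.shape) >= drop_rate
    dropped = image * mask / (1.0 - drop_rate)  # inverted dropout scaling
    return dropped @ weights                    # (H, W, 4) logits


image = rng.random((8, 8, 3))
prediction, entropy = mc_dropout_uncertainty(toy_stochastic_forward, image)
```

Here `prediction` is the per-pixel class map and `entropy` is the matching uncertainty map; higher values mark pixels where the averaged prediction is less certain. Metrics such as those the abstract proposes would then assess how well a map of this kind tracks actual prediction errors.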