LG · ML · Mar 11

Beyond Accuracy: Reliability and Uncertainty Estimation in Convolutional Neural Networks

Sanne Ruijs, Alina Kosiakova, Farrukh Javed

arXiv:2603.10731v1 · 6.0 · h-index: 9

Predicted impact: top 84% in LG · last 90 days · Originality: Synthesis-oriented

AI Analysis

This work addresses the need for reliable uncertainty estimation in deep learning, particularly for high-stakes applications, but it is incremental: it compares existing methods on a standard dataset.

The paper tackles poor calibration and overconfidence in deep neural networks by comparing two uncertainty-estimation methods, a Bayesian approximation via Monte Carlo Dropout and Conformal Prediction, on the Fashion-MNIST dataset. It finds that GoogLeNet yields better-calibrated uncertainty estimates, while H-CNN VGG16 achieves higher accuracy but is more overconfident.
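The two ideas the summary turns on, Monte Carlo Dropout and calibration, can be sketched in a few lines of NumPy. This is not the authors' code. It assumes a hypothetical two-layer ReLU network with random weights in place of H-CNN VGG16 or GoogLeNet, synthetic inputs in place of Fashion-MNIST, a dropout rate p = 0.5 with T = 50 stochastic passes, and expected calibration error (ECE) with 15 equal-width bins as the calibration metric, which is one common choice the paper may or may not use. The function names are illustrative.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mc_dropout_predict(x, w1, w2, p=0.5, T=50, rng=None):
    """Monte Carlo Dropout (sketch): keep dropout active at test time, run T
    stochastic forward passes, and average the softmax outputs."""
    rng = np.random.default_rng(0) if rng is None else rng
    samples = []
    for _ in range(T):
        h = np.maximum(x @ w1, 0.0)          # ReLU hidden layer
        keep = rng.random(h.shape) >= p      # each unit dropped with probability p
        h = h * keep / (1.0 - p)             # inverted-dropout rescaling
        samples.append(softmax(h @ w2))
    samples = np.stack(samples)              # shape (T, n, n_classes)
    mean = samples.mean(axis=0)              # approximate predictive distribution
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=1)  # predictive entropy per input
    return mean, entropy


def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: bin predictions by top-class confidence and average the
    |accuracy - confidence| gap, weighted by the fraction of samples per bin."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece


# Demo on synthetic data (hypothetical stand-in for Fashion-MNIST and a trained CNN).
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 20))
labels = rng.integers(0, 10, size=200)
w1 = rng.normal(scale=0.3, size=(20, 64))
w2 = rng.normal(scale=0.3, size=(64, 10))
mean_probs, entropy = mc_dropout_predict(x, w1, w2, p=0.5, T=50, rng=rng)
ece = expected_calibration_error(mean_probs, labels)
print(mean_probs.shape, entropy.shape, round(ece, 4))
```

A well-calibrated model has ECE near zero. An overconfident one, whose top-class probability systematically exceeds its accuracy, shows a large gap in the high-confidence bins, which is the failure mode the summary attributes to H-CNN VGG16.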

Deep neural networks (DNNs) have become integral to a wide range of scientific and practical applications due to their flexibility and strong predictive performance. Despite their accuracy, DNNs are frequently poorly calibrated, often assigning overly confident probabilities to incorrect predictions. This limitation underscores the growing need for integrated mechanisms that provide reliable uncertainty estimates. In this article, we compare two prominent approaches to uncertainty quantification: a Bayesian approximation via Monte Carlo Dropout and the nonparametric Conformal Prediction framework. Both methods are assessed on two convolutional neural network architectures, H-CNN VGG16 and GoogLeNet, trained on the Fashion-MNIST dataset. The empirical results show that although H-CNN VGG16 attains higher predictive accuracy, it tends to be markedly overconfident, whereas GoogLeNet yields better-calibrated uncertainty estimates. Conformal Prediction additionally demonstrates consistent validity, producing prediction sets with statistical coverage guarantees, which highlights its practical value in high-stakes decision-making contexts. Overall, the findings emphasize the importance of evaluating model performance beyond accuracy alone and contribute to the development of more reliable and trustworthy deep learning systems.
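The abstract credits Conformal Prediction with statistically guaranteed prediction sets. A minimal split-conformal sketch in NumPy is shown here, assuming the common nonconformity score 1 - p̂(true class); the paper may use a different score (for example, adaptive prediction sets). Under exchangeability of calibration and test points, the resulting sets contain the true label with probability at least 1 - alpha, marginally over test points. The helper names conformal_threshold and prediction_sets, the 1000/5000 calibration/test split, and the synthetic Dirichlet "softmax outputs" are all hypothetical.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration with score s = 1 - p_hat(true class).
    Returns the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))   # finite-sample corrected rank
    if k > n:
        return np.inf  # too few calibration points: every set is the full label set
    return np.sort(scores)[k - 1]


def prediction_sets(test_probs, qhat):
    """Include every label whose score 1 - p_hat(label) does not exceed qhat."""
    return (1.0 - test_probs) <= qhat  # boolean mask, shape (n_test, n_classes)


# Synthetic demo: Dirichlet "softmax outputs" with labels drawn from them, so
# calibration and test points are exchangeable (stand-in for real CNN outputs).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=6000)
labels = np.array([rng.choice(10, p=row) for row in probs])
cal_p, cal_y = probs[:1000], labels[:1000]
test_p, test_y = probs[1000:], labels[1000:]

alpha = 0.1
qhat = conformal_threshold(cal_p, cal_y, alpha)
sets = prediction_sets(test_p, qhat)
coverage = sets[np.arange(len(test_y)), test_y].mean()
avg_size = sets.sum(axis=1).mean()
print(f"qhat={qhat:.3f} coverage={coverage:.3f} avg set size={avg_size:.2f}")
```

Lowering alpha enlarges the sets, and a less confident model also produces larger sets for the same alpha. The guarantee is marginal: it holds on average over test points, not for each class or each individual input.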

