Exploring the Limits of Epistemic Uncertainty Quantification in Low-Shot Settings
This work provides empirical guidance for practitioners selecting uncertainty methods based on the amount of data available, though the contribution is incremental: it evaluates existing methods under new data conditions.
The paper evaluates seven uncertainty quantification methods on Fashion MNIST and CIFAR10 with varying training set sizes. It finds that calibration error and out-of-distribution detection performance depend strongly on training set size: most methods are miscalibrated when trained on small training sets, and gradient-based methods perform poorly.
Uncertainty quantification in neural networks promises to increase the safety of AI systems, but it is not clear how performance might vary with the training set size. In this paper we evaluate seven uncertainty methods on Fashion MNIST and CIFAR10, sub-sampling each dataset to produce training sets of varied sizes. We find that calibration error and out-of-distribution detection performance strongly depend on the training set size, with most methods being miscalibrated on the test set when trained on small training sets. Gradient-based methods seem to estimate epistemic uncertainty poorly and are the most affected by training set size. We expect our results can guide future research into uncertainty quantification and help practitioners select methods based on the data available to them.
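To make the evaluation setup concrete, the following is a minimal sketch of its two ingredients: sub-sampling a training set to a given size and measuring calibration error on held-out predictions. It is an illustration under assumptions, not the paper's protocol. The function names `subsample_indices` and `expected_calibration_error` are hypothetical, the sub-sampling is assumed to be class-balanced, and expected calibration error (ECE) with equal-width bins is assumed as the calibration metric because it is a common choice; the text above does not say which sampling scheme or metric was used.

```python
import random


def subsample_indices(labels, n_per_class, seed=0):
    """Pick up to n_per_class random indices for each class label.

    Assumption: class-balanced (stratified) sub-sampling; the source text
    only says the training set is sub-sampled to varied sizes.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    chosen = []
    for y in sorted(by_class):
        pool = by_class[y]
        chosen.extend(rng.sample(pool, min(n_per_class, len(pool))))
    return sorted(chosen)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction matched the label.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 falls in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

In a study of this kind, one would call `subsample_indices` with increasing `n_per_class`, train a model on each subset, and compute `expected_calibration_error` on the same test set each time. A model that always reports confidence 1.0 but is right only half the time has an ECE of 0.5, while one that reports 0.9 and is right 90% of the time has an ECE near zero.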