Estimating Uncertainty Online Against an Adversary
This work addresses safety and reliability issues in ML systems, particularly in applications such as healthcare, by extending online learning to estimate uncertainty under adversarial conditions.
The paper tackles the problem of reliable uncertainty estimation for classification algorithms under out-of-distribution and adversarial inputs, proposing techniques with formal guarantees and validating them on question answering and medical diagnosis tasks.
Assessing uncertainty is an important step towards ensuring the safety and reliability of machine learning systems. Existing uncertainty estimation techniques may fail when their modeling assumptions are not met, e.g., when the data distribution differs from the one seen at training time. Here, we propose techniques that assess a classification algorithm's uncertainty via calibrated probabilities (i.e., probabilities that match empirical outcome frequencies in the long run) and that are guaranteed to be reliable (i.e., accurate and calibrated) on out-of-distribution input, including input generated by an adversary. This represents an extension of classical online learning that handles uncertainty in addition to guaranteeing accuracy under adversarial assumptions. We establish formal guarantees for our methods, and we validate them on two real-world problems: question answering and medical diagnosis from genomic data.
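To make the notion of calibrated probabilities concrete, the following minimal sketch measures how far a sequence of binary forecasts is from matching empirical outcome frequencies. It is an illustration of the definition only, not the method proposed in the paper; the function name `calibration_error`, the equal-width binning, and the `n_bins` parameter are assumptions introduced here.

```python
from collections import defaultdict


def calibration_error(forecasts, outcomes, n_bins=10):
    """Weighted average gap between mean forecast and observed frequency.

    Hypothetical helper for illustration: forecasts are probabilities in
    [0, 1] for the positive class, outcomes are 0/1 labels. Forecasts are
    grouped into equal-width bins; a calibrated forecaster has a gap near
    zero in every bin as the number of observations grows.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")

    # Group (forecast, outcome) pairs by the bin the forecast falls into.
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(forecasts)
    error = 0.0
    for items in bins.values():
        mean_forecast = sum(p for p, _ in items) / len(items)
        observed_freq = sum(y for _, y in items) / len(items)
        # Weight each bin's gap by the share of forecasts it contains.
        error += (len(items) / total) * abs(mean_forecast - observed_freq)
    return error


# Forecasting 0.8 when the event occurs 8 times out of 10 is calibrated.
calibrated = calibration_error([0.8] * 10, [1] * 8 + [0] * 2)

# Forecasting 0.9 when the event occurs 5 times out of 10 is overconfident.
overconfident = calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
```

In this sketch the first forecaster has an error of approximately zero, while the second has an error of about 0.4, the gap between its stated confidence and the observed frequency. A check like this only measures calibration after the fact; it does not by itself provide the guarantees against adversarial input that the abstract describes.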