On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
This work addresses a fundamental flaw in methods for improving model robustness under distribution shift, a concern central to building safe and reliable AI systems. The contribution is incremental in that it builds on existing theoretical frameworks.
The paper tackles distribution shift in deep neural networks by analyzing surrogates for disagreement discrepancy. It proves that existing surrogates are not Bayes consistent and proposes a novel disagreement loss that, when paired with cross-entropy, yields a provably consistent surrogate. Empirical evaluations show more accurate and robust estimates of disagreement discrepancy, particularly under adversarial conditions.
Deep neural networks often fail when deployed in real-world contexts due to distribution shift, a critical barrier to building safe and reliable systems. An emerging approach to address this problem relies on \emph{disagreement discrepancy} -- a measure of how the disagreement between two models changes under a shifting distribution. The process of maximizing this measure has seen applications in bounding error under shifts, testing for harmful shifts, and training more robust models. However, this optimization involves the non-differentiable zero-one loss, necessitating the use of practical surrogate losses. We prove that existing surrogates for disagreement discrepancy are not Bayes consistent, revealing a fundamental flaw: maximizing these surrogates can fail to maximize the true disagreement discrepancy. To address this, we introduce new theoretical results providing both upper and lower bounds on the optimality gap for such surrogates. Guided by this theory, we propose a novel disagreement loss that, when paired with cross-entropy, yields a provably consistent surrogate for disagreement discrepancy. Empirical evaluations across diverse benchmarks demonstrate that our method provides more accurate and robust estimates of disagreement discrepancy than existing approaches, particularly under challenging adversarial conditions.
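To make the quantity being maximized concrete, the following is a minimal sketch of an empirical disagreement discrepancy computed from hard predictions. It assumes the commonly used form of the measure, namely the rate at which two models disagree on target (shifted) data minus the rate at which they disagree on source data; the abstract does not state the formula, so this form is an assumption. The function names `disagreement_rate` and `disagreement_discrepancy` are hypothetical and chosen for illustration only. The sketch uses the zero-one disagreement directly, which is exactly the non-differentiable quantity that surrogate losses are meant to stand in for during optimization; it does not implement the proposed loss.

```python
from typing import Sequence


def disagreement_rate(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of inputs on which two models predict different labels.

    This is the empirical zero-one disagreement between the two models.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction sequences must have the same length")
    if len(preds_a) == 0:
        raise ValueError("prediction sequences must be non-empty")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def disagreement_discrepancy(
    src_a: Sequence[int],
    src_b: Sequence[int],
    tgt_a: Sequence[int],
    tgt_b: Sequence[int],
) -> float:
    """Assumed form: target disagreement rate minus source disagreement rate."""
    return disagreement_rate(tgt_a, tgt_b) - disagreement_rate(src_a, src_b)


# Toy predictions from two hypothetical models on four source and four target inputs.
source_model_a = [0, 1, 1, 0]
source_model_b = [0, 1, 0, 0]  # disagrees with model A on 1 of 4 source inputs
target_model_a = [0, 1, 1, 0]
target_model_b = [1, 1, 0, 1]  # disagrees with model A on 3 of 4 target inputs

gap = disagreement_discrepancy(
    source_model_a, source_model_b, target_model_a, target_model_b
)
print(gap)
```

In this toy case the two models disagree on a quarter of the source inputs and three quarters of the target inputs, so the discrepancy is 0.5. Because the comparison `a != b` has no useful gradient, a critic model cannot be trained to maximize this value directly, which is why the choice of surrogate and its consistency matter.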