Beyond Top-Class Agreement: Using Divergences to Forecast Performance under Distribution Shift
This work addresses the challenge of safe model deployment by improving generalization assessment for machine learning practitioners, though the contribution is incremental because it builds on existing disagreement concepts.
The paper tackles the problem of forecasting model performance under distribution shift by studying disagreement notions based on full predictive distributions. It finds that divergence-based scores provide better test error estimates and detection rates on out-of-distribution data than top-1 agreement methods.
Knowing whether a model will generalize to data 'in the wild' is crucial for safe deployment. To this end, we study model disagreement notions that consider the full predictive distribution - specifically, disagreement based on the Hellinger distance and the Jensen-Shannon and Kullback-Leibler divergences. We find that divergence-based scores provide better test error estimates and detection rates on out-of-distribution data compared to their top-1 counterparts. Experiments involve standard vision and foundation models.
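To make the distinction between top-1 and distribution-level disagreement concrete, the following is a minimal NumPy sketch of the three per-sample measures named above, using their standard textbook definitions. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `EPS` clipping constant, and the toy probabilities are all hypothetical, and the text above does not say how per-sample scores are aggregated or calibrated into an error estimate.

```python
import numpy as np

EPS = 1e-12  # assumed clipping constant to avoid log(0); not from the paper


def kl_divergence(p, q):
    """Row-wise KL(p || q) for arrays of shape (n_samples, n_classes)."""
    p = np.clip(p, EPS, 1.0)
    q = np.clip(q, EPS, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)


def js_divergence(p, q):
    """Row-wise Jensen-Shannon divergence (natural log, bounded by ln 2)."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger_distance(p, q):
    """Row-wise Hellinger distance, which lies in [0, 1]."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2, axis=-1))


def top1_disagreement(p, q):
    """1.0 where the two argmax predictions differ, else 0.0."""
    return (np.argmax(p, axis=-1) != np.argmax(q, axis=-1)).astype(float)


# Hypothetical softmax outputs of two models on two inputs.
# Row 0: same top-1 class, very different confidence.
# Row 1: different top-1 class, nearly identical distributions.
probs_a = np.array([[0.99, 0.01], [0.51, 0.49]])
probs_b = np.array([[0.51, 0.49], [0.49, 0.51]])

top1 = top1_disagreement(probs_a, probs_b)
hell = hellinger_distance(probs_a, probs_b)
jsd = js_divergence(probs_a, probs_b)
kld = kl_divergence(probs_a, probs_b)
```

In this toy case the top-1 score flags only the second input, while all three divergence-based scores are larger on the first. This is the kind of information that a score built on the full predictive distribution can use and a top-1 comparison discards.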