LG · Oct 31, 2025

ODP-Bench: Benchmarking Out-of-Distribution Performance Prediction

arXiv:2510.27263v1 · 3 citations · h-index: 15
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for fair and convenient benchmarking in OOD performance prediction for researchers deploying models in risk-sensitive scenarios, but it is incremental as it builds on existing datasets and algorithms.

The paper tackles the problem of inconsistent evaluation protocols and limited coverage in out-of-distribution (OOD) performance prediction by proposing ODP-Bench, a comprehensive benchmark that includes commonly used OOD datasets and existing algorithms, providing trained models for consistent comparisons and experimental analyses.

Out-of-Distribution (OOD) performance prediction has recently attracted growing attention. Its goal is to predict the performance of trained models on unlabeled OOD test datasets, so that off-the-shelf trained models can be better leveraged and deployed in risk-sensitive scenarios. Although progress has been made in this area, evaluation protocols in previous literature are inconsistent, and most works cover only a limited number of real-world OOD datasets and types of distribution shift. To provide convenient and fair comparisons across algorithms, we propose the Out-of-Distribution Performance Prediction Benchmark (ODP-Bench), a comprehensive benchmark that includes most commonly used OOD datasets and existing practical performance prediction algorithms. We provide our trained models as a testbench for future researchers, which guarantees consistent comparisons and removes the burden of repeating the model training process. We also conduct in-depth experimental analyses to better understand the capability boundaries of these algorithms.
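
To make the task concrete, here is a minimal sketch of one simple label-free estimator, average max-softmax confidence, together with a mean-absolute-error score over several (model, dataset) pairs. This is an illustration under stated assumptions, not code from ODP-Bench: the abstract does not list the benchmark's algorithms or metrics, and the function names `average_confidence` and `mean_absolute_error` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_confidence(logits_batch):
    """Estimate accuracy on unlabeled data as the mean max-softmax probability.

    Assumption: the model is roughly calibrated, so confidence tracks accuracy.
    Under distribution shift this often fails, which is why benchmarks are needed.
    """
    if not logits_batch:
        raise ValueError("need at least one prediction")
    confidences = [max(softmax(row)) for row in logits_batch]
    return sum(confidences) / len(confidences)


def mean_absolute_error(predicted_acc, true_acc):
    """Score an estimator across (model, dataset) pairs: lower is better."""
    if len(predicted_acc) != len(true_acc) or not predicted_acc:
        raise ValueError("inputs must be non-empty and of equal length")
    gaps = [abs(p - t) for p, t in zip(predicted_acc, true_acc)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical logits for three unlabeled test samples, three classes each.
    logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 0.3], [4.0, 0.0, 0.0]]
    print("estimated accuracy:", round(average_confidence(logits), 3))
```

In a benchmark setting, the estimate would be computed for each provided model on each OOD test set and compared with the accuracy measured using the held-back labels.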
