ML LG ME | Sep 26, 2025

SADA: Safe and Adaptive Inference with Multiple Black-Box Predictions

arXiv:2509.21707v1 | h-index: 4

Originality: Highly original

AI Analysis

This paper addresses the challenge of combining abundant unlabeled data with multiple machine-generated predictions, offering machine learning researchers and practitioners a safe and adaptive inference framework.

The paper tackles the problem of aggregating multiple black-box predictions of unknown quality when labeled data are scarce. The proposed method is safe, in that it never performs worse than using the labeled data alone, and adaptive, in that if any one of the predictions perfectly fits the ground truth, it exploits this to achieve either a faster convergence rate or the semiparametric efficiency bound.

Real-world applications often face scarce labeled data due to the high cost and time requirements of gold-standard experiments, whereas unlabeled data are typically abundant. With the growing adoption of machine learning techniques, it has become increasingly feasible to generate multiple predicted labels using a variety of models and algorithms, including deep learning, large language models, and generative AI. In this paper, we propose a novel approach that safely and adaptively aggregates multiple black-box predictions with unknown quality while preserving valid statistical inference. Our method provides two key guarantees: (i) it never performs worse than using the labeled data alone, regardless of the quality of the predictions; and (ii) if any one of the predictions (without knowing which one) perfectly fits the ground truth, the algorithm adaptively exploits this to achieve either a faster convergence rate or the semiparametric efficiency bound. We demonstrate the effectiveness of the proposed algorithm through experiments on both synthetic and benchmark datasets.
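To make the idea of "safe" aggregation concrete, the sketch below estimates a population mean from a small labeled sample plus several predictions available on a large unlabeled sample. It is not the paper's SADA algorithm, whose details are not given in the abstract; it is a classical control-variate construction, chosen only because it shows the same two behaviours in the simplest setting. The function name `aggregate_mean`, the `ridge` parameter, and the simulated data are all assumptions made for illustration. With the variance-minimizing weights, the estimator is asymptotically no more variable than the labeled-only mean, and when one prediction equals the label it reduces (up to the ridge term) to the pooled mean over all samples.

```python
import numpy as np


def aggregate_mean(y_lab, preds_lab, preds_unlab, ridge=1e-8):
    """Illustrative control-variate estimate of E[Y] (hypothetical helper).

    y_lab       : (n,)   labels on the labeled sample
    preds_lab   : (n, K) K black-box predictions on the labeled sample
    preds_unlab : (N, K) the same K predictions on the unlabeled sample
    ridge       : small diagonal term (an assumption) to keep the solve stable
    """
    y_lab = np.asarray(y_lab, dtype=float)
    F_lab = np.asarray(preds_lab, dtype=float)
    F_unlab = np.asarray(preds_unlab, dtype=float)
    n, K = F_lab.shape
    N = F_unlab.shape[0]

    # Sample covariances estimated from the labeled data only.
    Fc = F_lab - F_lab.mean(axis=0)
    yc = y_lab - y_lab.mean()
    cov_ff = Fc.T @ Fc / (n - 1)
    cov_fy = Fc.T @ yc / (n - 1)

    # Variance-minimizing weights for the correction term
    # mean_lab(f) - mean_unlab(f), whose covariance is cov_ff * (1/n + 1/N).
    w = (N / (n + N)) * np.linalg.solve(cov_ff + ridge * np.eye(K), cov_fy)

    # Labeled-only mean, corrected by the weighted prediction discrepancy.
    # With w = 0 (uninformative predictions) this is just y_lab.mean().
    theta = y_lab.mean() - w @ (F_lab.mean(axis=0) - F_unlab.mean(axis=0))
    return theta, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, N = 50, 5000
    y_all = rng.normal(loc=1.0, scale=2.0, size=n + N)
    # Column 0 is a perfect prediction, column 1 is pure noise.
    preds_all = np.column_stack([y_all, rng.normal(size=n + N)])
    theta, w = aggregate_mean(y_all[:n], preds_all[:n], preds_all[n:])
    print("labeled-only mean:", y_all[:n].mean())
    print("aggregated estimate:", theta)
    print("weights:", w)
```

In this simulated setting the weights concentrate on the perfect prediction without being told which column it is, which mirrors the "without knowing which one" property described in the abstract. Valid confidence intervals would additionally need a variance estimate, which this sketch omits.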

