ML · LG · ME · Sep 26, 2025

SADA: Safe and Adaptive Inference with Multiple Black-Box Predictions

arXiv:2509.21707v1 · h-index: 4
Originality: Highly original
AI Analysis

For machine learning researchers and practitioners, this paper addresses how to leverage abundant unlabeled data that come with multiple predictions, offering a safe and adaptive inference framework.

The paper tackles the problem of aggregating multiple black-box predictions of unknown quality when labeled data are scarce. The proposed method is safe, in that it never performs worse than using the labeled data alone, and adaptive, in that when one of the predictions perfectly fits the ground truth it achieves either a faster convergence rate or the semiparametric efficiency bound.

Real-world applications often face scarce labeled data due to the high cost and time requirements of gold-standard experiments, whereas unlabeled data are typically abundant. With the growing adoption of machine learning techniques, it has become increasingly feasible to generate multiple predicted labels using a variety of models and algorithms, including deep learning, large language models, and generative AI. In this paper, we propose a novel approach that safely and adaptively aggregates multiple black-box predictions with unknown quality while preserving valid statistical inference. Our method provides two key guarantees: (i) it never performs worse than using the labeled data alone, regardless of the quality of the predictions; and (ii) if any one of the predictions (without knowing which one) perfectly fits the ground truth, the algorithm adaptively exploits this to achieve either a faster convergence rate or the semiparametric efficiency bound. We demonstrate the effectiveness of the proposed algorithm through experiments on both synthetic and benchmark datasets.
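To make the safety guarantee in (i) concrete, here is a minimal sketch of a related, simpler idea: a control-variate estimator of a population mean that corrects the labeled-data average using several black-box predictions. It is not the SADA algorithm from the paper, whose details are not given in the abstract. The function name `aggregated_mean`, its arguments, and the plug-in weight formula are assumptions made for illustration only.

```python
import numpy as np


def aggregated_mean(y_lab, preds_lab, preds_unlab):
    """Estimate E[Y] from labeled outcomes plus K black-box predictions.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    y_lab:       shape (n,)   gold-standard labels
    preds_lab:   shape (n, K) predictions on the labeled sample
    preds_unlab: shape (N, K) predictions on the unlabeled sample
    """
    y_lab = np.asarray(y_lab, dtype=float)
    f_lab = np.asarray(preds_lab, dtype=float)
    f_unl = np.asarray(preds_unlab, dtype=float)
    n, big_n = len(y_lab), len(f_unl)

    # Covariance of the predictions, pooled over all n + N rows.
    sigma = np.atleast_2d(np.cov(np.vstack([f_lab, f_unl]), rowvar=False))

    # Covariance between each prediction and the label (labeled rows only).
    f_centered = f_lab - f_lab.mean(axis=0)
    cov_fy = f_centered.T @ (y_lab - y_lab.mean()) / (n - 1)

    # Variance-minimizing weights for the correction term. Uninformative
    # predictions get weights near zero, so the estimator falls back to the
    # labeled mean. pinv handles constant or collinear predictions.
    w = (big_n / (n + big_n)) * (np.linalg.pinv(sigma) @ cov_fy)

    estimate = y_lab.mean() - w @ (f_lab.mean(axis=0) - f_unl.mean(axis=0))
    return estimate, w
```

With the population-optimal weights, this estimator's asymptotic variance is no larger than that of the labeled mean, which mirrors guarantee (i) in spirit. In finite samples the weights are estimated, so the sketch carries no exact guarantee. It also does not attempt the adaptive behavior in (ii).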

