Learning U-Statistics with Active Inference
For practitioners using U-statistics with costly label acquisition, this work provides a principled method to reduce labeling costs while preserving statistical validity.
The paper develops an active inference framework for U-statistics that selectively queries informative labels to improve estimation efficiency under a fixed labeling budget. On real datasets, it achieves substantial gains over baseline methods while maintaining target coverage.
$U$-statistics play a central role in statistical inference. In many modern applications, however, acquiring the labels required to compute $U$-statistics is costly. Motivated by recent advances in active inference, we develop an active inference framework for $U$-statistics that selectively queries informative labels to improve estimation efficiency under a fixed labeling budget, while preserving valid statistical inference. Our approach is built on the augmented inverse probability weighting (AIPW) $U$-statistic, which incorporates both the sampling rule and machine learning predictions. We characterize the optimal sampling rule that minimizes the variance of this estimator and design practical sampling strategies. We further extend the framework to $U$-statistic-based empirical risk minimization. Experiments on real datasets demonstrate substantial gains in estimation efficiency over baseline methods, while maintaining target coverage.
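
To make the idea of an augmented inverse probability weighting (AIPW) $U$-statistic concrete, the sketch below shows one plausible form for a degree-2 kernel. It is an illustration under stated assumptions, not necessarily the exact estimator studied in the paper: each label is assumed to be queried independently with a known probability `pi[i]`, the pairwise weight is assumed to be the product of the two individual inverse probabilities, and the names `aipw_u_statistic`, `half_sq_diff`, `preds`, `xi`, and `pi` are hypothetical. The kernel $h(a, b) = (a - b)^2 / 2$, whose $U$-statistic is the unbiased sample variance, is chosen only as an example.

```python
import numpy as np


def half_sq_diff(a, b):
    """Example kernel h(a, b) = (a - b)^2 / 2; its U-statistic is the sample variance."""
    return 0.5 * (a - b) ** 2


def aipw_u_statistic(y, preds, xi, pi, kernel=half_sq_diff):
    """Illustrative AIPW-style U-statistic for a symmetric degree-2 kernel.

    Assumed (hypothetical) inputs:
      y     : labels; entries with xi == 0 are never used and may be NaN
      preds : machine learning predictions for every unit
      xi    : 1 if the label was queried, 0 otherwise
      pi    : probability with which each label was queried (the sampling rule)
    """
    y = np.asarray(y, dtype=float)
    preds = np.asarray(preds, dtype=float)
    xi = np.asarray(xi, dtype=float)
    pi = np.asarray(pi, dtype=float)

    # All unordered pairs i < j.
    i, j = np.triu_indices(len(preds), k=1)

    # Substitute predictions for unqueried labels so they cannot leak in.
    y_safe = np.where(xi > 0, y, preds)

    # Prediction-only term, available for every pair.
    h_pred = kernel(preds[i], preds[j])
    # Correction term, active only when both labels in the pair were queried.
    h_true = kernel(y_safe[i], y_safe[j])
    weight = (xi[i] * xi[j]) / (pi[i] * pi[j])

    return float(np.mean(h_pred + weight * (h_true - h_pred)))
```

Under the independence assumption above, the weight has conditional expectation one for every pair, so the correction term removes the bias of the prediction-only term on average. When every label is queried with probability one, the sketch reduces to the classical $U$-statistic; when the predictions are accurate, the correction term is small, which is where a variance reduction would be expected to come from.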